Cache Advice – what could it be doing?
Addendum 23rd Nov 2005
I received an email recently asking me what the connection was between the buffer cache hit ratio (BCHR)and the buffer cache advisory feature (v$db_cache_advice) in Oracle 9i and 10g. The answer is simple – as much, or as little, as you want to pretend.
On one hand, you could start with the claim that the buffer cache hit ratio is a number that compares the count of latched buffer acquisitions (logical I/Os) with the count of Oracle blocks requested from the file system (physical reads) and then point out that the v$db_cache_advice is trying to tell you what this fraction would have been if the buffer cache had been a different size. Starting from this viewpoint, you can certainly read a line from v$db_cache_advice and say “my buffer cache hit ratio is X, and if I had had a buffer cache of size M, then my buffer cache hit ratio would have been Y”. All you have to do take the estd_physical_reads column from v$db_cache_advice and put it into your chosen formula for the buffer cache hit ratio in the place where you would normally put the actual physical read count. (That’s assuming you’ve picked a version of the BCHR formula that actually counts physical reads the same way that Oracle is counting physical reads).
On the other hand, you could point out that all you get from the buffer cache hit ratio is a simple piece of arithmetic based on counts of logical I/Os and physical reads. If you try to work towards the v$db_cache_advice from this viewpoint you have to ask where, in those counts, is the information that could allow an advisor to come up with indications that (for example) 256MB of extra cache won’t make much difference to physical I/O levels but 288MB could cut the physical I/O by 90%. Even if you were to bounce the database every 24 hours, increasing the buffer cache by 16MB every time, you could find a virtually flat trend line in the buffer cache hit ratio – and still be told that a specific value for the cache would make a big difference. The work of the advisor is much more subtle and detailed than anything you could emulate by playing about with the buffer cache hit ratio. (Although you could do something quite useful with v$segstat – segment level statistics – but I’ll have to leave that topic for another day).
So if someone tells you that the buffer cache hit ratio must be a good thing because Oracle has based their v$db_cache_advice technology on it, then they are displaying a lack of understanding about the deficiencies of the buffer cache hit ratio in particular, and how LRU (least recently used) caching mechanisms work in general.
But if someone tells you that you may be able to improve your buffer cache hit ratio by using the information in v$db_cache_advice, they are correct – because it’s been designed to help you choose the best place to allocate the memory you have, given the current workload you typically see, if you want to get a particular buffer cache hit ratio (since startup).
Personally, of course, I would rather be told that I could reduce physical I/O to relieve the stress on the I/O subsystem by X%, or reduce the logical I/O by Y% to reduce CPU cycles and latch contention, thereby improving response times on critical actions by Z% , rather than hear someone say that the target was simply to change an inherently meaningless number.
There are two significant features to the design: the conceptual bit, which is very easy, and the mechanical implementation bit, which is very difficult if you want to do something which is both useful and efficient.
First you have to understand that Oracle runs an LRU chain through the list of buffer headers (x$bh), and implements a touch-count algorithm that indicates the number of visits that a buffer header has received. To keep things simple I am going to pretend that there is only one such chain, even though the buffer cache is usually split into multiple working data sets (x$kcbwds) with an LRU chain (and LRU latch) per set.
When Oracle needs a buffer in order to read a new block into memory it checks the touch count of the buffer header at the end of the LRU chain – if this shows that people have been using the related block the buffer header is moved (relinked) to the top of the LRU chain (and its touch count is reset). If the touch count shows that the block was not popular the buffer is cleared and re-used.
Obviously, if you clear block X from the tail-end buffer, you may have to re-read that block at some later stage. The function of the cache advisory is to let you know whether you could have avoided a later re-read of block X if the buffer cache had been a little larger – just large enough that block X hadn’t got to the tail-end position before someone wanted to re-read it. Conversely, the cache advisory can also let you know that you could safely reduce the buffer cache size without putting block X into jeopardy.
If you want to implement an advisor, the easiest thing to do is to keep two copies of the buffer headers, each with their own LRU chain running through them, and make the second copy much bigger than the primary copy but strip out all the stuff that isn’t relevant to block identities. So when a buffer is dumped from the primary copy its reference could still exist in the secondary copy. Now all you have to do on a block request is compare what would have happened on the secondary LRU with what actually happens on the real (primary) LRU. Let’s do a worked example:
Imagine the buffer cache (hence LRU chain) is 1,000 blocks.
Assume the advisory is supposed to report the effects of having buffer caches sized at 500, 1,000, 1,500 and 2,000 blocks.
We want to read block X, so we check the secondary LRU chain – and happen to find the block at location 1,800. Check the primary LRU, the block is (obviously) not in memory, so we read it. Because we have done a physical read on the primary LRU, we have to promote the block on the secondary LRU (but where to?). Now we record some counts: Since the block was in the secondary LRU at location 1,800, we increment the ‘estimated physical reads’ counter for the three cache sizes less than 1,800 but don’t increment the counter for the largest cache size.
A little later we want to read block X again, so we check the secondary LRU chain – we find that the block is at location 750. Check the primary LRU, the block is in memory, so we don’t need to read it. But we would have needed to read it if the buffer cache had been only 500 blocks. So we increment the ‘estimated physical reads’ counter for the 500 block buffer cache size.
That’s it. The concept is very easy – but then you have to implement it correctly and efficiently.
An alternative strategy – which may, in fact, be the one followed by Oracle – is to have N (in our example 4) copies of the buffer headers, where each copy is a different size (but all copies share the same LRU latches) so that you can deal with the whole LRU/TCH algorithm properly for each copy individually. I suspect that Oracle is not doing this because when I’ve checked the content of x$kcbsh, the block addresses recorded in the structure haven’t shown any signs of overlapping or interleaving.
You couldn’t run two copies of the LRU with a pretend double-size buffer without paying a significant performance penalty. In my conceptual description, every buffer request now does twice the work it used to because two LRU chains have to be searched for every single block request. So there has to be a cunning sampling strategy that trades precision with cost. You can get a clue about this from x$kcbsh (Kernel Cache Buffer Simulated Headers ?) which is broken into sets (set_id) and segments (segidx). The sets correspond to the working data sets (x$kcbwds) of the buffer cache, and each segment holds one row for every 128 rows in the corresponding set from x$bh – a scale factor probably controlled by the hidden parameter _db_cache_advice_batch_size.
If you are going to use a sampling mechanism, then you probably need a clever strategy for deciding how to sample – a round-robin (e.g. every 13th block in the file) is the easiest and least CPU intensive (as in - I only check the secondary LRU if mod(block_id, 13) = 0; and only push a buffer header into the secondary LRU on the same condition). But unless the sampling strategy was a little devious, some systems could end up sampling blocks that gave a very misleading image of how a change in cache size would affect things.
Oracle uses a modified LRU mechanism that loads blocks into the middle of the LRU chain and promotes them to the top of the LRU chain only if they are known to be useful by the time they reach the end of the LRU chain. This makes it much harder to emulate fairly what would happen with a different sized buffer cache. In my example, the block at location 1,800 ought to have its touch count (which isn’t in the secondary buffer header) incremented – in fact I had to promote it, and decided to promoted it to the top of the secondary LRU. Maybe I should have promoted it to position 500 – which is where it would loaded in the primary LRU – or position 1,000 which is where it would have been loaded in a buffer cache matching the size of the next larger advisory limit.
Similar, but opposite, problems arise with the simulation of caches that are smaller than the current cache – you cannot simulate promotions that would have taken place in a reduced cache because the simulated buffer header has to stay where it is for the benefit of all the larger simulation sizes.
The problems of promotion and touch count from the last two paragraphs explain why I made the comment in the concepts section above that Oracle may be maintaining N (where N = 20) copies of the buffer headers – it would certainly allow for a more realistic simulation of block promotion. However, if they are keeping 20 copies, then each copy holds only 1 row for every 128 rows in the real x$bh – so the scope for anomalies based on data distribution goes up.
Even though the idea is simple, the technology, and intelligence, behind the db cache advisories (and the other advisories based on LRU activity) is vastly more sophisticated than the simple minded division of a couple of counters. Don’t be fooled into thinking that the existence of the advisories retrospectively justifies the use of cache hit ratios as targets for performance analysis.
If you are interested in hacking around with Oracle’s implementation, here are a couple of (probably) related details.
Apart from the parameter mentioned above, there is another parameter labeled _db_cache_advice_sample_factor which may have something to do with the frequency with which an event is sampled – perhaps Oracle only plays cache games on every fourth physical read request.
In v$latch_children, you can find at least two latches that are probably relevant, the simulator hash latch (I had 32 of these on an Oracle 9.2 system with 1024 cache buffers chains latches), and the simulator lru latch (and my system showed the same number for this as it showed for the cache buffers lru chain latch.
I’ve just received an email warning me about a bug in 9.2 that can cause problems on highly active systems. If your system is a very busy one and you use alter system to switch db_cache_advice to ON after you have resized the cache then you may crash the instance.
In earlier versions of Oracle there used to be a bug that stopped you from enabling cache advice because the simulator code needed to allocate some memory from the shared pool and could not acquire it. The work around was to ensure that you always started the database with db_cache_advice set to READY so that the memory was pre-allocated though unused.. Of course, if you then increased the size of the cache that did beg the question about how the extra memory gets allocated.
In this latest bug, the crash can occur even if the db_cache_advice is being changed from READY to ON.