Problems with System Statistics (CPU Costing)
Update Jan 2007
System Statistics – also known as CPU Costing – is a feature of Oracle 9 that allows the optimizer to generate a better estimate of the completion time (cost) of a query. There is an article with the title Understanding System Statistics that I wrote for OTN quite recently, so I won’t go into details here.
The conclusion of the article points out that system statistics are really important – the step of enabling them is almost as dramatic as the change from Rule Based Optimization (RBO) to Cost Based Optimization (CBO), and if you are still running Oracle 8i then part of your migration testing should include enabling system statistics and dealing with any side-effects.
Although the article is very positive about system statistics there is a problem to be faced; a problem that is similar to the problem I raised in an article with the title Oh I see a problem where I discuss the parameter optimizer_index_cost_adjust.
Bear in mind that system statistics and the optimizer_index_cost_adjust parameter have a similar effect – they allow the optimizer to give single block reads and multi-block reads different costs. The big difference between them is that the calculations that use system statistics tend to scale UP the cost of a MULTI-block read to make it more expensive, whilst the calculations that use the optimizer_index_cost_adjust scale DOWN the cost of a SINGLE block read to make it cheaper. Since the results of these calculations are then subject to rounding (which changes in 10g), the impact of the optimizer_index_cost_adjust is to exaggerate rounding errors and make the optimizer switch from a good index to a bad index !
So using system statistics is a better bet than fiddling with the optimizer_index_cost_adjust. But where is the common problem? The answer lies in that (now infamous) problem of trying to “tune a system”. System-level averages hide process-level details – and the semi-automatic generation of system statistics uses system-level averages.
If you enable sql_trace before doing a dbms_stats.gather_system_stats(‘start’)and corresponding dbms_stats.gather_system_stats(‘stop’), you will find that the calls simply take snapshots of v$filestat (actually the underlying x$kcfio) and v$sysstat to find out how much I/O has taken place in the interval, the number of single reads, the number of multi-block reads, and the I/O times. Based on these figures, Oracle works out the mreadtim, sreadtim, and mbrc for the period.
The problem is this: if you have a session which does frequent “large”** table scans or index fast full scans in the interval, then these will turn into requests to the file-system and register as physical read requests. But if you have file-system buffering enabled – or some other form of buffer between Oracle and the real physical disk device – then the time taken for these reads will be (artificially) small and, on average, Oracle will acquire an unrealistic idea of the cost of a ‘typical’ multi-block read. This is a particular problem in a SAN environment, as SANs tend to have very good algorithms for handling high volume reads, using predictive read-ahead methods to pre-load the SAN cache.
In other words: if you have enough bad SQL in your system, then using the gather_system_stats() routine to collect system statistics could lead to the optimizer thinking that table scans and index fast full scans were really cheap and efficient, and doing lots more of them. Of course, if you have a perfectly designed and written system this problem won’t affect you; unfortunately there are plenty of systems around the world that are far from perfect and will run into this problem.
The solution to this problem (as far as I can tell) is to use the calls in dbms_stats to acquire the cpuspeed and mbrc, but use an external calibration tool (such as the program available on www.iozone.org, perhaps, or Oracle’s recently launched Orion) to get realistic times for random I/Os of both your standard block size and random I/Os of the size indicated by the value of mbrc that you acquired. Remember that you want to do this calibration at the same ‘representative processing’ time, as the response time may vary with your Oracle workload.
As a closing thought – if you are using system statistics then you can still set the optimizer_index_cost_adjust – but for values between 1 and 100 you might like to think of it as a way of telling Oracle about the percentage of table block visits that will not be cached. As an indication of the effect this has, you may have the parameter set to about 30 before enabling system statistics, and change it to 90 after enabling system statistics – the swing from relatively low to relatively high would not be surprising.
** Note – a “large” tablescan is one that will be read into the tail-end of the LRU (least recently used) list hence rapidly discarded without crowding out the rest of the buffer cache. Note that a table is not “large” until its size exceeds two percent of the buffer cache.