Author's name: Jonathan Lewis
Author's Email: Jonathan@jlcomp.demon.co.uk
Date written: 4th Jan 2007
Oracle version(s): Various
Our data warehouse is using analyze to gather stats, because when we tried to use dbms_stats on partitioned tables, it takes days to gather stats on a 100 million row table even with a very small sample size. Why is this happening ?
The person who last posed this question didn’t mention a version number – but the problem appears in many versions of Oracle as you make the change from one technology to the other, mainly because the activity carried out by the dbms_stats does not, by default, match the activity carried out by the analyze command.
Not only does dbms_stats differ from the analyze command in its behaviour, virtually every version of Oracle has introduced a few extra parameters in the calls to dbms_stats and even changed some of the default values from previous versions, so that calls to dbms_stats that used to complete in a timely fashion in one version of Oracle suddenly take much longer after an upgrade – it’s not just the switch from analyze to dbms_stats that causes problems.
In the worst case, when left to its own devices, dbms_stats in 10g will work out for itself the best sample size to use on the tables and indexes, which columns to create histograms for, the number of buckets in the histograms, and the sample size to use for those histograms. The volume of work may be much larger than anything you would choose to do yourself.
The first guideline for using dbms_stats is: read the manual – or the script $ORACLE_HOME/rdbms/admin/dbmsstat.sql to see what it does, and how it changes from release to release. This gives you some chance of making it do what you used to do with the analyze command. The second guideline is to try a test run with lots of tracing (e.g. events 10046, 10033, and calls to v$session_event and v$sesstat) set so that you make sure that you can see how much work the tools do, and where they are losing time.
Partitioned tables have always been a particular problem for statistics – and Oracle is constantly fiddling with the code to try and reduce the impact of collecting reasonable statistics. If you have large partitioned tables, and don’t have to collect global statistics, bear in mind that you probably know more about the data than Oracle does. The best strategy for partitioned tables (probably) is to write your own code that limits Oracle to collecting statistics on the most recently added partition and then derives reasonable table level statistics programmatically by combining the latest statistics with the existing table stats. Look very carefully at the procedure with names like get_column_stats, set_column_stats.
Further reading: $ORACLE_HOME/rdbms/admin/dbmsstat.sql