JL Computer Consultancy

Index Efficiency (again)

Sep 2004


How much work does it take to decide if an index is a lot bigger than it ought to be? For many people the answer will be “not a lot – you just run validate index, or the equivalent analyze command and look at index_stats for the answer”.

That’s good enough for some systems – some systems don’t have many indexes, or have plenty of down-time every night or weekend and don’t mind that the validate action locks the underlying table while the walk through the index is running. On bigger, busier systems you may wish to be more selective and less aggressive.

So, if you do have a very large, busy, system with a few important indexes that you suspect might be candidates for re-organising, how do you decide whether the benefit is likely to be worth the cost – and how do you make that decision without crippling the system to find out?

Here’s one example of how you can get a rough idea of how big an index “ought” to be without doing nasty things to the system. It’s a technique that I’ve been using (on paper, at least) since the early days of Oracle Version 6 when Oracle introduced the current structure of their B-tree indexes. It’s not something that I’ve had to do very often, so I’ve never got around to writing a script to do it – especially since there are so many little variations on a theme. But since I was asked about this topic at a recent presentation, here’s an outline.

Warning – the purpose of this script is to tell you roughly how big an index would be if it were a bland, boring index that had grown over time as data has arrived, been updated, and been deleted in random fashion. There are classes of indexes – dictated by business usage – that do not match that behaviour pattern. This script is not for the amateur “tuning-twiddler”, it is for the thoughtful professional who understands the systems they have to care for. The notes at the start of the script list (I hope comprehensively) the limitations that apply. (See also the ideas and code for another index analysis mechanism).
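For anyone who wants to see the shape of the arithmetic before reading the SQL, here is a minimal sketch in Python. The function name and figures are invented for illustration; the constants (8K block, 100 bytes of block overhead, 70% packing, 1% branch allowance) are the same assumptions the script below works with.

```python
# A rough sketch of the arithmetic the SQL script performs against
# the dictionary views. All inputs here are hypothetical.

def estimate_index_blocks(num_index_entries, column_data_bytes,
                          unique_index=False, block_size=8192):
    # rowid entry: 6 bytes for a unique index, 7 for non-unique
    # (in a non-unique index the rowid becomes part of the key,
    # which costs an extra length byte)
    rowid_bytes = 6 if unique_index else 7
    row_overhead = 4        # row directory entry plus row header
    fixed_cost = num_index_entries * (rowid_bytes + row_overhead)
    raw_blocks = (fixed_cost + column_data_bytes) / (block_size - 100)
    return round(raw_blocks * 1.01 * 100 / 70)  # branch allowance, 70% packing

# e.g. a non-unique index of 10,000 entries averaging 30 bytes of column data
print(estimate_index_blocks(10_000, 10_000 * 30))
```

The point, as with the script itself, is not accuracy but scale: if the real index is several times this figure, it is worth a closer look.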

Updated 16th May 2005:

Someone raised a question about the second query which reports lf_actual, and adjusted_70. This made me realize that I needed to explain why that query was there – which I have done in the comments in the script.

I also re-ran the script against 10.1.0.4 and 9.2.0.6, and found that the final query against user_indexes reported the wrong values for the number of leaf blocks. For some unknown reason, the dbms_stats call had apparently not updated the statistics in the data dictionary – although a block dump showed that the index had been rebuilt as required.

One last point – the “analyze compute statistics” command counts the number of leaf blocks in the index structure; the dbms_stats.gather() procedures count the number of index leaf blocks currently holding rows. In special cases, these numbers may be very different from each other. Some indexes may have lots of empty leaf blocks waiting on the freelist to be re-used. One mechanism counts them as leaf blocks, the other does not.

Updated 25th April 2009:

Whilst I’ve often done the type of calculation suggested by the script below, it’s only been against one or two critical indexes, and I’ve done the entire approximation in my head. Consequently I had never tested the script against a large batch of indexes – until a client recently asked me to check every single index in their system. So I took the basic script, switched it to the dba_* views, used dba_tab_cols instead of dba_tab_columns (thus addressing the “function-based indexes” issue), and wrapped the whole thing in a pl/sql block – and discovered two errors in the script that I hadn’t previously noticed.

First – the avg_col_len already allows for the number of nulls in a column, so the script doesn’t need the (num_rows - num_nulls) factor. (As a side effect, this means that the rounding of avg_col_len when multiplied up by num_rows can introduce a very significant error in the estimate).
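To get a feel for the scale of that rounding effect, here is a hypothetical illustration – a column whose true average length is 5.4 bytes, but whose avg_col_len is reported as the rounded value 5, in a million-row table. All the figures are invented:

```python
# Hypothetical illustration of the avg_col_len rounding problem.
# avg_col_len is a whole number of bytes, so a column whose true
# average length is 5.4 bytes may be reported as 5 (or 6).

num_rows = 1_000_000
true_avg_len = 5.4      # invented "real" average column length
stored_avg_len = 5      # what avg_col_len might report

error_bytes = abs(true_avg_len - stored_avg_len) * num_rows
usable_bytes_per_block = (8192 - 100) * 0.70   # 8K block, 70% packing

print(round(error_bytes))   # bytes missing from the estimate
print(round(error_bytes / usable_bytes_per_block))  # estimate error, in blocks
```

Nearly half a megabyte of column data – some seventy leaf blocks at 70% packing – can disappear from (or appear in) the estimate through rounding alone on a single column.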

Secondly – if a column in an index is defined as descending, it does take one extra byte of storage per row in the index, but the column name that appears in user_ind_columns is not the original column name; it is a name like sys_nc000013$, which will appear in user_tab_cols but not in user_tab_columns, and the value for avg_col_len in user_tab_cols allows for the one extra byte. So the code, as it stands, loses descending columns, and when corrected to capture descending columns doesn’t need the desc_ind calculation.

I hope to publish a corrected and more helpful version of the code (including the full pl/sql wrapper) on my blog some time fairly soon.

Updated 28th Feb 2010:

The corrected version of the code including the pl/sql wrapper is now on my blog, along with a few supporting notes.



 

rem
rem   Script:     index_estimate.sql
rem   Author:     Jonathan Lewis
rem   Dated:      Aug 2004
rem   Purpose:    How big should an index be.
rem
rem   Notes:
rem   Last tested 10.1.0.4 with 8K blocksize
rem   Last tested  9.2.0.6 with 8K blocksize
rem   Last tested  8.1.7.4 with 8K blocksize
rem
rem   A quick and dirty script to work out roughly
rem   how big a simple b-tree index should be.
rem
rem   Based on a manual method I first used with v6 when
rem   I found the odd index that looked suspiciously large
rem
rem   The concept is simple -
rem         Number of entries in index * entry overhead plus
rem               data content of index entries
rem         Assume 100 bytes per block overhead
rem         Assume 70% packing      (effectively pctfree 30)
rem         Allow 1% for branch blocks
rem
rem   It's not accurate, but it doesn't need to be if
rem   all you want to know is whether the index is about
rem   the right size or way off.
rem
rem   Room for error:
rem         Can't cope with rounding errors on avg_col_len
rem         Can't cope with generic function-based indexes
rem         Doesn't allow for compressed indexes
rem         Doesn't allow for large ITLs
rem         Doesn't allow for 'long' columns (>127 bytes)
rem         Doesn't allow for side-effects of null columns
rem
rem   Needs a recent stats collection on the table and index
rem         If you use analyze, then add one to avg_col_len
rem         If you use dbms_stats, don't.
rem
rem   In this example, the block size is hard coded at 8K
rem
rem   If you want to modify this code to produce a version that
rem   handles partitioning, the rowid for a global index or
rem   globally partitioned index is 10 bytes, not 6.
rem
rem   There are a few other considerations for dealing with IOTs
rem   and secondary indexes on IOTs.
rem

 

 

drop table t1;

create table t1
nologging         -- adjust as necessary
pctfree 10        -- adjust as necessary
as
select
      owner,
      object_type,
      object_name,
      object_id,
      last_ddl_time
from
      all_objects
where
      rownum <= 10000
;

create index t1_i1 on t1(owner, object_type, object_name);

begin
      dbms_stats.gather_table_stats(
            user,
            't1',
            cascade => true,
            estimate_percent => null,
            method_opt => 'for all columns size 1'
      );
end;
/

 

 

rem
rem   To start with, we want:
rem         index_entries (user_indexes.num_rows) * (
rem               rowid entry (6 or 7 depending on uniqueness)
rem               +
rem               4 (rowindex entry + 'row' overhead)
rem         )
rem         +
rem         sum((avg_col_len) * (user_tables.num_rows - num_nulls))
rem
rem   Note: for descending columns, add one per column
rem   Note: if you use ANALYZE, you need avg_col_len + 1
rem   Note: if applied to global (partitioned) indexes, the rowid is 10 bytes
rem
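rem
rem   Worked example (all figures hypothetical): a non-unique
rem   index with 10,000 entries and an average of 30 bytes of
rem   column data per entry:
rem
rem         fixed costs:  10,000 * (6 + 1 + 4)  = 110,000 bytes
rem         column data:  10,000 * 30           = 300,000 bytes
rem         raw blocks:   410,000 / (8192-100)  =  50.7 blocks
rem         estimate:     50.7 * 1.01 * 100/70  ~  73 blocks
rem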

 

 

prompt
prompt      From a purely arithmetic approach, this is
prompt      an estimate of how big the index is likely
prompt      to be if rebuilt with pctfree 30.
prompt

select
      round(
            100/70 *                -- assumed packing efficiency
            1.01 *                  -- allow for branch blocks
            (
                  ind.num_rows * (6 + uniq_ind + 4) + -- fixed row costs
                  sum(
                        (avg_col_len + desc_ind) *
                        (tab.num_rows - num_nulls)
                  )                             -- column data bytes
            ) / (8192 - 100)                    -- block size - block overhead
      )                       index_block_estimate_70
from  (
      select      /*+ no_merge */
            num_rows,
            decode(uniqueness,'UNIQUE',0,1)     uniq_ind
      from  user_indexes
      where index_name = 'T1_I1'
      )                       ind,
      (
      select      /*+ no_merge */
            num_rows
      from  user_tables
      where table_name = 'T1'
      )                       tab,
      (
      select      /*+ no_merge */
            column_name,
            decode(descend,'ASC',0,1)     desc_ind
      from  user_ind_columns
      where index_name = 'T1_I1'
      )                       ic,
      (
      select      /*+ no_merge */
            column_name,
            avg_col_len,
            num_nulls
      from  user_tab_columns
      where table_name = 'T1'
      )                       tc
where
      tc.column_name = ic.column_name
group by
      ind.num_rows,
      ind.uniq_ind
;

 

rem
rem   We know that we have just built this index at
rem   90% efficiency (pctfree 10), so let's get an
rem   estimate of the probable size it would be at
rem   70% efficiency (pctfree 30) using a quick and
rem   dirty query.  Note - on a production system,
rem   the adjusted_70 value from this query will not
rem   mean anything.
rem

prompt
prompt      Comparison figure - known leaf_blocks
prompt      at a known 90%, scaled to see what the
prompt      index would look like at 70%
prompt

select
      leaf_blocks                   lf_actual,
      round(leaf_blocks * 90/70)    adjusted_70
from  user_indexes
where index_name = 'T1_I1'
;

 

alter index t1_i1 rebuild pctfree 30;

begin
      dbms_stats.gather_table_stats(
            user,
            't1',
            cascade => true,
            estimate_percent => null,
            method_opt => 'for all columns size 1'
      );
end;
/

prompt
prompt      Leaf blocks when rebuilt at 70% packing
prompt

select
      leaf_blocks
from  user_indexes
where index_name = 'T1_I1'
;

 

 


Back to Index of Topics