JL Computer Consultancy

Is this an index I use ?

April 2006


A few days ago I was contemplating the question of how you check whether an index is useful (and used) or whether it could be dropped. Having decided that there was a reasonably helpful strategy for identifying such indexes, I started to write this note.

Unfortunately (or possibly fortunately), I received an email from James Morle before the note was finished asking me if a hypothesis he had had about an odd optimization problem was true.  I explained that it was, and whipped up a quick demonstration of the issue. His reply was “Very smart. But also makes the age-old question of 'which indexes do we actually use' infinitely more complex!” And he was, of course, right. It is possible to have an index which isn’t actually used, but which has to exist because it makes the optimizer do the right thing.


First, the mechanism for identifying possible candidates – if your are running Oracle 9i or later.

One of the many enhancements that came in with 9i was the dynamic performance view v$segstat (with an associated view v$segment_statistics – which you shouldn’t normally use because it is far more resource-intensive for very little added value). The view gives you a dozen or so statistics per segment relating to work done on that segment. The list of possible statistics is given in the view v$segstat_name, which lists the following for Oracle 9i (you’ll notice that there may be hidden ones when you check the statistic# column – there are also of couple of new statistics in 10g):

STATISTIC# NAME                                     SAM

---------- ---------------------------------------- ---

         0 logical reads                            YES

         1 buffer busy waits                        NO

         2 db block changes                         YES

         3 physical reads                           NO

         4 physical writes                          NO

         5 physical reads direct                    NO

         6 physical writes direct                   NO

         8 global cache cr blocks served            NO

         9 global cache current blocks served       NO

        10 ITL waits                                NO

        11 row lock waits                           NO

Notice in particular that there are statistics about “logical reads” and “db block changes” and think how these numbers relate to the use of indexes.

If you do nothing but inserts/updates/deletes to the index (in other words the index exists, but is never used for simple queries) then each time you change an entry, you will probably read the root block, a branch block, and a leaf block, before changing the leaf block – which means that the number of “logical reads” will be about three (or, for large indexes, four) times the number of “db block changes”.

If you make use of the index to satisfy queries, then a typical system is likely update the index a relatively small number of times compared to the number of times it is used for querying, and the number of “logical reads” will be much larger than the number of “db block changes”.

So run a query like this against v$segstat, and see what clues you get:

select

        statistic_name, value

from    v$segstat

where

        obj# = {object_id from dba_objects}

and     dataobj# = {data_object_id from dba_objects}

and     value != 0

order by

        statistic#

;

 


As an example – the following two sets of output came from a couple of very simple demonstrations. In the first case I created a table (with a CTAS), indexed it, then did a few hundred inserts and deletes on the table, but no queries.

STATISTIC_NAME                           VALUE

----------------------------------- ----------

logical reads                             6192

db block changes                          2096

physical reads                               7

physical writes                              7

physical writes direct                       7

 

In the second case after creating and indexing the table, I ran a few thousand simple queries that used the index.

STATISTIC_NAME                           VALUE

----------------------------------- ----------

logical reads                            18329

db block changes                            87

physical reads                               7

physical writes                              7

physical writes direct                       7

 

In this small and extreme example, the difference in behaviour is clearly visible.

There are several warnings that go with this strategy. First, and foremost, it can ONLY be an indicator, not a guarantee that the index is not used for queries; you still have to decide, based on you knowledge of the application whether the index is likely to have a real purpose in life – but it does help you focus on the more likely candidates by taking advantage of existing overheads.

There are also a couple of version-dependent bugs. You’ll notice the “order by” clause in my query.  That’s there as a workaround to a problem in some versions (possibly only 9i, I don’t recall checking 10g) where you get a memory leak in the SGA every time you query this view – the leak doesn’t happen if you have an “order by” in your query. You will also see that I included the obj# and dataobj# in my query selection – this is necessary for things like clusters, of course, but also when your move, rebuild, or truncate objects, they get new data object ids. Unfortunately, you may find that v$segstat is not cleaned up properly when you do things like truncates, move and rebuild; statistics for older versions of objects may hang around.

 


And now – the problem.

Here’s an extract from a script I wrote in answer to James Morle’s original question:

set autotrace traceonly explain

 

select

        t1.small_vc, t2.small_vc, t3.small_vc

from

        t1, t2, t3

where

        t1.n1 between 40 and 50

and     t1.id1 = t2.id1

and     t1.ind_pad = t2.ind_pad

and     t1.id2 = t2.id2

and     t3.id = t1.id1

;

 

drop index t2_i1;

 

select

        t1.small_vc, t2.small_vc, t3.small_vc

from

        t1, t2, t3

where

        t1.n1 between 40 and 50

and     t1.id1 = t2.id1

and     t1.ind_pad = t2.ind_pad

and     t1.id2 = t2.id2

and     t3.id = t1.id1

;

set autotrace off

Note that the query is the same query in both cases, but I have dropped an index (on the t2 table) before running it the second time. This is the result of running the script on 9.2.0.6:

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=274 Card=484 Bytes=64856)

   1    0   HASH JOIN (Cost=274 Card=484 Bytes=64856)

   2    1     HASH JOIN (Cost=18 Card=484 Bytes=57596)

   3    2       TABLE ACCESS (FULL) OF 'T2' (Cost=5 Card=20 Bytes=1160)

   4    2       TABLE ACCESS (FULL) OF 'T1' (Cost=12 Card=484 Bytes=29524)

   5    1     TABLE ACCESS (FULL) OF 'T3' (Cost=255 Card=5000 Bytes=75000)

 

Index dropped.

 

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=66 Card=48 Bytes=6432)

   1    0   NESTED LOOPS (Cost=66 Card=48 Bytes=6432)

   2    1     HASH JOIN (Cost=18 Card=48 Bytes=5712)

   3    2       TABLE ACCESS (FULL) OF 'T2' (Cost=5 Card=20 Bytes=1160)

   4    2       TABLE ACCESS (FULL) OF 'T1' (Cost=12 Card=484 Bytes=29524)

   5    1     TABLE ACCESS (BY INDEX ROWID) OF 'T3' (Cost=1 Card=1 Bytes=15)

   6    5       INDEX (UNIQUE SCAN) OF 'T3_PK' (UNIQUE)

The plan has changed because I dropped the index – which, at first sight, you may think is not surprising. But look closely at the original execution plan. It does not use the index I have dropped. The plan really has changed (and the query does use these plans) because I’ve dropped an index I wasn’t using. How can that possibly make sense?

The answer is in an optimizer feature described in the 10.2 optimizer trace file (10053) as the ‘Concatenated index card’ sanity check. (The same thing appears in the 8i and 9i arithmetic, it’s just that the feature is reported explicitly in 10.2). Look at the “Card=” component of the hash join line in both plans. The predicted number of rows in the join between t1 and t2 has changed dramatically. The index I dropped (t2_i1) matched the join predicates between t1 and t2 precisely, so the optimizer used the distinct_keys value for the index when working out the join cardinality – even though it then declined to use the index to execute the join operation.

So – even when you demonstrate conclusively (if you can find a way of doing so) that an index is not actually used at run-time; you may not be able to drop it because its statistics are used at optimize-time. However, it looks as if this particular sanity check applies only to multi-column unique indexes – at present - so it’s not quite as much of a threat – at present – as it might seem.


Conclusion

Like all the other methods for deciding whether or not an index can be dropped, checking v$segstat is only an indicator of a possible candidate – you still have to know what your application is doing in order to be sure that the index does not exist to meet some specific functional requirement.

However, even if an index appears never to be used, the statistics about that index may be contributing to the optimizer’s calculations in ways that would make some execution plans change (potentially for the worse) if the index disappeared. Fortunately, to cause this problem, the index currently has to be a unique, concatenated index – which narrows the field a little.


Back to Index of Topics