JL Computer Consultancy

Bug hunting with 10053

Jan 2007


If Oracle seems to be ignoring a hint, it’s either because you’ve made a mistake in the syntax, or in the use of the hint, or because you’ve hit an Oracle bug. And sometimes the only way to work out what’s going on is to look at the 10053 trace file.

In a recent posting on the comp.databases.oracle.server newsgroup, someone came up with an example of Oracle 10gR2 behaving a little strangely with the index_ss() hint, which should make Oracle do a skip scan on a specified index – if it’s legal. Their code – slightly stripped for brevity and anonymity – looked like this (prsn_id is the second column of the primary key).

Select

{list of columns}

from  (

            select --+ index_ss(tf tf_pk)

                  {list of columns}

            from

                  tf,

                  tt,

                  ot

            where

                  tf.trtp_id = tt.trtp_id

            and   ot.transaction_type_dscr = tt.transaction_type_dscr

      )

where

tf.prsn_id = 5485186

;

The execution plans, taken from the 10053 trace file, were as follows:

====================

Plan Table  (Hinted)

====================

-----------------------------------------------------+-----------------------------------+---------------+

| Id  | Operation                           | Name   | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |

---------------------------------------------------------------------+-------------------+---------------+

| 0   | SELECT STATEMENT                    |        |       |       | 2322K |           |       |       |

| 1   |  TABLE ACCESS BY GLOBAL INDEX ROWID | TF     |     1 |    69 |   46K |  00:11:53 | ROW L | ROW L |

| 2   |   NESTED LOOPS                      |        |    57 |  8037 | 2322K |  09:15:44 |       |       |

| 3   |    MERGE JOIN                       |        |    51 |  3672 |     7 |  00:00:01 |       |       |

| 4   |     TABLE ACCESS BY INDEX ROWID     | OT     |    51 |  2346 |     2 |  00:00:01 |       |       |

| 5   |      INDEX FULL SCAN                | OT_O1  |    51 |       |     1 |  00:00:01 |       |       |

| 6   |     SORT JOIN                       |        |    51 |  1326 |     5 |  00:00:01 |       |       |

| 7   |      TABLE ACCESS FULL              | TT     |    51 |  1326 |     4 |  00:00:01 |       |       |

| 8   |    INDEX FULL SCAN                  | TF_PK  |     6 |       |   46K |  00:11:53 |       |       |

-----------------------------------------------------+-----------------------------------+---------------+

 

 

======================

Plan Table  (Unhinted)

======================

-----------------------------------------------+-----------------------------------+---------------+

| Id  | Operation                      | Name  | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |

-----------------------------------------------+-----------------------------------+---------------+

| 0   | SELECT STATEMENT               |       |       |       |   28K |           |       |       |

| 1   |  HASH JOIN                     |       |    57 |  8037 |   28K |  00:07:48 |       |       |

| 2   |   MERGE JOIN                   |       |    51 |  3672 |     7 |  00:00:01 |       |       |

| 3   |    TABLE ACCESS BY INDEX ROWID | OT    |    51 |  2346 |     2 |  00:00:01 |       |       |

| 4   |     INDEX FULL SCAN            | OT_01 |    51 |       |     1 |  00:00:01 |       |       |

| 5   |    SORT JOIN                   |       |    51 |  1326 |     5 |  00:00:01 |       |       |

| 6   |     TABLE ACCESS FULL          | TT    |    51 |  1326 |     4 |  00:00:01 |       |       |

| 7   |   PARTITION RANGE ALL          |       |    57 |  3933 |   28K |  00:07:48 | 1     | 37    |

| 8   |    TABLE ACCESS FULL           | TF    |    57 |  3933 |   28K |  00:07:48 | 1     | 37    |

-----------------------------------------------+-----------------------------------+---------------+

Clearly, the presence of the hint has affected the execution plan; but you see in line eight of the hinted plan that the specified index has been used for a full scan, not for a skip scan. So has Oracle not ignored the hint, but it hasn’t done the right thing. To me, this clearly says: “bug” – but how do you confirm that? You’ve got to look at the 10053 trace file.

To demonstrate my point, I’ve picked out the “single table access” sections to show that Oracle is obeying the hint; then I’ve picked up the parts of a “Now Joining” section to show that Oracle is (still) obeying the hint; then I’ve picked out an example of a “Now Joining” section that shows Oracle going wrong. Finally I’ve extracted all the join “Join Order” lines to show you when Oracle did the right thing and when it did the wrong thing.

Single table access paths:

From the unhinted trace file – note that the presence of the calculations for tablescan, index skip scan, and index full scan, with the skip scan rejected and the tablescan accepted.

SINGLE TABLE ACCESS PATH

  Column (#2): PRSN_ID(NUMBER)

    AvgLen: 5.00 NDV: 229384 Nulls: 0 Density: 4.3595e-06 Min: -1 Max: 5766028

  Table: SDWH_TCKT_TRNSCTN_FCT  Alias: TF    

    Card: Original: 12965145  Rounded: 57  Computed: 56.52  Non Adjusted: 56.52

  Access Path: TableScan

    Cost:  29085.71  Resp: 29085.71  Degree: 0

      Cost_io: 28450.00  Cost_cpu: 5147272783

      Resp_io: 28450.00  Resp_cpu: 5147272783

kkofmx: index filter:"TF"."PRSN_ID"=5485186 AND "TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

  Access Path: index (skip-scan)

    SS sel: 4.3595e-06  ANDV (#skips): 4388009

    SS io: 4388009.00 vs. table scan io: 28450.00

    Skip Scan rejected

  Access Path: index (FullScan)

    Index: SDWH_TCTR_FCT_PK

    resc_io: 46226.00  resc_cpu: 2935717121

    ix_sel: 1  ix_sel_with_filters: 4.3595e-06

    Cost: 46668.64  Resp: 46668.64  Degree: 1

  Best:: AccessPath: TableScan

         Cost: 29085.71  Degree: 1  Resp: 29085.71  Card: 56.52  Bytes: 0

Now compare this with the section from the hinted trace file. Note that the only access path considered is the skip scan, the hint is obeyed. It strikes me as a little odd that the calculations are done twice, and don’t quite match – this may (I guess) be part of the bugginess that we are about to expose.

SINGLE TABLE ACCESS PATH

  Column (#2): PRSN_ID(NUMBER)

    AvgLen: 5.00 NDV: 229384 Nulls: 0 Density: 4.3595e-06 Min: -1 Max: 5766028

  Table: TF  Alias: TF    

    Card: Original: 12965145  Rounded: 57  Computed: 56.52  Non Adjusted: 56.52

kkofmx: index filter:"TF"."PRSN_ID"=5485186 AND "TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

  Access Path: index (skip-scan)

    SS sel: 4.3595e-06  ANDV (#skips): 4388009

    SS io: 4388009.00 vs. table scan io: 28450.00

    Skip Scan chosen

  Access Path: index (SkipScan)

    Index: SDWH_TCTR_FCT_PK

    resc_io: 4388065.00  resc_cpu: 42752426477

    ix_sel: 4.3595e-06  ix_sel_with_filters: 4.3595e-06

    Cost: 4393345.11  Resp: 4393345.11  Degree: 1

  Best:: AccessPath: IndexRange  Index: SDWH_TCTR_FCT_PK

         Cost: 4393345.11  Degree: 1  Resp: 4393345.11  Card: 56.52  Bytes: 0

So now we look at a “Now Joining” section when Oracle did the right thing – this is from the join order {OT, TF, TT}, which happens to be the fourth join order examined. We start with the unhinted example and, as you can see, Oracle evaluates the costs of using a tablescan, or index full scan, to access TF as the inner table. (Strangely, no sign of a skip scan calculation  - could this possibly mean that skip scans are actually not allowed for joins ?).

***********************

Join order[4]:  OT[OT]#1  TF[TF]#2  TT[TT]#0

***************

Now joining: TF [TF]#2

***************

NL Join

  Outer table: Card: 51.00  Cost: 4.01  Resp: 4.01  Degree: 1  Bytes: 46

  Inner table: TF  Alias: TF

  Access Path: TableScan

    NL Join:  Cost: 1483288.21  Resp: 1483288.21  Degree: 0

      Cost_io: 1450867.00  Cost_cpu: 262510981535

      Resp_io: 1450867.00  Resp_cpu: 262510981535

kkofmx: index filter:"TF"."PRSN_ID"=5485186 AND "TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

  Access Path: index (FullScan)

    Index: TF_PK

    resc_io: 46226.00  resc_cpu: 2935717121

    ix_sel: 1  ix_sel_with_filters: 4.3595e-06

    NL Join: Cost: 2380104.44  Resp: 2380104.44  Degree: 1

      Cost_io: 2357530.00  Cost_cpu: 182782762539

      Resp_io: 2357530.00  Resp_cpu: 182782762539

  Best NL cost: 1483288.21

          resc: 1483288.21 resc_io: 1450867.00 resc_cpu: 262510981535

          resp: 1483288.21 resp_io: 1450867.00 resp_cpu: 262510981535

Join Card:  2882.60 = outer (51.00) * inner (56.52) * sel (1)

Join Card - Rounded: 2883 Computed: 2882.60

Join order aborted: cost > best plan cost

When we switch to the hinted trace, this is what we see in the same part of the trace file. Oracle evaluates only the option for a skip scan (twice, again). So it has obeyed the hint – it did a calculation that hasn’t appeared by default, and it didn’t do the calculations that it would have done by default.

This is an important point about hints, by the way: some of them mean “do calculation X”, but that isn’t the same as an order to “perform operation X”. For those hints, the optimizer has to do the calculation that it wouldn’t otherwise do, but it can still say “that’s not the cheapest option”.

***********************

Join order[4]:  OT[OT]#1  TF[TF]#2  TT[TT]#0

***************

Now joining: TF[TF]#2

***************

NL Join

  Outer table: Card: 51.00  Cost: 4.01  Resp: 4.01  Degree: 1  Bytes: 46

kkofmx: index filter:"TF"."PRSN_ID"=5485186 AND "TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

  Access Path: index (skip-scan)

    SS sel: 4.3595e-06  ANDV (#skips): 4388009

    SS io: 4388009.00 vs. table scan io: 28450.00

    Skip Scan chosen

  Access Path: index (SkipScan)

    Index: SDWH_TCTR_FCT_PK

    resc_io: 4388065.00  resc_cpu: 42752426477

    ix_sel: 4.3595e-06  ix_sel_with_filters: 4.3595e-06

    NL Join: Cost: 224060604.37  Resp: 224060604.37  Degree: 1

      Cost_io: 223791319.00  Cost_cpu: 2180373819962

      Resp_io: 223791319.00  Resp_cpu: 2180373819962

  Best NL cost: 224060604.37

          resc: 224060604.37 resc_io: 223791319.00 resc_cpu: 2180373819962

          resp: 224060604.37 resp_io: 223791319.00 resp_cpu: 2180373819962

Join Card:  2882.60 = outer (51.00) * inner (56.52) * sel (1)

Join Card - Rounded: 2883 Computed: 2882.60

Join order aborted: cost > best plan cost

Now, let’s look at a piece of the trace file where something goes wrong. This is the first join order: {TT, OT, TF}. First the unhinted version, and you’ll notice that the optimizer tries four options, the two we saw in the earlier example plus a full scan on a different index (which is possible in this case because columns from both TT and OT are available at this point in this join order) and an option for using bitmap conversion (again because of the availability of columns from both TT and OT):

***************

Now joining: SDWH_TCKT_TRNSCTN_FCT[TF]#2

***************

NL Join

  Outer table: Card: 51.00  Cost: 8.52  Resp: 8.52  Degree: 1  Bytes: 72

  Inner table: TF Alias: TF

  Access Path: TableScan

    NL Join:  Cost: 1483292.72  Resp: 1483292.72  Degree: 0

      Cost_io: 1450871.00  Cost_cpu: 262515101134

      Resp_io: 1450871.00  Resp_cpu: 262515101134

kkofmx: index filter:"TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

kkofmx: index filter:"TF"."PRSN_ID"=5485186 AND "TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

kkofmx: index filter:"TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

  Access Path: index (FullScan)

    Index: TF_I2

    resc_io: 134431.00  resc_cpu: 4595772062

    ix_sel: 1  ix_sel_with_filters: 0.11111

    NL Join: Cost: 2100968.38  Resp: 2100968.38  Degree: 1

      Cost_io: 2081471.00  Cost_cpu: 157868142107

      Resp_io: 2081471.00  Resp_cpu: 157868142107

  Access Path: index (FullScan)

    Index: TF_PK

    resc_io: 46178.00  resc_cpu: 2935227863

    ix_sel: 1  ix_sel_with_filters: 4.8439e-07

    NL Join: Cost: 2377405.55  Resp: 2377405.55  Degree: 1

      Cost_io: 2354834.00  Cost_cpu: 182759342462

      Resp_io: 2354834.00  Resp_cpu: 182759342462

  ****** trying bitmap/domain indexes ******

  Best NL cost: 1483292.72

          resc: 1483292.72 resc_io: 1450871.00 resc_cpu: 262515101134

          resp: 1483292.72 resp_io: 1450871.00 resp_cpu: 262515101134

Join Card:  56.52 = outer (51.00) * inner (56.52) * sel (0.019608)

Join Card - Rounded: 57 Computed: 56.52

Compare this with what happens in the hinted version. Oracle evaluates just one option – ignoring the bitmap options, the tablescan, and the index full scan on the “unspecified” index. All it tries to use is a mechanism involving the index specified in the hint. But it’s the wrong method. I think that this means there is a bug – somewhere a pointer has slipped or a branch has been taken incorrectly, and we’ve followed the wrong code path whilst “trying” to follow the right one.

***************

Now joining: TF [TF]#2

***************

NL Join

  Outer table: Card: 51.00  Cost: 8.52  Resp: 8.52  Degree: 1  Bytes: 72

kkofmx: index filter:"TF"."PRSN_ID"=5485186 AND "TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

kkofmx: index filter:"TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

  Access Path: index (FullScan)

    Index: TF_PK

    resc_io: 46178.00  resc_cpu: 2935227863

    ix_sel: 1  ix_sel_with_filters: 4.8439e-07

    NL Join: Cost: 2377405.55  Resp: 2377405.55  Degree: 1

      Cost_io: 2354834.00  Cost_cpu: 182759342462

      Resp_io: 2354834.00  Resp_cpu: 182759342462

  Best NL cost: 2377405.55

          resc: 2377405.55 resc_io: 2354834.00 resc_cpu: 182759342462

          resp: 2377405.55 resp_io: 2354834.00 resp_cpu: 182759342462

Join Card:  56.52 = outer (51.00) * inner (56.52) * sel (0.019608)

Join Card - Rounded: 57 Computed: 56.52

Having got this far, we have to ask why we saw this happening on one join order, but the correct actions on the other join orders. All over the place in the trace file, the optimizer followed the hint correctly – it’s just one special set of circumstance where it goes wrong. Here’s a summary of the join orders, showing what happened on each nested loop join order:

Join order[1]:  [TT]#0  [OT]#1  [TF]#2          Full Scan

Join order[2]:  [TT]#0  [TF]#2  [OT]#1          Full Scan

Join order[3]:  [OT]#1  [TT]#0  [TF]#2          Full Scan

Join order[4]:  [OT]#1  [TF]#2  [TT]#0                      Skip Scan

Join order[5]:  [TF]#2  [TT]#0  [OT]#1                            N/A (driving table) but Skip Scan

Join order[6]:  [TF]#2  [OT]#1  [TT]#0                            N/A (driving table) but Skip Scan

There is a pattern here, though not immediately obvious. If TT appears in the join order before TF, you get the full scan in the nested loop calculation (I didn’t show you any of the merge or hash join calculations, but they always used the skip scan). Look again at the trace file contents for the nested loop calculations and note the kkofmx lines:

kkofmx: index filter:"TF"."PRSN_ID"=5485186 AND "TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

kkofmx: index filter:"TF"."TRTP_ID"="TT"."TRTP_ID" AND "OT"."TRANSACTION_TYPE_DSCR"="TT"."TRANSACTION_TYPE_DSCR"

When we have a join order where TT appears before TF, the optimizer can “do something” with the predicate "TF"."TRTP_ID"="TT"."TRTP_ID", and for some reason, this results in it calculating the full scan strategy rather than the skip scan strategy. Now, it just happens that trtp_id is another column (the eighth out of nine) in the index referenced in the hint – and maybe this is what is confusing the optimizer at this point.

I advised the original poster to raise an SR with Oracle about this, and also suggested that (just out of curiosity) he might like to change the suspect predicate to tf.trtp_id+0 =tt.trtp_id so that the column could not appear as an access predicate for the index, only as a filter predicate. With this change in place the optimizer re-discovered the skip scan – with an important limitation. Oracle used the skip scan to visit table TF once, and then used a hash join to join the rows to the previous intermediate result set, with the following execution plan:

===========================================

Plan Table  (Hinted, and using trtp_id + 0)

===========================================

-----------------------------------------------------+-----------------------------------+---------------+

| Id  | Operation                            | Name  | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |

-----------------------------------------------------+-----------------------------------+---------------+

| 0   | SELECT STATEMENT                     |       |       |       | 4286K |           |       |       |

| 1   |  HASH JOIN                           |       |    57 |  7980 | 4286K |  17:04:05 |       |       |

| 2   |   MERGE JOIN                         |       |    51 |  3621 |     7 |  00:00:01 |       |       |

| 3   |    TABLE ACCESS BY INDEX ROWID       | OT    |    51 |  2295 |     2 |  00:00:01 |       |       |

| 4   |     INDEX FULL SCAN                  | OT_01 |    51 |       |     1 |  00:00:01 |       |       |

| 5   |    SORT JOIN                         |       |    51 |  1326 |     5 |  00:00:01 |       |       |

| 6   |     TABLE ACCESS FULL                | TT    |    51 |  1326 |     4 |  00:00:01 |       |       |

| 7   |   TABLE ACCESS BY GLOBAL INDEX ROWID | TF    |    57 |  3933 | 4286K |  17:04:05 | ROW L | ROW L |

| 8   |    INDEX SKIP SCAN                   | TF_PK |    57 |       | 4286K |  17:04:04 |       |       |

-----------------------------------------------------+-----------------------------------+---------------+

 

Conclusion:

I can’t claim to have proved that there is a bug here, but I think the evidence is fairly convincing that Oracle is trying to obey the hint but something has gone wrong at a very specific point in the optimizer code path. Of course, it is possible that a skip scan (for reasons we don’t know) is not considered to be a legal access path for the second table in a nested loop join – in which case we have a documentation bug.

I still believe that Oracle doesn’t ignore hints.


Back to Index of Topics