JL Computer Consultancy

Trendy new ways with Statspack.

Sept 2004


Updated Sept 2006


From time to time (though not very often) I find that a site I’m visiting has installed Statspack and is using dbms_job to take regular snapshots of their system-level performance. There are various ways in which a string of snapshots can be a useful source of strategic information, so when I find the data available I often end up writing a quick SQL statement to pick up trending information about just one or two critical statistics.

Typically, I would produce a report that took some raw data (say ‘db file sequential read’ waits) from stats$system_event and converted it into something that showed the number of events in each 30 minutes (or whatever the client’s snapshot interval was) for the last 24 hours.  If the results looked interesting, I would then extend the code a little and compare data from the corresponding period the previous week against data for this week, to see if there was a pattern or an anomaly.

Browsing the internet recently, I discovered that most of the published SQL for doing this type of thing seemed to employ a fairly outdated and inefficient style of code that used a self-join on two copies of the table that was to be reported. (One exception showed up in Tim Gorman’s paper on Statspack at RMOUG 2004, which can also be found on his own website – no doubt there are others.) The following code fragment gives you the general idea:


 

select
      ...
      se2.total_waits - se1.total_waits   total_waits,
      ...
from
      stats$snapshot          ss,
      stats$system_event      se1,
      stats$system_event      se2
where
      se1.snap_id = ss.snap_id
and   se2.snap_id = se1.snap_id + 1
and   ...



There is a better way – analytic functions: a feature that, co-incidentally, made its public appearance in the same version of Oracle as statspack (see footnote) and happens to be ideally suited to ‘trend-like’ reporting of statspack results. (For a short note on analytic functions that I wrote in 2000, follow this link; for a more complex example that I published in the magazine of the UK Oracle Users’ Group, follow this link.)

Here’s a bare-bones example, picking up the ‘db file sequential read’ figure from the system event history. The site I wrote this at was taking snapshots every half-hour, so each line of the report shows the activity in the preceding 30 minutes. The idea is simple – we start with a query that gets data from a time range and uses the lag() function to align a copy of the values from the ‘previous’ timestamp against the values for the ‘current’ timestamp. Then we can wrap that query in an inline view and report the difference between the previous and current values as the work done in the period.


 

 

rem

rem   Simple example of accessing statspack tables with

rem   analytic functions. The effect of lag(,1) is to

rem   report the values from the ‘previous’ row against

rem   the current row.

rem

rem   Note: to do the job properly, you should pre-query

rem   v$instance and v$database to get the instance number

rem   and DBID into substitution (or bind) variables.

rem  

 

column snap_waits  format 999,999,990

column snap_waited format 999,999,990

column wait_average format 999,999.99

 

select

      to_char(snap_time,'Mon-dd hh24:mi:ss')    time_stamp,

      total_waits - prev_waits                  snap_waits,

      time_waited - prev_time                   snap_waited,

      round(

            (time_waited - prev_time) /

            decode(

                  (total_waits - prev_waits),

                        0,null,

                  (total_waits - prev_waits)

            ),

            2

      )                             wait_average

from

      (

      select     

--    starting query: get current and previous values

--    note: time_waited_micro is the version 9 column

--    and time_waited is the version 8 column.

--    The division by 10,000 converts 9i values to centiseconds.

            ss.snap_time,

            se.total_waits,

            lag(se.total_waits,1) over (order by se.snap_id)             prev_waits,

            se.time_waited_micro/10000                                   time_waited,

            lag(se.time_waited_micro/10000,1) over (order by se.snap_id) prev_time

--          se.time_waited                                               time_waited,

--          lag(se.time_waited,1) over (order by se.snap_id)             prev_time

      from

            perfstat.stats$snapshot       ss,

            perfstat.stats$system_event   se

      where

            ss.snap_time between sysdate - 1 and sysdate

      and   se.snap_id = ss.snap_id

      and   se.event = 'db file sequential read'

      --

      --    Technically I should include the DBID and INSTANCE_NUMBER in the

      --    join, as these are part of the primary keys of the two tables.

      --    But most people have just one instance and one database.

      )

where

      total_waits - prev_waits >= 0       -- dirty trick to skip instance restarts

order by

      snap_time

;
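As the header note says, a more careful version of the script would pick up the DBID and instance number first. A minimal sketch of how that might be done in SQL*Plus (the substitution variable names m_dbid and m_inst are my own invention):

```sql
rem
rem   Sketch, following the note in the script header above:
rem   capture the dbid and instance number into SQL*Plus
rem   substitution variables, then add them to the joins.
rem

column dbid             new_value m_dbid
column instance_number  new_value m_inst

select dbid            from v$database;
select instance_number from v$instance;

rem   ...and in the inline view add predicates like:
rem         and   ss.dbid            = &m_dbid
rem         and   ss.instance_number = &m_inst
rem         and   se.dbid            = ss.dbid
rem         and   se.instance_number = ss.instance_number
```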

 



Once you have applied the basic lag() approach, you can start getting sophisticated about the way in which you report data. In principle, you could (for example) write a report that took two Monday sets of data, and reported them side by side so that you could check whether the pattern from one Monday repeated the next Monday; you could even add a third set of figures that reported the differences in absolute value or percentages. Personally, I rarely bother to write code to produce such prettily formatted output; I would just report the two days one after the other and judge changes by eye – but for management reports you may want to clean the report up and make it easier to interpret.
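One possible shape for such a side-by-side report is to join the lagged query to itself, offset by seven days. A rough sketch only – here “delta_view” is a hypothetical view standing for the lag() query shown earlier, assumed to expose snap_time and snap_waits; and since snapshot times rarely align to the second from one week to the next, the sketch truncates each time to the minute before joining (you might need to round to the half-hour boundary on a less punctual system):

```sql
rem
rem   Hypothetical sketch: compare one Monday with the next.
rem   "delta_view" stands for the lagged query shown earlier.
rem

select
      to_char(wk1.snap_time,'hh24:mi')    time_of_day,
      wk1.snap_waits                      monday_1,
      wk2.snap_waits                      monday_2,
      wk2.snap_waits - wk1.snap_waits     change
from
      delta_view  wk1,
      delta_view  wk2
where
      trunc(wk2.snap_time,'mi') - 7 = trunc(wk1.snap_time,'mi')
order by
      wk1.snap_time
;
```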

If you do get interested in this type of manipulation of statspack data, remember that the indexes on the (default) perfstat schema are just primary key indexes designed for the way that statspack works. If you are leaving a large history in place, you may want to create an index on stats$snapshot with the columns (dbid, instance_number, snap_time), using compress 2, so that you can pick your target time range efficiently. (You may choose to forget about the first two columns if you are using statspack for just one database and instance, of course.)
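Based on that suggestion, the index creation might look like the following sketch (the index name is my own choice, not a statspack name):

```sql
rem
rem   Sketch of the suggested supporting index on the
rem   snapshot table, compressing the two repetitive
rem   leading columns.
rem

create index perfstat.stats$snapshot_time_ix
on perfstat.stats$snapshot(dbid, instance_number, snap_time)
compress 2
;
```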

As an example of the output: the script above produced the following results at a site which scheduled statspack every half hour and (luckily) didn’t have any extra snapshots fired off manually in the interval reported.


TIME_STAMP        SNAP_WAITS  SNAP_WAITED WAIT_AVERAGE

--------------- ------------ ------------ ------------

May-19 17:30:22          307          158          .51

May-19 18:00:25            0            0

May-19 18:30:23            4            2           .5

May-19 19:00:25       67,654       35,544          .53

May-19 19:30:25          266          100          .38

May-19 20:00:28          718          303          .42

May-19 20:30:24          702          288          .41

May-19 21:30:23          670          326          .49

May-19 22:00:31        1,093          741          .68

May-19 22:30:25        1,208          655          .54

May-19 23:00:27        1,356          819           .6

May-19 23:30:23        1,373          728          .53

May-20 00:00:28       12,301       12,617         1.03

May-20 00:30:23        1,406          924          .66

May-20 01:00:30       15,740       13,790          .88

May-20 01:30:26        1,422          742          .52

May-20 02:00:24            0            0

May-20 02:30:22        9,456       10,347         1.09

May-20 03:00:27       18,415       10,471          .57

May-20 03:30:28       13,901        7,855          .57

May-20 04:00:28       90,948       58,901          .65

May-20 04:30:25            5            0            0

May-20 05:00:24            0            0

May-20 05:30:24            0            0

May-20 06:00:23            1            0            0

May-20 06:30:23            1            0            0

May-20 07:00:28            0            0

May-20 07:30:22            0            0

May-20 08:00:37            0            0

May-20 08:30:26          110           67          .61

May-20 09:00:31           21            9          .43

May-20 09:30:25            4            4            1

May-20 10:00:26        1,370        5,716         4.17

May-20 10:30:24           23           14          .61

May-20 11:00:30            1            0            0

May-20 11:30:28           37           30          .81

May-20 12:00:30            7            3          .43

May-20 12:30:24           93           50          .54

May-20 13:00:26       55,317       39,901          .72

May-20 13:30:27        3,991        1,475          .37

May-20 14:00:26       11,335        5,201          .46

May-20 14:30:24            3            0            0

May-20 15:00:26            3            2          .67

May-20 15:30:23            4            2           .5

May-20 16:00:26           64           66         1.03

May-20 16:30:23            5            3           .6

May-20 17:00:29           85           81          .95

 

47 rows selected.


As you can see, the site has a couple of evening and early morning batch jobs; but, most importantly for their end-users, something happened around lunchtime on this day that thrashed the I/O subsystem quite severely.  Running the report for a week, it became apparent that whatever it was, it happened every weekday some time between 12:00 and 2:00 and did about the same amount of work every time – so the next step was simply to wait until midday the following day and chase down any sessions that suddenly got aggressive at around that time.

Footnote:

I couldn’t remember when statspack first appeared in production, so I had to call on Graham Wood (one of the key players in the creation of statspack) to check, and he told me that although it had been in use for a very long time within Oracle, it finally went production (largely due to the efforts of Connie Green) in Oracle 8.1.6 – the same release that gave us analytic functions.


Update 2006

I’ve recently discovered a good argument for sticking with something like the old-style code: analytic functions are not available with Standard Edition. If you are in this position, there are two important thoughts to consider.  (Correction courtesy of Tom Kyte – they are part of Standard Edition in 9i – but the second point may still be useful anyway for performance and clarity of code.)

First – the typical old-style code usually has a predicate like and se2.snap_id = se1.snap_id + 1 to connect the data extracted from the two separate calls to the history tables. But the snap_id is generated from a sequence, and there is a fair chance that on a busy production system the sequence will be flushed from the library cache from time to time, losing sequence values. If you want to depend on getting consecutive snap_ids, you will probably want to modify the stats$snapshot_id sequence to nocache.
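The change itself is a one-line alter, run as a suitably privileged user – though bear in mind that nocache makes every nextval call update the data dictionary, adding a small overhead to each snapshot:

```sql
rem
rem   Sketch: stop the snapshot sequence caching values so
rem   that snap_ids stay consecutive across library cache
rem   flushes. Note the small per-snapshot overhead.
rem

alter sequence perfstat.stats$snapshot_id nocache;
```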

Secondly – if you don’t keep the snapshot history small, the work done in joining a history table to itself may be a little resource-intensive. If you are running Oracle 9i you may get better performance, even allowing for the global temporary table that gets created and populated, by taking advantage of subquery factoring (which I think is available in Standard Edition) to extract the relevant subset of the history, materialize it, and then join it to itself. For example:


 

 

rem

rem   Sample code to extract a small section of snapshot

rem   history into a (cursor-level) global temporary table

rem

with  event_history as (

      select     

            /*+ materialize */

            ss.snap_id,

            ss.snap_time,

            se.total_waits,

            se.time_waited_micro/10000          time_waited

      from

            perfstat.stats$snapshot       ss,

            perfstat.stats$system_event   se

      where

            ss.dbid = 1663614895

      and   ss.instance_number = 1

      and   ss.snap_time between sysdate - 1 and sysdate

/*                                              */

      and   se.dbid = ss.dbid

      and   se.instance_number = ss.instance_number

      and   se.snap_id = ss.snap_id

      and   se.event = 'db file sequential read'

)

select

      ev1.snap_id,

      to_char(ev1.snap_time,'Mon-dd hh24:mi:ss')      snap_time,

      ev2.total_waits - ev1.total_waits               total_waits,

      round(ev2.time_waited - ev1.time_waited,2)      time_waited

from

      event_history     ev1,

      event_history     ev2

where

      ev2.snap_id = ev1.snap_id + 1

order by

      ev1.snap_id

;


The following is an example of the sort of execution plan you get (in 9.2.0.6, using explain plan and the dbms_xplan package). Notice that statspack (in this version of Oracle, at least) has an ideal index for uniquely identifying a wait by name in a given snapshot. Note also how my dbid and instance_number have been transferred across the join by means of transitive closure – always check the execution paths and performance traps of code you download from the internet before you try it on production.



------------------------------------------------------------------------------------------------

| Id  | Operation                        |  Name                       | Rows  | Bytes | Cost  |

------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                 |                             |       |       |       |

|   2 |  TEMP TABLE TRANSFORMATION       |                             |       |       |       |

|   1 |   RECURSIVE EXECUTION            | SYS_LE_2_0                  |       |       |       |

|   0 |    INSERT STATEMENT              |                             |    60 |  8820 |    26 |

|   1 |     LOAD AS SELECT               |                             |       |       |       |

|   2 |      NESTED LOOPS                |                             |    60 |  8820 |    26 |

|*  3 |       TABLE ACCESS FULL          | STATS$SNAPSHOT              |    24 |  1152 |     2 |

|   4 |       TABLE ACCESS BY INDEX ROWID| STATS$SYSTEM_EVENT          |     3 |   297 |     1 |

|*  5 |        INDEX UNIQUE SCAN         | STATS$SYSTEM_EVENT_PK       |     1 |       |       |

|   3 |   SORT ORDER BY                  |                             |       |       |       |

|   4 |    MERGE JOIN                    |                             |       |       |       |

|   5 |     SORT JOIN                    |                             |       |       |       |

|   6 |      VIEW                        |                             |       |       |       |

|   7 |       TABLE ACCESS FULL          | SYS_TEMP_0FD9D660A_728BD3C  |       |       |       |

|*  8 |     SORT JOIN                    |                             |       |       |       |

|   9 |      VIEW                        |                             |       |       |       |

|  10 |       TABLE ACCESS FULL          | SYS_TEMP_0FD9D660A_728BD3C  |       |       |       |

------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   3 - filter("SS"."INSTANCE_NUMBER"=1 AND "SS"."DBID"=1663614895)

   5 - access("SE"."SNAP_ID"="SS"."SNAP_ID" AND "SE"."DBID"=1663614895 AND

              "SE"."INSTANCE_NUMBER"=1 AND "SE"."EVENT"='db file sequential read')

   8 - access("EV2"."SNAP_ID"="EV1"."SNAP_ID"+1)

       filter("EV2"."SNAP_ID"="EV1"."SNAP_ID"+1)

 

