JL Computer Consultancy

Test carefully before you trust the internet.

August 2006


Part 3 – It must be okay, I got it off Metalink (Sept 2006)

Part 2 – An ancient script to hammer your modern database. (Sept 2006)


A little while ago, I had a consultancy agreement with a fairly new small company that wanted me to drop in for a couple of days each month to help out their DBA and check that their developers were following sound coding standards. One of the strangest things about this assignment was that every time I met the DBA, he always greeted me with a stack of paper about an inch thick and the comment: “I’ve got a load of great new scripts off the internet … what do they do?”

You will, of course, have noticed the logical inconsistency – how did he know they were great if he didn’t know what they did? His criterion for greatness seemed to be that they referenced x$ objects, v$ objects, or any of the sys tables. Still, on a positive note, he didn’t usually run any of them until he’d shown them to me – and he certainly didn’t run many of them after he’d shown them to me. Few of them were useful, most of them were incomplete, and some of them were a positive threat to a production system.

There are a lot of “scripts for DBAs” on the internet. Some of them can supply some useful information when you know their limitations and understand what they are saying. Many of them have been written by individuals trying to solve particular problems on their own specific systems. Some of them are clearly the results of people experimenting on toy databases – all of them should be reviewed and tested carefully before you run them on your own production systems.

I was prompted to remember this DBA a couple of nights ago because I was browsing the internet and came across several scripts in the space of 30 minutes that fell into the category of experiments that had probably never been run on a big production system. (One of them even came from Metalink).

The first was an example of a script that would have been of dubious benefit – even if the stunning omissions in the code had been corrected. It appeared in a presentation given by Don Burleson at the ODTUG conference earlier this year:


 

select
        to_char(
            sn.begin_interval_time,
            'yy-mm-dd hh24'
        )                                 snap_time,
        count(*)                          ct,
        sum(st.rows_processed_delta)      row_ct,
        sum(st.disk_reads_delta)          disk,
        sum(st.cpu_time_delta)            cpu
from
        dba_hist_snapshot   sn,
        dba_hist_sqlstat    st,
        dba_hist_sql_plan   sp
where
        st.snap_id = sn.snap_id
and     sp.sql_id = st.sql_id
and     sp.operation = 'NESTED LOOPS'
group by
        to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having
        count(*) > {input value}
;

The comments that went with this code suggested it was something to do with tracking nested loop joins. Given the references to the dba_hist_xxxx views – from the AWR (automatic workload repository) – you will realise that this query is supposed to do something with Oracle 10g. The question is what?

Roughly speaking, the query is looking for historic execution plans which used nested loop joins, and producing a report that attempts to give some measure of the work done by nested loops, showing the variation over time. In principle this could be a good thing, as it allows us to ask questions like: how does metric X today compare with metric X yesterday (or on the same day last week, if we adjust the AWR defaults to keep two weeks of data)? How did metric Y look around 3:00 pm today, when the users were complaining about the performance? But before you get too keen on any particular metrics, you need to make sure that you define them properly and write code that calculates them properly. Let’s look closely at this script:

First – the joins. The primary key on wrm$_snapshot (the table underpinning dba_hist_snapshot) is defined as (dbid, snap_id, instance_number). A single repository can hold information from multiple databases (identified by dbid) and from many instances of the same database (instance_number). Virtually every table in the workload repository has the same critical columns – because each set of snapshot statistics belongs to one instance of one database. So if you’ve managed to get multiple databases, or multiple instances, collecting statistics into the same repository, then the results from your query are going to be distorted in all sorts of strange ways if you don’t get this basic join right.
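If you want to confirm the key columns on your own system before trusting any script against the workload repository, a quick look at the data dictionary will tell you. This is just a sketch – run it as a suitably privileged user, and bear in mind that index names can vary between versions:

```sql
-- List the indexed columns on wrm$_snapshot (the table under
-- dba_hist_snapshot) to confirm the primary key columns.
select
        ic.index_name, ic.column_name, ic.column_position
from
        dba_ind_columns ic
where
        ic.table_owner = 'SYS'
and     ic.table_name  = 'WRM$_SNAPSHOT'
order by
        ic.index_name, ic.column_position
;
```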

Second (corrected Sept 2006) – the joins, again. The view dba_hist_sqlstat is based on the dynamic performance view v$sqlarea_plan_hash. Each SQL statement could end up with multiple execution plans, and in the better-known dynamic performance views there is a column called child_number that acts as part of the key to the data. But child numbers are irrelevant (for obvious reasons) in the history tables – so the history collects statistics summarized by execution plan – which is what you find in v$sqlarea_plan_hash. If two child cursors have different execution plans they will have different plan hash values – so it is the column plan_hash_value that becomes part of the key to dba_hist_sqlstat and dba_hist_sql_plan (or rather their underpinning tables). This is another column that should be in the query’s join clause. If you omit it, the reported impact of a query in the result set will be scaled up by a factor dependent on the number of execution plans for that query.

Of course you should always check data structures carefully. Although the plan_hash_value is part of the primary key of dba_hist_sql_plan (wrh$_sql_plan), the snap_id is not, nor is the instance_number (in fact the instance_number is not even a column in the table). If you capture a plan, with its hash value, you don’t need to store a copy of it for every instance attached to the database, and you don’t need to store a copy in every snapshot – it would be a great waste of space. (The same argument holds for dba_hist_sqltext (wrh$_sqltext) – although both tables still carry a snap_id, which is not part of the key but is perhaps the last snapshot id where the plan/text was in use.) In fact it is slightly odd that the dbid still appears in both these tables as part of the primary key – it seems to be unnecessary.

Third – what do the metrics mean anyway?  Consider the metric sum(st.disk_reads_delta). To get this value we read the column disk_reads_delta from view dba_hist_sqlstat and then (because we join this one row to dba_hist_sql_plan) multiply it by the number of times a nested loop operation appears in the execution plan (we’ll avoid the complication of there being several different plans for the text) before adding it to the total. So a multi-table join with several nested loop joins has a much bigger impact on the result than a simple two-table join. What does it actually tell us if we see this metric change by (say) 5%, or by (say) an absolute value of 250,000 – have you stopped running a complex statement a few times, or a simple statement lots of times?
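If you did want each statement counted once rather than once per nested loop operation, one option – a sketch only, and it still leaves open the question of what the metric really means – is to replace the join to the plan table with an existence test, so that each statement/plan combination contributes its statistics once per snapshot:

```sql
-- Sketch: count each statement/plan once per snapshot, rather than
-- once per 'NESTED LOOPS' operation in its plan.
select
        to_char(sn.begin_interval_time,'yy-mm-dd hh24')  snap_time,
        count(*)                          ct,
        sum(st.rows_processed_delta)      row_ct,
        sum(st.disk_reads_delta)          disk,
        sum(st.cpu_time_delta)            cpu
from
        dba_hist_snapshot   sn,
        dba_hist_sqlstat    st
where
        st.snap_id         = sn.snap_id
and     st.dbid            = sn.dbid
and     st.instance_number = sn.instance_number
and     exists (
                select  null
                from    dba_hist_sql_plan  sp
                where   sp.sql_id          = st.sql_id
                and     sp.dbid            = st.dbid
                and     sp.plan_hash_value = st.plan_hash_value
                and     sp.operation       = 'NESTED LOOPS'
        )
group by
        to_char(sn.begin_interval_time,'yy-mm-dd hh24')
order by
        to_char(sn.begin_interval_time,'yy-mm-dd hh24')
;
```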

From the perspective of my learner dba it’s a “great script”. It uses all those lovely dba_hist_xxxx views that are so new and special in 10g, and it produces a report which clearly allows you to look for changes in workload metrics with time. The trouble is that the SQL is clearly defective, and even after correcting the obvious errors in the SQL you’d be hard pushed to make the metrics mean much.

The only good thing about this query is that you probably wouldn’t be running it more than once per day – if you thought you needed to run it at all. But even once a day may be too much; some of the systems I have been handling recently have been fairly large web-facing systems where it becomes fairly easy to get several child cursors – with different execution plans – for the more popular SQL statements. The resulting multiplication in data sizes as you run a query with the plan_hash_value join predicate missing could be a little brutal.

If you really want to run this query then the following may be a slightly more appropriate version, but only because it corrects the join errors. I haven’t spent much time checking it carefully because, even with an extra in-line view to factor out the effect of multiple nested loop operations, I can’t really see it serving any useful purpose.

select
        to_char(
            sn.begin_interval_time,
            'yy-mm-dd hh24'
        )                                 snap_time,
        count(*)                          ct,
        sum(st.rows_processed_delta)      row_ct,
        sum(st.disk_reads_delta)          disk,
        sum(st.cpu_time_delta)            cpu
from
        dba_hist_snapshot   sn,
        dba_hist_sqlstat    st,
        dba_hist_sql_plan   sp
where
/*                                              */
/*      Correct join from snapshot to sqlstat   */
/*                                              */
        st.snap_id = sn.snap_id
and     st.dbid = sn.dbid
and     st.instance_number = sn.instance_number
/*                                              */
/*      Correct join from sqlstat to sql_plan   */
/*                                              */
and     sp.sql_id = st.sql_id
and     sp.dbid = st.dbid
and     sp.plan_hash_value = st.plan_hash_value
/*                                              */
and     sp.operation = 'NESTED LOOPS'
group by
        to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having
        count(*) > {input value}
;

My conclusion on this script:  it may have been created as an experimental little hack on a toy database to try and work out a few principles; it may have been tested (without too much overhead) on a simple development system; but until I’d examined the query and all the underlying views and tables very carefully, I wouldn’t think about running it on a serious production system.


Sometimes the best thing to do is to forget.

In a recent whitepaper, I came across the following script with the heading “Segments occupying hot buffers touch count > 10”.

SELECT /*+ use_hash(bh) */
       de.owner ||'.'|| de.segment_name segment_name,
       de.segment_type segment_type, de.extent_id extent#,
       bh.dbablk - de.block_id+1 block#, bh.tch
FROM
       dba_extents de, x$bh bh
WHERE
       de.file_id = bh.file#
AND    bh.dbablk between de.block_id AND de.block_id + de.blocks - 1
AND    bh.tch > 10
ORDER BY
       bh.tch
;

The associated presentation was, I think, about tuning 9i and 10g.  Before you read any further you might like to look at this FAQ written by Connor McDonald in 2001. Even in 8i the view dba_extents was a nasty thing to query – unless you were very careful about how you targeted it.

In the query above the author has included the hint use_hash(bh) – presumably with the intent that the relevant columns from every row of dba_extents should be acquired and then scattered into an in-memory hash table before you touch x$bh to look for hot blocks. On my system – with a baseline install – this took roughly 20 seconds and 5,500 physical reads – not to mention all those function calls that Oracle made because I was using locally managed tablespaces.

Moreover, if you managed to get Oracle doing a hash join between these two data sets, the only join predicate – remember you can only use an equality in a hash join – is on the file number; which probably means not many distinct join values and lots of hash collisions in the hash table. That’s not good for CPU.

If you were to worry only about efficiency you might want to remember that eliminating data early (e.g. all those buffer headers where the touch count is not greater than 10) is usually a desirable strategy – surely the join should drive from the hot blocks! But if you really wanted to write something that gave you the sort of information you get from this query you might first look carefully at x$bh. Even in 8i it had a column called obj – which was the data_object_id of the object – so if you want to minimise the damage when thrashing through x$bh you might consider joining x$bh to obj$ on the dataobj# column.
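As a sketch of that idea – run as SYS, and treat it with the same caution as any other x$ query, since column details can change between versions:

```sql
-- Drive from the hot buffer headers, then identify the objects
-- through obj$ - x$bh.obj holds the data_object_id (obj$.dataobj#).
select
        o.name, o.subname, bh.tch, count(*)  blocks
from
        x$bh        bh,
        sys.obj$    o
where
        bh.tch     > 10
and     o.dataobj# = bh.obj
group by
        o.name, o.subname, bh.tch
order by
        bh.tch
;
```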

Going for an efficient selection and join does have its drawbacks, of course. First, if you look at obj$ you won’t get the undo segments reported because x$bh shows their object id as power(2,32) -1.  But that’s not really a big issue as x$bh also has a class column, which manages to identify the undo segment number very easily – and even separates the undo segment headers from the rest of the undo blocks.

Secondly, though, the supplied script reports the extent id and block number within extent – which you can’t get by looking at obj$. However, I don’t think I’m too bothered by that. The output may look cute, but it doesn’t add value – you can simply report the x$bh.class column to spot any special cases such as segment headers and bitmap blocks; and for all other cases your next trouble-shooting step would probably be to dump the block to see what’s in it that makes it hot – and you need the file and block number for that, not the extent id and block within extent.

My conclusion on this script:  it was probably based on an original from Oracle 6 – when databases were small and any script using an x$ object was considered a sign of real machismo – and hacked to include the touch count (tch) when 8i appeared, without due consideration of what might happen on a production-sized database. If you’re trying to do trouble-shooting, kicking the database to death is not a good way to go about it.

As a general warning – be very cautious of any script that tries to wade through dba_extents with the classic predicate:

WHERE
       de.file_id = {file number}
AND    {block number} between de.block_id AND de.block_id + de.blocks - 1


Always be cautious about x$ and v$

Several years ago, I wrote a script to access x$ksmsp – the visible representation of the shared pool – because I thought it might be useful to have an indication of how the memory was being used. However, I never found the need to use it until some time last year, when I was called in to look at a couple of very odd problems that were causing a rather busy web-based system to “stall” for a couple of minutes at random intervals during the week.

Having failed to put a finger on the problem after checking all the usual suspects, I decided that this was an opportunity where my old x$ script might actually be useful. Of course, since I had never run it on a production system before, and since it was a query against a very important structure in the SGA, I showed it to the resident DBA team and asked if I could run it.

To my amazement their response was something like “Oh no, not that one – we’ve seen that one before, you’re not running that on our system”.  Since it was a script I had never published, this surprised me for a moment – but of course there is such a thing as convergent evolution; someone else was bound to have come up with the same idea.

In the end, after a whole day on site where I hadn’t been able to nail down the problem, the DBA team finally relented and said that I could run my query around 10:00 am the following day when the system would probably be quite quiet and it probably wouldn’t cause too much damage.

So the next day, as the clock ticked its way to 10:00 I started the query off. I should mention at this point that the DBAs were based in the room next to the large open plan office where I was working – the significance of this arrangement is that moments after I launched the query, the DBAs came bursting through the connecting door saying “You just ran that query, didn’t you!”.

They had a 46” flat panel TV screen in their office which had a continuous graphic display of key indicators of system performance. As I started my query, it jammed the shared pool, which virtually stopped all database activity, which made the front-end (Web) application servers fire in a couple of hundred new connections because they weren’t getting any response from the database. Since the shared pool was jammed, the new sessions couldn’t connect and got stuck in a queue – which made the front-end panic and fire in a couple of thousand new connections within the space of 30 seconds. The DBAs’ 46” display had exploded in a riot of colours as all the threat indicators shot off into outer space.

Eventually my query ended, releasing the shared pool – allowing 2,500 new connections to get into the system, annihilating the contents of the library cache and hammering the library cache and shared pool latches to death as they did so. Five minutes later the excess sessions were all connected – just in time for the front-end to decide that the sessions were now surplus to requirements and could all be shut down, hammering the latches to death again.

Note: You might like to check the configuration of whatever software you use as your Web Application Server – it seems as if the standard configuration is to start creating new connections fairly promptly if there are no free connections available. This means that if the database “slows down” for any reason, the Web Server invariably ends up detecting this as an absence of free connections and creates more. In a nutshell – if your database starts to struggle as the load climbs, the Web Server leaps in and kicks the database to death. Good Web Applications enable queueing in the Web Server layer.

My simple query, which took only a few seconds to run, caused a severe stall on the production system for about 10 minutes. Beware of poking the x$. On a large-scale production system nasty things can happen.

Remember that queries against the x$ objects don’t support read-consistency; they just happen to look at whatever is in memory at the time. Of course this can be quite efficient sometimes – some of the memory structures are fixed arrays, and Oracle can just walk the length of the array quite safely; the results may not be self-consistent, but at least the query won’t crash out into unknown territory. On the other hand, some of the structures are complicated objects hanging off linked lists. If you decide to eliminate or recreate an object while I’m walking a linked list, then my code may end up following a pointer into a garbage trap and crash out anywhere in memory. So some queries against x$ need to be protected from interference – usually by grabbing latches aggressively: try this little test (which I’ve cut and pasted from a 10g SQL*Plus session just after starting up the database).

SQL> select gets from v$latch where name = 'library cache';

      GETS
----------
    763733

1 row selected.

SQL> select count(*) from v$sql;

  COUNT(*)
----------
      1473

1 row selected.

SQL> select gets from v$latch where name = 'library cache';

      GETS
----------
    768731
Strangely, my query against the x$ksmsp didn’t show up as a massive latching issue – my session showed very little latch activity, and other sessions weren’t showing any surprising latch waits. I still haven’t figured out what Oracle does when you run this query – all I know is that I’m not planning to run it on a production system again – at least, not one that’s sitting behind a web-based application.

Of course we have to come back to that odd comment from the DBAs that they’d seen the query before. My query was just a simple aggregate query against x$ksmsp, breaking the allocation down by class and size, reporting number, total memory and average memory, using a granularity of 256 bytes to reduce the length of the output. Just a simple sweep through x$ksmsp.
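For illustration only – this is a reconstruction from the description above, not the exact text of the script I ran:

```sql
-- Break shared pool allocation down by class and 256-byte size band,
-- reporting chunk count, total memory and average chunk size.
select
        ksmchcls                        class,
        256 * trunc(ksmchsiz/256)       size_band,
        count(*)                        chunks,
        sum(ksmchsiz)                   total_bytes,
        round(avg(ksmchsiz))            avg_bytes
from
        x$ksmsp
group by
        ksmchcls, 256 * trunc(ksmchsiz/256)
order by
        ksmchcls, size_band
;
```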

The query they had seen came from Metalink note 146599.1 “Diagnosing and Resolving Error ORA-04031”, (last updated 16th July 2006) and looked like this:

select
       '0 (<140)' BUCKET, KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10) "From",
       count(*) "Count" , max(KSMCHSIZ) "Biggest",
       trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from
       x$ksmsp                                                       -- pass 1
where
       KSMCHSIZ<140
and    KSMCHCLS='free'
group by
       KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10)
UNION ALL
select
       '1 (140-267)' BUCKET, KSMCHCLS, KSMCHIDX,20*trunc(KSMCHSIZ/20) ,
       count(*) , max(KSMCHSIZ) ,
       trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from
       x$ksmsp                                                       -- pass 2
where
       KSMCHSIZ between 140 and 267
and    KSMCHCLS='free'
group by
       KSMCHCLS, KSMCHIDX, 20*trunc(KSMCHSIZ/20)
UNION ALL
select
       '2 (268-523)' BUCKET, KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50) ,
       count(*) , max(KSMCHSIZ) ,
       trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from
       x$ksmsp                                                       -- pass 3
where
       KSMCHSIZ between 268 and 523
and    KSMCHCLS='free'
group by
       KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50)
UNION ALL
select
       '3-5 (524-4107)' BUCKET, KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500) ,
       count(*) , max(KSMCHSIZ) ,
       trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from
       x$ksmsp                                                       -- pass 4
where
       KSMCHSIZ between 524 and 4107
and    KSMCHCLS='free'
group by
       KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500)
UNION ALL
select
       '6+ (4108+)' BUCKET, KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000) ,
       count(*) , max(KSMCHSIZ) ,
       trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from
       x$ksmsp                                                       -- pass 5
where
       KSMCHSIZ >= 4108
and    KSMCHCLS='free'
group by
       KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000)
;

This thing sweeps through x$ksmsp five times in succession looking for free space. I really wanted to run it on their system to see what would happen – but decided it wouldn’t be diplomatic to do so. I can’t help feeling that this may have been written a long time ago by someone who wasn’t running on a big system; although there is a warning following the query that says:

Also be aware that running this query too often is likely to cause other memory issues in the shared pool.

In this context I think I would have to interpret ‘too often’ as ‘once’; and the memory issue does seem to be the rather drastic one that no-one else can get any memory to do anything whilst the query is running.  I may, however, be feeling a little too pessimistic after my experience.  Maybe there are only a few systems large enough and busy enough to suffer a melt-down when you lock the shared pool for a brief period. Still, if you do want to try a query like the Metalink query maybe this is a perfect opportunity for using subquery factoring:

with free_ksmsp as (
    select /*+ materialize */
        ksmchcls, ksmchidx, ksmchsiz
    from
        x$ksmsp
    where
        ksmchcls='free'
)
select  ... the rest of the query, but referencing free_ksmsp instead of x$ksmsp

With this syntax (and especially the materialize hint) your session creates a global temporary table from one pass through x$ksmsp and then uses that data five times to get the required result. Of course, you will be writing to the temporary tablespace as you build the global temporary table; but I’m guessing that’s a price worth paying to avoid locking up x$ksmsp for five passes.

In passing, you’ll notice that there’s no order by clause in the query. Possibly the original author wasn’t concerned with seeing a report where the data came out sorted by chunk size within bucket – but perhaps the author thought that the group by clauses would automatically force the data to appear in the right order. Bad practice there; the 10g results probably won’t come out ordered because 10g has implemented a “hash aggregation” algorithm that does not do the implicit sort that earlier versions of Oracle did.
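If you do want ordered output, the simplest fix – a sketch, bearing in mind that with a UNION ALL you have to order by column position or by the aliases of the first branch – is to append a final order by clause:

```sql
-- Appended to the end of the union-all query, replacing the semicolon.
-- Positional ordering: 1 = bucket, 3 = ksmchidx, 4 = size band.
order by
        1, 3, 4
;
```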


Addendum

It’s just been brought to my attention that some of my comments about the ‘AWR’ script I used in the first example have been copied into a page advertising a book by Don Burleson. The script has been reprinted in a book titled Oracle Tuning: The Definitive Reference, apparently with just as many errors in it as it had in the presentation.

The advert quotes my comments in the form of two “errata” entries. The first (for a script on page 689 – tracking hash joins) says:

Jonathan Lewis has noted that this script omits a specific join to the dbid for the snapshot and offers other excellent refinements.  This is the corrected script:

Amusingly one of the quoted reviews says “The discussion of the DBA_HIST views is alone worth the price.” Surely such a worthwhile discussion could not have missed the significance of the primary key on the basic snapshot table underpinning the dba_hist views, and a proper description of how it should be used in joins to the rest of the tables.

The second entry (for a similar script on page 667, with the phrase ‘NESTED LOOPS’ instead of ‘HASH JOIN’) says:

Jonathan Lewis has noted some great enhancements to this script, noting: …

and then quotes my description of what the script appears to be doing, and the first two errors in it (it’s interesting to see that a basic correction is considered to be an enhancement) but it doesn’t bother to quote the rest of the criticisms about the script.

However, despite my warning that I’ve only corrected the join condition and couldn’t be bothered to do anything useful with the script, it does show a ‘revised script’ that captures the one trivial correction I made. (Another trivial error, by the way, is the omission of an order by clause to sort the output data. I assume that the intention is to report the results ordered by time stamp rather than randomly. It’s an omission that may be forgiven in a Powerpoint slide that needs to conserve space but not in a ‘definitive’ tuning reference).

I can’t help thinking that it is also somewhat ironic that Don Burleson didn’t take my title to heart when copying the text from this web page. “Test carefully before you trust the Internet”. When I first wrote this piece I said that the view dba_hist_sqlstat was based on the view v$sqlstats. I realized while reading Don Burleson’s advert that this didn’t make sense – in fact the view is based on the content of another 10g view. Apologies to my readers – but I have now corrected the error; I wonder if the definitive reference has got it right.

