JL Computer Consultancy –

A review of 911 Oracle Emergency Support by Don Burleson

March 2005


The article under review attempts to justify the claim that there are “silver bullets” for Oracle performance tuning by quoting a number of “real-life” experiences to prove the point. It suffers from two major weaknesses.

First, the author defines a “silver bullet” to mean “a small set of commands that quickly relieves an acute performance bottleneck”. To be fair, this at least gives us a clear understanding of what the author means by a “silver bullet”, which is helpful. On the other hand, the author’s use of the expression is somewhat idiosyncratic, and doesn’t match the common interpretation of the expression.

For example, “adding a missing index” requires just one command (you don’t even need a second command to compute statistics for it, although it would be sensible to do so as a separate step, given the bugs currently associated with computing stats whilst creating or rebuilding an index), so this clearly fits the author’s definition of a silver bullet – provided it relieves an acute performance bottleneck.
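For illustration, a minimal sketch of the two steps – the table, column, and index names here are hypothetical, not taken from the article under review:

    create index t1_status_idx on t1(status);

    begin
        dbms_stats.gather_index_stats(
            ownname => user,
            indname => 'T1_STATUS_IDX'
        );
    end;
    /

Gathering the statistics as a separate call, rather than using the compute statistics option of create index, sidesteps the bugs just mentioned.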

But in real life, with real systems, there is usually a lot more to it than first meets the eye. Only occasionally will it be obvious which index should be created, and that the benefit of creating it outweighs the cost and risk of doing so. And even when it is obvious, you don’t find many major organizations that will allow you to hack the production database without first going through a stack of change-control procedures – including regression testing. Adding the missing index is only a small part of the problem – the infrastructure of correctly identifying the missing index is the real job.

The second problem with the article is that it contains a significant number of simple, and obvious, factual errors. So many, in fact, that it casts doubt on the author’s claims that his fixes were real production fixes with the effects claimed. It is this point that I shall pursue in some detail, picking just a few examples from the article. There are many other points in the article which are of dubious quality, but they are less open to purely objective assessment; there are also several minor details (such as references to ‘partitioned tablespaces’) which are probably more indicative of carelessness than anything else. Each of the following sections has a headline matching a headline from the original article.


Implement Cursor-sharing = force

This references a ’9.0.2.4’ database. This version doesn’t exist, but it is only a finger-slip away from 9.0.1.4 or 9.2.0.4 – in either case, we can assume that the database is version 9i. The author shows the following as an extract from a statspack report:

Top 5 Wait Events

~~~~~~~~~~~~~~~~~                                          Wait  % Total
Event                                        Waits     Time (cs) Wt Time
------------------------------------- ------------ ------------  -------
enqueue                                     25,901      479,654    46.71
db file scattered read                  10,579,442      197,205    29.20
db file sequential read                    724,325      196,583     9.14
latch free                               1,150,979       51,084     4.97
log file parallel write                    148,932       39,822     3.88

The first defect is that this happens to be an extract from an Oracle 8 statspack report, not from an Oracle 9 database as claimed by the author. Notice in particular that the wait times are in centiseconds – Oracle 9 reports seconds. Note, also, that the heading is “Top 5 Wait Events”, where Oracle 9 uses “Top 5 Timed Events”.

Next we notice with some surprise that the “db file sequential read” event shows just a few seconds less time than the 1,972 seconds reported against db file scattered read, despite the fact that the reported percentage of time drops from 29.20% to 9.14%.

The author then explains that he investigated the v$sql view and found lots of code that used literal strings, so he set the parameter cursor_sharing to force, thus reducing contention on the library cache latch, reducing CPU consumption, and giving the users a 75% performance improvement.
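A sketch of the sort of check and fix this implies – the grouping on a 60-character prefix is only a rough way of spotting statements that differ only in their literals, and the threshold of 100 is an arbitrary choice of mine:

    select substr(sql_text, 1, 60) sql_prefix, count(*) copies
    from   v$sql
    group  by substr(sql_text, 1, 60)
    having count(*) > 100
    order  by copies desc;

    alter system set cursor_sharing = force;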

There is no indication of how that 75% performance improvement was measured; but if the statspack report was used as the basis of the ‘before/after’ comparison, then it seems a little strange that eliminating the (reported) 4.97% lost time on latches, whilst leaving nearly 47% lost time on enqueues and 38% (or about 58% if you work from the times rather than the reported percentages) on file I/O, would make that much difference.
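To spell out that parenthetical 58%: taking the reported 29.2% for scattered reads at face value implies a total of 197,205 / 0.292, or roughly 675,360 cs, on which basis sequential reads account for about 29.1% rather than the reported 9.14% – and 29.2% plus 29.1% gives roughly 58%. A sketch of the check:

    select round(100 * 196583 / (197205 / 0.292), 1) as implied_seq_pct
    from   dual;                                       -- about 29.1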

Unfortunately, the author didn’t initially suggest that CPU consumption was a problem, gave us no reason to think that it might be having such a serious impact on the system, and hasn’t given us any before-and-after figures for CPU consumption that might prove his point.


Add missing indexes

A financial house in New York has a performance problem – the system gets slower as they add more data. (Ed – this is often a good clue that there may be an index design error, though not necessarily any missing indexes as such.) The author has a script which lists the objects that are subject to tablescans, and runs it with the following results:

                     Full table scans and counts
                                        
OWNER      NAME                      NUM_ROWS  C  K   BLOCKS  NBR_FTS   
---------- ------------------------  --------  -  - --------  -------
APPLSYS    FND_CONC_RELEASE_DISJS       14,293 N       4,293  498,864
APPLSYS    FND_CONC_RELEASE_PERIODS    384,173 N      67,915  134,864   
DONALD     PERSON_LOGON_ID          18,263,390 N     634,272   96,212
DONALD     SITE_AMDMNT               2,371,232 N      51,020   50,719
DONALD     CLIN_PTCL_VIS_MAP        23,123,384 N     986,395   11,273       

This clearly shows a number of “large full tablescans”, and a follow-up check of v$sql showed that a common where clause for these tablescans was

    WHERE customer_status = ':v1' and customer_age > :v2;

So the author creates ‘an’ index on (customer_status, customer_age) with terrific results.

Now, there’s nothing wrong with this approach (although you may wonder why investigating symptoms, identifying the causes, and applying a carefully targeted fix counts as a magic bullet, rather than a scientific approach), but I couldn’t help noticing that the worst tables in this production problem came from a schema called Donald.

This isn’t a schema that normally exists in Oracle Applications (which is where APPLSYS comes from). So it looks as if someone called Donald has bolted a few extra tables onto the system, and forgotten that tables often need to be indexed to meet critical query requirements – and that’s a bit of a lead bullet. (None of the APPLSYS tables identified in the report has columns customer_status or customer_age, if you were wondering.)

Luckily someone (also called Donald) came along later, spotted the problem, and fixed it (with a careful analysis of the situation – not by picking a random table and creating a random index) in an appropriate sort of fashion. Of course, the report has highlighted five different tables – so it would be interesting to know why only one index was created and the others were deemed unnecessary.


Employ Materialized Views

This example features a data warehouse in Germany. Again we get a statspack top 5 report:

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                      % Total
Event                               Waits    Time (s) Ela Time
------------------------------- --------- -----------  --------
db file scattered read            325,519       3,246     82.04
library cache load lock             4,673       1,363      9.26
db file sequential read           534,598       7,146      4.54
CPU time                            1,154         645      3.83
log file parallel write            19,157         837      1.68

Again, we examine the consistency of the figures. If 3,246 seconds is 82.04% of the total recorded, how can the 7,146 seconds a couple of lines further down be only 4.54% of the total? And, of course, most people are aware that the ‘Top 5’ report is sorted in descending order by time – so how come the figures in this report are not in order?
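A couple of quick calculations, using only the figures quoted above, make the inconsistency concrete:

    select round(3246 / 0.8204) as implied_total_s
    from   dual;                                      -- about 3,957 seconds

    select round(100 * 7146 / (3246 / 0.8204), 1) as implied_pct
    from   dual;                                      -- about 180.6, not 4.54

If 3,246 seconds really were 82.04% of the total, the total would be about 3,957 seconds – and 7,146 seconds would then be more than 180% of it.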

Finally, we note that there is a CPU time component in this report (which means it is a 9i report). However, the line labeled ‘CPU time’ has a number in the ‘Waits’ column – and that does not happen in the real thing.

All in all, it would seem that this report is a complete fabrication.

This obvious fabrication makes the author look ignorant and incompetent, which is a shame, because in this example the author gives us a good explanation of how he investigates the way in which the system is used, and concludes (correctly, by the sound of it) that the system is an ideal candidate for using materialized views.

Note – in one of his case studies the author decided that creating indexes was the best solution for excessive tablescans, and in another that creating materialized views was the best solution for dealing with excessive tablescans. This demonstrates quite clearly that neither strategy is a ‘silver bullet’ in the normal interpretation of the expression. You might also consider that if you are going to build materialized views, you need to create some infrastructure to support them, and write some code to control them – which breaks even Don’s definition of a “silver bullet”.
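To give a sense of the infrastructure involved, here is a minimal sketch – every object name is hypothetical, nothing here comes from the article under review. Even a single fast-refresh materialized view needs a materialized view log on the base table, and query rewrite has to be enabled before the optimizer will use the view silently:

    create materialized view log on sales
        with rowid, sequence (sale_date, amount)
        including new values;

    create materialized view sales_by_day
        build immediate
        refresh fast on commit
        enable query rewrite
    as
    select sale_date,
           sum(amount)   total_amount,
           count(amount) cnt_amount,    -- needed for fast refresh of sum()
           count(*)      cnt            -- needed for fast refresh
    from   sales
    group by sale_date;

    alter session set query_rewrite_enabled = true;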

Side-note:  One of the lucky advantages I have in my trouble-shooting work is that I seem to be very good at pattern recognition – particularly with numbers. So when I saw the following table in an article on buffer busy waits on builder.com I felt that it looked familiar.

Funnily enough, the author (Don Burleson again) is writing about buffer busy waits – and starts the article with this statspack top 5, which fails to show any significant time loss on buffer busy waits – but doesn’t suggest the creation of materialized views as a quick fix (so maybe it’s not a “silver bullet” after all). But it does, perhaps, show us something about the hypothetical data warehouse in Germany that needed materialized views.

                                                      % Total
Event                          Waits    Time (s)     Ela Time
--------------------------- --------   ---------  -----------
db file sequential read        2,598       7,146        48.54
db file scattered read        25,519       3,246        22.04
library cache load lock          673       1,363         9.26
CPU time                       2,154         934         7.83
log file parallel write       19,157         837         5.68

The same set of values appears yet again on one of the Burleson websites as the background statistics for a tip on the use of Direct I/O. So there’s another “silver bullet” that doesn’t get a mention. You do begin to wonder how many “silver bullets” had to be fired before one of them hit the target in this German data warehouse.

Addendum 13th March –

And here it is again in Oracle 10g Data Warehousing. The author first describes it as an “AWR report (Statspack report for Oracle 9i and earlier)”, then labels it as a “Statspack (AWR)” report:

                                                      % Total
Event                          Waits    Time (s)     Ela Time
--------------------------- --------   ---------  -----------
db file scattered read         2,598       7,146        58.54
db file sequential read       25,519       3,246        12.04
library cache load lock          673       1,363         9.26
CPU time                                   1,154         7.83
log file parallel write       19,157         837         5.68

Note that scattered and sequential reads have swapped over – and (inevitably) the percentages and times don’t correlate. In this case, the figures are used to show you a “typical data warehouse system that is clearly constrained by disk I/O resulting from the high percentage of full-table and full-index scans”.

In passing, you might want to work out the average wait time for reads in the last two versions of the figures – I think there may be a hardware problem to fix before you worry too much about a database problem.
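Working it through with the quoted figures (a sketch):

    select round(7146 / 2598, 3)  as avg_wait_s from dual;   -- about 2.751 seconds per read
    select round(3246 / 25519, 3) as avg_wait_s from dual;   -- about 0.127 seconds per read

On a healthy system you would expect read waits of just a few milliseconds, so either figure points at an I/O subsystem in serious trouble.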


Add Freelists

The database in this case study was a 9.2.0.4 system in Michigan. The staff on an order entry system had doubled, and there was a performance problem. (Ed – when you double the rate of input and the speed of data entry drops, this is a good clue that the problem might be contention.) The author identifies the problem as contention on the segment header block of the customer_order table, relating to the fact that almost all the DML was inserts into this table, so he issues:

alter table customer_order freelists 5;
alter index cust_pk freelists 5;

The author finally notes: “I knew that this was only a stop-gap fix and as soon as they ran their weekly purge (a single process) that only one of the five freelists would get the released blocks, causing the table to extend unnecessarily.”

This is not the way that freelists work. If a single process frees up a lot of table blocks, those blocks are put onto a transaction freelist. When a subsequent process cannot find any space on its process freelist, it calls for space from the master freelist, and at that point the blocks on the (committed) transaction freelist are transferred to the master freelist and then redistributed as required to the process freelists.

When you set the freelists parameter for a table or index, you are defining the number of process freelists. If you leave it to default to one, then the master freelist is used as the process freelist.
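For reference, a sketch of the storage-clause form of the change, using the object names from the example above – this is the syntax I would expect to see for altering the number of process freelists on existing segments:

    alter table customer_order storage (freelists 5);
    alter index cust_pk storage (freelists 5);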

For an interesting discussion on freelists (plus a few other erroneous ideas propagated by Don Burleson, plus his chosen strategy for responding to any mention of his errors) see the following thread on Ask Tom.

http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:35336203098853

Addendum 1: There was a bug in 7.2, fixed in 7.3, which caused a problem with the redistribution of free blocks.

Addendum 2: If you create an object with multiple freelist groups then freed blocks become associated with a freelist group and cannot migrate to another freelist group, so a single large delete can cause the unnecessary object extension described above. The dbms_repair.rebuild_freelists procedure was introduced in 8i to address this issue.
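A sketch of the call – the schema and table names here are hypothetical:

    begin
        dbms_repair.rebuild_freelists(
            schema_name => 'SCOTT',
            object_name => 'CUSTOMER_ORDER',
            object_type => dbms_repair.table_object
        );
    end;
    /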


Conclusion

The author starts his article with the claim that there are ‘silver bullets’ of Oracle tuning. And, unsurprisingly, he succeeds in describing a number of cases where a problem is investigated and a simple fix can be identified very quickly. Thus, according to his personal interpretation of the expression ‘silver bullet’, he can claim to have proved his point.

It is a shame that the author tries to pretend that he is using ‘silver bullets’, when he could so easily have used the same examples to say: “look how straightforward it is to use a simple, tidy, approach to identifying critical problems and proposing appropriate solutions”.

It is a greater shame that the author destroys the credibility of the article by padding it with so many sets of figures that are clearly not related to the cases he describes.


Footnote

Three of the errors described in this review (the misunderstanding about freelists, and the anomalies in the Statspack reports) were raised in an article I sent to DBAZine in March 2004 – they decided not to publish it. However, since Don Burleson is apparently the Senior Technical Editor for DBAZine, we could perhaps have expected to see some effort made to correct the article.

Footnote 2

I have notified Don Burleson by email that this article is on my website, and offered him the opportunity to reply on this website, or by giving me a URL to a response on his own website.

According to the timestamps on my email system, I sent the notification at 11:20 on 5th March, and Don Burleson acknowledged receipt at 15:09 on 5th March with the message “LMAO! I love nerd-baiting!”.

The following is Don Burleson’s considered reply, written two days later, with a timestamp of 13:13 on 7th March.


Hi Jonathan,

> Should you have any comment, I will be happy to add it at the end of the articles.

Yes, please publish this, verbatim:

*******************************************************

This notification is a bit "late", since I'd received hateful e-mails long before you told me about it from readers.  As a paid DBAZine author, you might want to notify DBAZine when you publish false and inflammatory statements about their property.

As to your "review":

-  Three of the errors described in this review (the misunderstanding about freelists, and the anomalies in the Statpack reports) were raised in an article I sent to DBAZine in March 2004 – they decided not to publish it.  However, since Don Burleson is apparently the Senior Technical Editor for DBAZine, we could perhaps have expected to see some efforts made to correct the article.

           I've never seen this article, and I have no idea why it was rejected by DBAZine management.

- This obvious fabrication makes the author look ignorant and incompetent;

Actually, yours is the first complain I've received.  As you noted, even a beginner can see that the examples are "obvious fabrications", designed solely to illustrate the points.

This is one of the most frequently-read DBAZine articles they ever published, and the huge amount of positive feedback prompted me to create the book "Oracle Silver Bullets".

http://www.rampant-books.com/book_2005_1_silver_bullet.htm

Since, you acknowledge that they are obviously nothing more than illustrative examples, you have a no basis for labeling them "ignorant and incompetent".   This remark is incorrect, rude and unprofessional.

- It is a greater shame that the author destroys the credibility of the article by padding it with so many sets of figures that are clearly not related to the cases he describes.

        I asked several DBAZine authors to have a look, and they found it to be a on-point, interesting, informative and credible.

-  The second problem with the article is that it has a significant number of simple, and obvious, factual errors. So many, that it sheds doubt on the author's claims

I had several experienced DBAZine authors review your "comments", and I was told that they are incorrect, unprofessional and malicious.

-  someone called Donald has bolted on a few extra tables to the system, and forgotten that tables often need to be indexed to meet critical query requirements

Excuse me?  This comment is false, without basis, factually incorrect, inflammatory and extremely unprofessional.

While I have not seen your rejected article, I think I have an idea why it was rejected before being forwarded to me for technical review.  Did DBAZine say why it was rejected?

Regards,

Donald K. Burleson
www.dba-oracle.com

