A reader from New York writes:


That was then, this is now

I recently got an email from a dear old dinosaur with 50 years' experience of tuning Oracle databases (he'd done a lot of overtime and on-call time) who wanted to tear a strip off me for suggesting (in a presentation that I'd recently done in Kazakhstan) that it was easy to test one of his pet theories to see if it was relevant to a particular member of the audience.

 

“All this new-fangled testing, analyzing, and careful thinking is a waste of time. I’ve fixed more than 5,000 databases in the last week alone using my methods, and had IT managers from Afghanistan to Zanzibar swooning at the 1000% improvement in performance. This scientific stuff is irrelevant and expensive. All you need do is quote the marketing literature - then add more memory and a couple of faster CPUs.”

 

I felt a little twinge of sympathy for the old chap. Things have changed. The Oracle world moves on and it must be tough when your “advanced tuning skills” of yesteryear are now part of “Basic Design and Configuration 101” and you have to start learning all over again. It can be hard to keep up, and sometimes it’s so much easier to fight the rearguard action, cling to the forlorn hope, and make the little things sound more impressive than they really are. (And, of course, you do get lucky from time to time and find someone who hasn’t taken “Basic Design and Configuration 101”).

Changing times:

Ten years ago

Databases were small, disks were small and people had lots of them, Oracle had a few interesting features but it was largely just tables and indexes that people dealt with. Most databases performed badly and had I/O problems because of bad design, inappropriate indexing strategies, and bad code.

 

Today

Databases have grown by two or three orders of magnitude (note – one order of magnitude means multiplying by 10), the capacity of a typical disk has grown by one or two orders of magnitude, but disk access times are typically only three or four times as fast as they used to be, and most people don’t have many of them. Oracle is stuffed full of interesting technical features, but many systems don’t use them – and quite a number of the systems that do use new features tend to use them inappropriately. Most databases perform badly and have I/O problems because of bad design, inappropriate indexing strategies, and bad code. Some databases still work reasonably quickly because you can stuff enormous amounts of memory into the box, add lots of very fast CPUs, and hide the problems for a while.


Mythtakes

Some old dinosaurs try to protect their intellectual high ground by pumping out declarations full of sound and fury, signifying nothing – at best they do no harm, at worst they only muddy the waters further. Some young bucks are too keen to fight their ‘new’ scientific corner, and end up making statements that are just as extreme, over-generalised, and suspect as those of the old dinosaurs. Both have the potential for causing problems for naïve users.

Old Methods

Back in the bad old days, state of the art skills included: 

reducing all segments to one extent,

rebuilding all indexes regularly,

wetting your pants if the cache hit ratio dropped below 99.99999%,

making sure that small tables weren’t indexed.

For a history of some of the odd realities, odder beliefs and the reasons behind them, you could read Oracle Insights: Tales of the Oak Table, in particular Dave Ensor’s chapter, which will tell you such abstruse details as where AFIEDT.BUF came from, and may even mention the reason for the original 2K block size.

 

There were two main drawbacks to many of these state of the art ideas. First, they didn’t apply after Oracle 6 came out (if they ever did), and second, they did actually appear to work some of the time. (Yes, that sentence reads exactly as I intended.) So what’s wrong with a snappy method that works some of the time? Nothing - if it works for you; but if you tell ten other people to do it and it only works for one of them, then you’ve just made nine people unhappy and wasted a lot of their time.

 

Alas, some of these ‘old methods’ did work for some people. But the information that would allow individuals to decide in advance whether a strategy was a good idea, a positively bad idea, or simply just a waste of time and effort, was not available.

New versions of old methods

Oracle Corp. keeps updating their software, adding features designed to reduce the workload on the DBA, the developer, and the database. Sometimes the features work well, sometimes they are inappropriate, sometimes they are sub-optimal and sometimes they are abused. Almost invariably, the marketing literature is ahead of its time.

 

Is everything that Oracle Corp. puts into their code right for you? If an old method seems to be embedded in new code, does that prove that the old method was always a good idea? I’ll let you work that out for yourself by considering a few examples.

 

Oracle 8i introduced on-line index rebuilds – does this mean you should rebuild every index on a regular basis (old method), or does it just give you a facility for minimising interruptions on a 24 x 7 system for those special cases where you can see that an index rebuild would positively benefit performance? In passing, bear in mind that on a busy 24 x 7 system an on-line index rebuild has to acquire an exclusive lock on the underlying table for a brief period to start the rebuild, and another to end it. That’s likely to be a lot better than holding a lock on the table for the duration of the rebuild, but it’s not entirely threat-free after all – and the rebuild is not necessarily cheap on resources (although Oracle has at least three possible mechanisms for doing the rebuild and will pick the one it thinks is the cheapest).
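As a sketch of the facility (the index name is hypothetical), the on-line variant is just a clause on the rebuild statement – and it’s worth checking the state of the index before deciding a rebuild is justified at all:

```sql
-- Hypothetical index name; ONLINE allows DML to continue during the
-- rebuild, apart from the brief table locks at the start and the end.
alter index orders_pk rebuild online;

-- Before rebuilding at all, check whether the index is actually in a
-- state that would benefit. Note: VALIDATE STRUCTURE (without the
-- ONLINE option) locks the table while it runs, so don't do this
-- casually on a busy 24 x 7 system.
analyze index orders_pk validate structure;
select lf_rows, del_lf_rows, pct_used from index_stats;
```

If the figures from index_stats don’t show a significant volume of reclaimable space, the rebuild is probably just a waste of resources.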

 

Oracle 9i introduced on-line table reorganization – does this mean you should physically rebuild tables on a regular basis? Table rebuilding is actually a strategy I have suggested from time to time – but usually by cloning read-only partitions of a partitioned table, because in the real world the cost of an on-line table rebuild is too high to allow except in extreme circumstances: it includes recreating all the indexes, and it uses materialized view log mechanisms to keep the rebuilt table in synch whilst everything else is going on. On-line table reorganization is best viewed as a facility to allow old systems to migrate into new features such as partitioned tables with minimum down-time.
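The mechanism behind on-line reorganization is the dbms_redefinition package. A minimal sketch (the schema, table, and interim table names are hypothetical, and the copy_table_dependents call is a 10g addition – in 9i you recreate the dependent objects yourself):

```sql
declare
  num_errs pls_integer;
begin
  -- Check that the table qualifies for on-line redefinition.
  dbms_redefinition.can_redef_table('SCOTT', 'BIG_TABLE');

  -- Start copying rows into a pre-created interim table (which could be
  -- partitioned); a materialized view log keeps the copy in synch with
  -- ongoing DML on the original table.
  dbms_redefinition.start_redef_table('SCOTT', 'BIG_TABLE', 'BIG_TABLE_INT');

  -- Recreate indexes, triggers, constraints, and grants on the interim
  -- table (10g) - this is where much of the real cost lies.
  dbms_redefinition.copy_table_dependents(
    'SCOTT', 'BIG_TABLE', 'BIG_TABLE_INT',
    num_errors => num_errs
  );

  -- Swap the names over; this needs only a short exclusive lock.
  dbms_redefinition.finish_redef_table('SCOTT', 'BIG_TABLE', 'BIG_TABLE_INT');
end;
/
```

Even in sketch form you can see why the cost is high: every row is copied, every index is rebuilt, and the synchronisation machinery runs for the duration.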

 

Oracle 10g introduced automatic memory management – regularly monitoring v$db_cache_advice etc. to shift memory dynamically between several of the large memory structures in the shared memory segment. Does this mean that increasing the buffer cache to reduce physical I/O is automatically the right thing to do? Bear in mind Oracle 10g also supplies you with the tuning advisor and the access advisor to help you improve your indexing and SQL in a balanced fashion – perhaps you should only enable automatic memory management after you have tuned the SQL. Bear in mind, also, that the mechanisms of automatic memory management specifically exclude the non-standard db cache sizes (see the 10g Admin Guide, Ch. 2, p. 36). There is at least one author who thinks that the ability to specify multiple block sizes may be the most important feature of 9i – does their exclusion from automatic memory management mean that Oracle Corp. thinks he is talking rubbish? Bear in mind, finally, that the algorithm tries to avoid removing memory from the “shared pool” because it kicks the system to death (that’s my paraphrasing – the manual describes the issue a little differently). So should we follow the line “but that’s what Oracle does” when even Oracle cannot do it well? Perhaps not.
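If you want to see what the advisory figures look like before (or instead of) handing control to automatic memory management, they are visible in v$db_cache_advice. A sketch, assuming the advisory is enabled (db_cache_advice = on):

```sql
-- Estimated physical reads for a range of possible cache sizes, for the
-- DEFAULT buffer pool at the standard block size.
select size_for_estimate           cache_mb,
       estd_physical_read_factor,
       estd_physical_reads
from   v$db_cache_advice
where  name          = 'DEFAULT'
and    advice_status = 'ON'
and    block_size    = (select value from v$parameter
                        where  name = 'db_block_size')
order by size_for_estimate;
```

A read factor close to 1.0 across the larger sizes tells you that adding memory to the cache won’t buy you much – which is exactly the sort of prediction you can test before spending the money.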

New Methods

Tuning by observing the work done and time lost at the process level – the so-called “wait interface” method - is the ‘new’ science.

 

I could invent a silly story at this point – perhaps I could tell you how I spent 72 hours watching a ‘ratio-junkie’ struggle to fix a problem by tweaking parameters, plugging in memory chips, swapping CPUs, creating materialized views, rebuilding indexes, and refuelling every three hours on pizza and high-caffeine coffee; then I could explain how I stepped in and solved the problem a mere 30 seconds after typing “execute dbms_support.start_trace_in_session(...etc…)”.  I won’t insult your intelligence by doing so – you would obviously conclude that it was pure invention. Extreme bottlenecks and extreme overloads are easily spotted by anyone with a basic level of trouble-shooting skills, irrespective of their preferred method of operation.
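For reference, the call in that (entirely fictional) anecdote comes from the unsupported dbms_support package. A sketch of tracing another session – the sid and serial# below are made up, and the package may have to be installed from $ORACLE_HOME/rdbms/admin/dbmssupp.sql:

```sql
-- Find the target session first (username is hypothetical).
select sid, serial# from v$session where username = 'APP_USER';

-- Enable extended SQL trace (event 10046), including wait information,
-- for the nominated session.
execute dbms_support.start_trace_in_session(123, 45678, waits => true, binds => false)

-- ... let the problem reproduce itself, then switch tracing off:
execute dbms_support.stop_trace_in_session(123, 45678)
```

In 10g the supported route is dbms_monitor.session_trace_enable; either way, the resulting trace file shows you where the work was done and the time was lost.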

 

I will say that, once you have dealt with the trivial errors, the method of observing business-related processes to see where the work is going and where the time is lost helps you identify the important threats more easily, recognise some of the slightly less obvious errors in design (or index strategies), and list options for addressing the issues based on a realistic cost/risk/benefit estimate. And yes – sometimes the answer is: “this particular I/O problem will go away for about six months if you add a little over 48GB of RAM to the db_keep_cache_size” (as I said to someone just three weeks ago) “and the other three alternatives are …”

 

Finally I have to point out that a scientific approach is not a new method. “Academic” and “theoretical” are terms that sometimes appear in the writings of this industry as if they were criticisms. But have you seen how many academic papers Oracle Corp. presents at events like the International Conference on Very Large Data Bases? Without academia there would be no Oracle. The approach of evaluating a hypothesis by predicting a test result then doing the test has been around for centuries. If you are a professional trouble-shooter, you owe it to your clients to go through that discovery process in your own labs so that you don’t have to spend your clients’ money learning how Oracle works. If you are a full-time DBA, your employers owe you, and themselves, some time for you to do some testing and experimentation that might be relevant to your (their) systems so that you can predict problems before they actually happen.

Epilog

What of the daft old duffer? Treated with a little warmth and compassion, it turns out he can be a charming chap. He has seen the error of his ways, and now tries hard to copy those more enlightened individuals around him. Of course he does slip back into old habits from time to time – but then, there are times when we all hanker after the good old days when everything was simple.


About the author: Don Corleone runs a small family business out of NYC, mainly involved in shooting trouble. He is a shy, retiring individual and doesn’t like to make a big fuss about his level of skill, because his published work speaks for itself. The author reserves the right to leave any outrageous claims unsubstantiated, and also reserves the right to have invented any characters, events, and figures that might suit the general narrative thrust of this article.


Oct 2004