A reader from
I recently got an email from a dear old dinosaur with 50 years experience of tuning Oracle databases (he’d done a lot of overtime and on-call time) who wanted to tear a strip off me for suggesting (in a presentation that I’d recently done in Kazakhstan) that it was easy to test one of his pet theories to see if it was relevant to a particular member of the audience.
“All this
new-fangled testing, analyzing, and careful thinking is a waste of time. I’ve
fixed more than 5,000 databases in the last week alone using my methods, and
had IT managers from
I felt a little twinge of sympathy for the old chap. Things have changed. The Oracle world moves on and it must be tough when your “advanced tuning skills” of yesteryear are now part of “Basic Design and Configuration 101” and you have to start learning all over again. It can be hard to keep up, and sometimes it’s so much easier to fight the rearguard action, cling to the forlorn hope, and make the little things sound more impressive than they really are. (And, of course, you do get lucky from time to time and find someone who hasn’t taken “Basic Design and Configuration 101”).
Databases were small, disks were small and
people had lots of them, Oracle had a few interesting features but it was
largely just tables and indexes that people dealt with. Most databases
performed badly and had I/O problems because of bad design, inappropriate
indexing strategies, and bad code.
Databases have grown by two or three orders
of magnitude (note – one order of
magnitude means multiplying by 10), the capacity of a typical disk has
grown by one or two orders of magnitude, but disk access times are typically
only three or four times as fast as they used to be, and most people don’t have
many of them. Oracle is stuffed full of interesting technical features, but
many systems don’t use them – and quite a number of the systems that do use new
features tend to use them inappropriately. Most databases perform badly and
have I/O problems because of bad design, inappropriate indexing strategies, and
bad code. Some databases still work reasonably quickly because you can stuff
enormous amounts of memory into the box, add lots of very fast CPUs, and hide the
problems for a while.
Some old
dinosaurs try to protect their intellectual high ground by pumping out
declarations full of sound and fury, signifying nothing – at best they do no
harm, at worst they only muddy the waters further. Some young bucks are too
keen to fight their ‘new’ scientific corner, and end up making statements that
are just as extreme, over-generalised, and suspect as those of the old
dinosaurs. Both have the potential for causing problems for naďve users.
Back in the
bad old days, state of the art skills included:
reducing all segments to one extent,
rebuilding all indexes regularly,
wetting your pants if the cache hit ratio
dropped below 99.99999%,
making sure that small tables weren’t
indexed.
For a
history of some of the odd realities, odder beliefs and the reasons behind
them, you could read Oracle Insights: Tables of the Oak Table,
in particular Dave Ensor’s chapter, which will tell you such abstruse details
as where AFIEDT.BUF came from, and may even mention the reason for the original
2K block size.
There were two main drawbacks to many of these state of the art ideas. First, they didn’t apply after Oracle 6 came out (if they ever did), and second they did actually appear to work some of the time. (Yes, that sentence reads exactly as I intended). So what’s wrong with a snappy method that works some of the time? Nothing - if it works for you; but if you tell ten other people to do it and it only works for one of them then you’ve just made 9 people unhappy and wasted a lot of their time.
Alas, some of these ‘old methods’ did work for some people. But the information that would allow individuals to decide in advance whether a strategy was a good idea, a positively bad idea, or simply just a waste of time and effort, was not available.
Oracle
Corp. keeps updating their software, adding features designed to reduce the
workload on the DBA, the developer, and the database. Sometimes the features
work well, sometimes they are inappropriate, sometimes they are sub-optimal and
sometimes they are abused. Almost invariably, the marketing literature is ahead
of its time.
Is
everything that Oracle Corp. puts into their code right for you? If an old
method seems to be embedded in new code, does that prove that the old method
was always
a good
idea? I’ll let you work that out for yourself by considering a few examples.
Oracle 8i
introduced on-line index rebuilds – does this mean you should rebuild
every index on a regular basis (old method), or does it just give you a
facility for minimising interruptions on a 24 x 7 system for those special
cases where you can see that an index rebuild would positively benefit
performance ? In passing, bear in mind that on a busy 24 x 7 system an on-line
index rebuild has to acquire an exclusive lock on the underlying
table for a brief period to start the rebuild, and another to end the rebuild –
it’s likely to be a lot better than holding a lock on the table for the
duration of the rebuild, but it’s not entirely threat-free after all – and the
rebuild is not necessarily cheap on resources (although Oracle has at least
three possible mechanisms for doing the rebuild and will pick the one it thinks
it the cheapest).
Oracle 9i
introduced on-line table reorganization – does this mean you should
physically rebuild tables on a regular basis? Table rebuilding is
actually a strategy I have suggested from time to time – but usually by cloning
read-only partitions of a partitioned table because in the real-world the
cost of an online table rebuild is too high to allow, except in extreme
circumstances, as it includes recreating all the indexes and using materialized
view log mechanisms to keep the rebuilt table in synch whilst
everything else is going on. On-line table reorganization is best viewed as a
facility to allow old systems to migrate into new features such as partitioned
tables with minimum down-time.
Oracle 10g
introduced automatic memory management – regularly monitoring v$db_cache_advice etc. to shift memory dynamically
between several of the large memory structures in the shared memory segment.
Does this mean that increasing the buffer cache to reduce physical I/O is
automatically the right thing to do?
Bear in mind Oracle 10g also supplies you with the tuning advisor and the access
advisor to help you improve your indexing and SQL in a balanced fashion – perhaps you should only
enable automatic memory management after you have tuned the SQL. Bear in
mind, also, that the mechanisms of automatic memory management specifically exclude
the non-standard db cache sizes (see 10g Admin Guide Ch. 2 P.36) There is at least one author who
thinks that the ability to specify multiple block sizes may be the most
important feature of 9i – does their exclusion from automatic memory management
mean that Oracle Corp. thinks he is talking rubbish? Bear in mind, finally,
that the algorithm tries to avoid removing memory from the “shared pool”
because it kicks the system to death (that’s my paraphrasing – the manual
describes the issue a little differently). So should we
follow the line “but that’s what Oracle does” when even Oracle cannot do it
well? Perhaps not.
Tuning by
observing the work done and time lost at the process level – the so-called
“wait interface” method - is the ‘new’ science.
I could
invent a silly story at this point – perhaps I could tell you how I spent 72
hours watching a ‘ratio-junkie’ struggle to fix a problem by tweaking
parameters, plugging in memory chips swapping CPUs, creating materialized
views, rebuilding indexes, and refuelling every three hours on pizza and
high-caffeine coffee; then I could explain how I stepped in and solved the
problem a mere 30 seconds after typing “execute dbms_support.start_trace_in_session(...etc…)”. I won’t insult your intelligence by doing so
– you would obviously conclude that it was pure invention. Extreme bottlenecks
and extreme overloads are easily spotted by anyone with a basic level of
trouble-shooting skills, irrespective of their preferred method of operation.
I will say
that, once you have dealt with the trivial errors, the method of observing
business-related processes to see where the work is going and where the time is
lost helps you identify the important threats more easily, recognise some of the
slightly less obvious errors in design (or index strategies), and list options
for addressing the issues based on a realistic cost/risk/benefit estimate. And
yes – sometimes the answer is: “this particular I/O problem will go away for
about six months if you add a little over 48GB of RAM to the db_keep_cache_size” (as I said to someone just three
weeks ago) “and the other three alternatives are …”
Finally I
have to point out that a scientific approach is not a new method. “Academic”
and “theoretical” are terms that sometimes appear in the writings of this
industry as if they were criticisms. But have you seen how many academic papers
Oracle Corp. presents at events like the International Conference on Large
Databases - without Academia there would be no Oracle. The approach of
evaluating a hypothesis by predicting a test result then doing the test has
been around for centuries. If you are a professional trouble-shooter, you owe
it to your clients to go through that discovery process in your own labs so that
you don’t have to spend your client’s money learning how Oracle works. If you
are a full-time DBA, your employers owe you, and themselves, some time for you
to do some testing and experimentation that might be relevant to your (their)
systems so that you can predict problems before they actually happen.
What of the
daft old duffer? Treated with a little warmth and compassion, it turns out he
can be a charming chap. He has seen the error of his ways, and now tries hard
to copy those more enlightened individuals around him. Of course he does slip
back into old habits from time to time – but then, there are times when we all
hanker after the good old days when everything was simple.
About the author: Don Corleone runs a small family business out of NYC, mainly
involved in shooting trouble. He is a shy, retiring individual and doesn’t like
to make a big fuss about his level of skill, because his published work speaks
for itself. The author reserves the right to leave any outrageous claims
unsubstantiated, and also reserves the right to have invented any characters,
events, and figures that might suit the general narrative thrust of this
article.
That was then – A reader writes |
Oct 2004 |