A reader from Spain writes:


Can you spend too much cash?

There is a little debate going on at present about how much money you can manage to waste doing things that do not offer a reasonable return on investment. I would like to make a few contributions to that debate, covering a few of the less-obvious points that others may have missed.

 

Galileo, of course, was a careful scientist with a fine inquiring mind and a passion for getting the truth told. Consequently the protagonists he used to present his arguments in his famous (or infamous) book on the two world systems asked the questions that were the serious and difficult questions for the philosophers of his day.

 

This is quite difficult to do, since it requires a lot of hard work to address a serious question properly and thoroughly. To avoid this difficulty, I shall ask questions that no-one is really asking, because then I can make the questions up, avoid the hard bits, and give some really obvious answers in the hope that this makes me look very well informed.


Sceptic: There is nothing wrong with full caching and disk I/O can hurt Oracle performance

The naďve DBA, and even those who think they are top Oracle consultants, sometimes forget that speed isn’t everything. Remember, it’s not the fall that kills you it’s the sudden stop at the bottom.

 

A regular amount of Deliberate Disk I/O (DDI) is very important, and without a suitable amount of DDI taking place, your Oracle system may fail catastrophically with a condition known as Energetic Bit Failure (EBF). Remember, memory works quickly because it is very, very small and the bits can cram themselves really close together. But the downside of cramming the bits together is that they can get very hot. (Imagine being on the London Underground, or New York subway, in the rush hour on the hottest day of the year).

 

When a bit gets too hot, it can be subject to EBF, and the failure is subject to the Memory Domino Effect (MDE) – when a bit fails, a byte fails; when a byte fails, a word fails; when a word fails a dword (double word) fail – and on a 64-bit system when a dword fails a qword (quad word) fails, and the next thing you know a whole memory chip fails, and then the entire machine memory fails.  (Every time one bit fails, all the surrounding bits have to hold it up until it recovers, which means they are working harder – which gets them hotter …)

 

Addendum (4th March):  Arguments for and against 64 bits. For: if a bit fails, there are 63 other bits to hold it up. Against: the bits in a 64-bit chip are smaller than the bits in a 32 bit chip, so they are packed closer together and get hotter, more quickly.

 

To avoid the problems of EBF, it is necessary to make sure that all bits are allowed to visit the disc on regular occasions. Not only do the bits on disc get more space to spread themselves out and relax, the fan-like spinning motion of the disc ensures that the bits can cool down very quickly. 

 

If you still have problems with your in-memory bits overheating you can adjust the (now-hidden) parameter called the spin_count. This parameter tells Oracle how many times the disc has to spin before a hot bit can be reloaded. If you have a Very Large Memory (VLM), you may want to increase the default value from 2,000 as the bits that are packed into VLM  (especially the bits in the middle) get much hotter and need more time (spins) to cool down after use.

 

It isn’t necessary to increase the spin_count very much, though. As you know, the Stefan-Boltzman law tells us that the rate of radiation is proportional to the fourth power of the temperature for a black body – and in the sealed environment of a disk cabinet, the bits will be black. (The same is not true for bits which have come off CDs as these are illuminated by laser light; this is why CD rotational rates can be much slower than ordinary disk rotation rates).

 

If you set the spin_count too high, you may find that your bits start to freeze and have trouble coming back off the disc (unix administrators refer to bits in this state as ‘sticky’ bits) so do be careful when fiddling with this hidden parameter. Very few people in Oracle support are aware of this issue, so don’t be surprised if they tell you that you are talking rubbish when you ask about doing it.


Sceptic: Surely the end-users should be allowed to enter data.

I am amazed at the number of sites that have strict change controls in place, and then forget that every time a user types data into a screen THEY HAVE CHANGED THE CONTENT OF THE DATABASE!! Don’t they realized that every time the data is changed, the statistics could become wrong, and this could make the very next query generate a massive overload that causes all sorts of problems? Don’t they realise that when you change the data, you might cause rows to migrate, increasing the cost of all queries? Don’t they realise that even a single change to a single row in a single table could result in an index degenerating, spawning children, splitting up, down and across, and wasting huge amounts of space.

 

End-users should NOT be allowed to change the data until a full impact analysis has taken place, and proper change control procedures have been followed.

 

Of course, there may be cases where you cannot implement this most sensible of strategies. If you have to put up with such sloppy procedures, then you can always work around them by creating after insert, update, and delete triggers on all tables to re-compute the statistics on the table every time the table changes. If you do this, make sure that none of the developers puts the call to dbms_stats inside an autonomous transaction otherwise the new data will not be committed before the statistics are gathered, and the statistics will still be wrong.

 

An alternative strategy, which is a little less compute-intensive, is to put a before insert / update / delete trigger on every table to raise an exception – preferably one that confuses the end-users and their programs. Of course, if you want to be lazy, you could just write a script that generates a script that creates all the triggers with the same error message – but if you do, consider using the dbms_random package to generate the error number – this will make it harder for the users to realize that you are deliberately stopping them from messing up your nice tidy database.

 


Sceptic: My IT manager stops me from re-organizing tables every lunch-time – why?

This is typical of the ignorant IT management. You persuade the finance department to spend a fortune buying huge machines with excessive amounts of discs, memory, and CPUs, and then management tells you not to use all those resources.  What did you buy them for if you’re not allowed to use them!! 

 

(Remember, a CPU cycle not used is a CPU cycle wasted – CPU’s are like laptop batteries, if you don’t keep using them at full capacity they develop a memory of under-usage, and eventually you’ll find that your 1GHz CPU only remembers to run at 900MHz, then 800MHz, and so on.The only way to get the CPU back up to speed when this happens is to switch the whole machine off for 24 hours while leaving it plugged into the mains supply.)

 

But don’t worry – run this super-duper, amazing, really incredible script every morning at 9:00 a.m. and show the results to your IT manager. Then he’ll obviously have to let you re-organize everything straight away: Remember – if you want to get maximum value out of all that expensive hardware, the best time to re-organize the database is when the users are busiest – this will help you to exercise the CPUs and the discs; if you also want to make sure that you are using all of the memory, you can switch to manual workarea_policy for your re-org sessions, and set really large values for sort_area_size, hash_area_size etc. 

 

column c1 format a120 heading ‘Re-org needed’

 

select

      'Table ' || table_name || ' owned by user ' || owner ||

      ' needs to be re-organized immediately. '                   c1

from

      dba_tables

where

      owner not in ('SYS', 'SYSTEM', 'DBSNMP', 'PERFSTAT')

order by

      owner, table_name

;

 

Don’t forget to rebuild all the indexes and compute statistics after you’ve finished the table re-orgs.


Epilog

Alas, there are some people who are so blind that they don’t see the dangers that the more skilled consultants (with 400 years experience of fighting giants, slaying dragons and rescuing damsels in distress) can spot a mile off. But with the right vision, dedication, and motivation, you will be able to shrug off the negative comments of the doubters and make a real difference to your organization’s databases.

 


Addendum – 20th March 2005

Oracle CEO approves Quixotic approach:

On 13th Dec 2004, Larrey Ellison, was quoted in the Computer press as saying:

 

“A lot of people compared us to Don Quixote tilting at windmills, but finally we now have PeopleSoft”

 


About the author: Don Quixote runs a two-man business in La Mancha (Spain) with his junior partner Sancho Panza. Don Qixote has spent many years describing and tackling problems that no-one else can see. Don Quixote travels everywhere on horse-back but, as the junior partner, Sancho has to put up with a donkey, and has frequently suffered problems with ASS management as a consequence.

Can you spend too much cash – A reader writes

Jan 2005