JL Computer Consultancy

The Snark research mechanism?

Aug / Sep 2005


Just the place for a Snark! I have said it twice:
            That alone should encourage the crew.
Just the place for a Snark! I have said it thrice:
            What I tell you three times is true.

            Lewis Carroll – The Hunting of the Snark


Addendum (21st Jan 2006)

Addendum (22nd Oct 2005)

Criticism requires care (17th Sept 2005)

Addendum (7th Sept 2005)

Go to Part 2 (4th Sep 2005)


Last week (22nd Aug) I wrote about an article published by Don Burleson on SearchOracle, and it was such a popular article that it got mentioned on several websites, including Mike Ault’s blog and Don Burleson’s own forum. In the last week, Mr. Burleson has updated the article, both on SearchOracle, and on his own website – the extracts below come from his own website.

You would have thought that perhaps he would have seen fit to do something about the more serious mistakes he had made, but in fact his efforts seem to be devoted to justifying them on the basis that he was only copying other people anyway, so it wasn’t his fault.

In my original article, I highlighted, criticized, and demonstrated the errors in the following assertion:

For example, the following set of parameters indicates that the Oracle DBA does not understand PGA management.

·         sort_area_size=1048576              <-- sort_area_size is ignored when pga_aggregate_target is set and when workarea_size_policy =auto.

·         pga_aggregate_target = 500m     <-- The maximum default allowed value is 200 megabytes

This is what the copy of the article on Mr. Burleson’s website now says:

For example, the following set of parameters indicates that the Oracle DBA does not understand PGA management.

·         sort_area_size=1048576 <-- sort_area_size is ignored when pga_aggregate_target is set and when workarea_size_policy =auto.

·         pga_aggregate_target = 500m <-- The maximum default allowed value is 200 megabytes, this limits sorts to 25 megabytes (5% of 500m). 

So we see that he is still condemning DBAs around the world for doing something that is perfectly reasonable.

Note: The comment about 25 megabytes and 5% of 500M was a very early modification that appeared on his website before I published my first article, not a response to my article. This, of course, changes the statement from a simple display of ignorance to something that is completely lacking in logic. Why should the fact that 25 MB is 5% of 500M prove that 500M is inevitably wrong? Why should the fact that 200 MB is (somehow) a maximum default prove that a limit of 25 MB is a bad choice? There is no way that you can interpret the line to produce a logical statement.


But let’s see what Mr. Burleson does instead of eliminating this wild assertion. He starts quoting references, with this introduction:

Because these are hidden parameters, information must be gleamed from Oracle MetaLink and Oracle expert web sites where you trust the source and where the Oracle employee moderators might have access to Oracle internal resources. 

Unfortunately, he fails to suggest that after gleaning your information, you might want to winnow it, test it, and discard the chaff.  But let’s see what his references have to say:

Carol Francum of SearchOracle:  … This phase of Oracle tuning is directed at … sort_area_size.

Oops, the quote from this person says you should look at sort_area_size. By the Burleson measure they don’t understand PGA management – so they clearly aren’t an expert. (Or perhaps there is a clue that Burleson may be wrong and should have tested his claim before publishing).

David Welsh on SearchOracle: … (You should leave entries for SORT_AREA_SIZE and HASH_AREA_SIZE in the pfile because PGA_AGGREGATE_TARGET must be disabled during RMAN recovery) …

Oops, the quote from this person says you should leave the sort_area_size and hash_area_size in the pfile. They’ve even given a reason for doing so. By the Burleson measure they don’t understand PGA management – so they clearly aren’t an expert. (Or perhaps there is a clue that Burleson may be wrong and should have tested his claim before publishing).

By the way, thanks for the warning, David. (I’ll check it some day, of course, on the latest versions I use, but it’s good to have had a warning in advance.)

The next reference is a randomly selected slice from the description of a graph of memory usage against time that appears in a paper by Benoit Dageville and Mohamed Zait of Oracle Corp., presented at the annual VLDB conference in 2002. The paper is very instructive, and an interesting read – but it’s about the design strategy for the automatic workarea sizing mechanisms, not about setting parameters.

The next reference is one I have to quote in its entirety:

Metalink Note:223299.1:  “If PGA_AGGREGATE_TARGET is set in the init.ora, then SORT_AREA_SIZE,  HASH_AREA_SIZE, BITMAP_MERGE_AREA_SIZE  and CREATE_bitmap_AREA_SIZE are ignored.”  . . . "If PGA_AGGREGATE_TARGET is set in the init.ora, then SORT_AREA_SIZE,  HASH_AREA_SIZE, BITMAP_MERGE_AREA_SIZE  and CREATE_bitmap_AREA_SIZE are ignored. . . If PGA_AGGREGATE_TARGET is set in init.ora, then WORKAREA_SIZE_POLICY defaults to AUTO."

Yes! It’s the Snark – three times in a row this note fails to give a warning about shared servers (MTS). Three times – so it must be right, and everything else can be ignored. Shared servers are irrelevant, and any reference to sort_area_size is a mistake.

Metalink Note:  223730.1: “1- When we set the PGA_AGGREGATE_TARGET  and WORKAREA_SIZE_POLICY to auto   then the *_area_size parameter are automatically ignored and oracle will automatically use the computed value for these parameters.”

Not so convincing: the number of times it doesn’t mention shared servers is less than the number of times that the previous note doesn’t mention shared servers. (Remember Catch-22 and Major Major’s father who didn’t grow more alfalfa than anyone else didn’t grow). But that’s another vote in favour of ignoring any comments about sort_area_size.

Metalink Note:  30918.1: “Oracle does not recommend using the SORT_AREA_SIZE parameter unless the instance is configured with the shared server option. Oracle recommends that you enable automatic sizing of SQL working areas by setting PGA_AGGREGATE_TARGET instead. SORT_AREA_SIZE is retained for backward compatibility.”

Oh dear, this one explicitly points out the significance of shared servers – but I suppose, if we’re going to be democratic it only gets one vote whereas note 223299.1 gets three votes – so this note can presumably be ignored.

The next quote is from Tom Kyte’s AskTom website – and it even highlights some lines about having to set the sort_area_size when using shared servers. But never mind, on a simple vote count we can still ignore sort_area_size.

The next quote is also from Tom Kyte’s AskTom website – and fails to warn about the special case of shared servers. Tsk, tsk, Tom. (By the way, if you follow the link, you will find that the quoted answer has been amended, with a reference to Burleson for spotting the omission).

The final quote is this:

Jonathan Lewis notes: “You don't need to delete sort_area_size to use the pga_aggregate_target.  You need only make sure that you have set workarea_size_policy to AUTO.  It is possible that it has been left there because some sessions invoke alter session set workarea_size_policy = manual;  In the absence of a specific value, this would cause such sessions to fall back to 64K”

Now this is an author who is usually pretty dependable, so I thought I’d follow the link to see exactly what the article said. Unfortunately, the link didn’t work, so I had to do a Google search (try searching the Google newsgroups on the three words: pga_aggregate_target sort_area_size lewis). Since it was a reference to a newsgroup item, the whole thread came back, and this is what it said:

Question from original poster:

I hava an installation of Oracle 9.2 with pga_aggregate_target
set in init.ora. As this parameter replaces e.g. sort_area_size
and hash_area_size and other pga relevant parameters in init.ora
(in dedicated server systems), I don't know why my predecessor
let this parameters set in init.ora.

They seem to be used setting this memory pools, becaus using
"show parameter xxx" prompts the assigned values.

I think I have to delete this parameters (sort_area_size,...
and all other pga relevant parameters) to handle pga_aggregate_target
correct.

What do you think about?

My answer – to a very specific question about a specific system for a specific DBA:

You don't need to delete sort_area_size to use the pga_aggregate_target.
You need only make sure that you have set workarea_size_policy to AUTO.
It is possible that it has been left there because some sessions invoke

      alter session set workarea_size_policy = manual;

In the absence of a specific value, this would cause such sessions to
fall back to 64K

Follow-up comment by Connor McDonald:

Can't remember if its 9.0 or 9.2, but I think one or both of those
versions doesn't support pga_agg_ for shared server and thus falls back
to s_a_s et al.

Follow-up comment by me:

Good point - big memory allocations for shared servers
are outside the scope of pga_aggregate_target because
some components of the memory have to be in the UGA,
which is kept in the SGA.

I think it's still true for 9.2 - by the way.
Haven't thought about checking it for 10.1 yet.

I wouldn’t bother to make heavy weather of this particular quote – except you have to wonder how Don Burleson managed to miss the references in the original question to dedicated server, and the comments in the follow-up about shared servers (of course, some researchers do have a knack of only seeing the evidence that they want to see). It makes it tough, of course, to decide whether to make this reference count in favour of the Don hypothesis, or against it. That’s the trouble with quoting out of context and putting in a link that doesn’t work – some people just spoil things by checking up on you.


Hints on research

After reading Don’s list of references, there are four important points that I think need to be stressed to anyone who is going to claim to be an expert:

·         Do the research before publishing, not afterwards.

·         If your paper research produces contradictory comments, democratic principles won’t usually get you to the right answer – you have to test.

·         It only takes one solid counter-example to invalidate your claims.

·         If the facts contradict the theory, then the theory is wrong.


Conclusion

I haven’t got to the conclusion yet – but I’ve run out of time.

I still want to address the ‘success stories’, and comment on the changes to the example that I quoted in my previous article, but that will have to wait until next week. And maybe by then Don Burleson will have found an expert to test his example, and correct it for him.

But in passing, here’s a funny little thing: Mike Ault’s example (which I think could be the basis for a very interesting article that I’d love to see him write) starts with these parameter values:

hash_join_enabled             TRUE
pga_aggregate_target          629145600
shared_pool_size              218103808
sort_area_size                524288

which are then adjusted to the following:

_pga_max_size                 314572800
hash_join_enabled             TRUE
pga_aggregate_target          1978906181632
shared_pool_size              83886080
sort_area_size                524288

Note the sort_area_size.  The default value for sort_area_size is 64KB – so this is a system with the sort_area_size set in the init.ora or spfile.
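
(A quick way to check whether a parameter has been set rather than defaulted, by the way, is the isdefault column of v$parameter – a trivial sketch:)

    select name, value, isdefault
    from   v$parameter
    where  name = 'sort_area_size';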

Sorry, Mike, according to Don Burleson you don’t understand PGA management :(


Part 2 – (4th Sept 2005)

In the original version of the Burleson article, we saw the following example.

In a system like the example in Figure 1, the following settings would increase the default sizes for large sorts and hash joins.

·         pga_aggregate_target = 1000m

·         _pga_max_size = 1000m

·         _smm_px_max_size = 333m

With these hidden parameters set we see a 5x large size increase for parallel queries and sorts:

·         A RAM sort or hash join may now have up to 50 megabytes (5% of pga_aggegate_target) a 5x increase.

·         Parallel queries may now have up to 330 megabytes of RAM (30% of pga_aggegate_target), such that a DEGREE=4 parallel query would have 83 megabytes (333 meg/4).

In the first critical review of this article I demonstrated that you didn’t need either of the hidden parameters set to get a 50MB workarea for a serial sort, and then demonstrated that the limit on the parallel queries was not the 83MB predicted by Burleson, but nearer the 50MB dictated by the serial limit.

Burleson has now updated his article, which reads as follows (4th Sept – the copy from his own website):

In a system like the example in Figure 1, the following settings would increase the default sizes for large sorts and hash joins.

·         pga_aggregate_target = 4g

·         _pga_max_size = 300m

·         _smm_px_max_size = 333m

With these hidden parameters set we see a 4x large size increase for parallel queries and sorts:

·         A RAM sort or hash join may now have up to the full 200 megabytes (5% of pga_aggegate_target) a 400% increase.
 

·         Parallel queries may now have up to 333 megabytes of RAM (30% of pga_aggegate_target or _smm_px_max_size), such that a DEGREE=4 parallel query would have 83 megabytes (333 meg/4).

The system in the referenced figure 1 apparently had enough memory for a pga_aggregate_target of 1GB according to the Burleson methodology, so let’s check what the baseline parameter values are when we set the pga_aggregate_target to that value and make sure that the related hidden parameters are not set:

    alter system set pga_aggregate_target = 1g;
    alter system reset "_pga_max_size" scope = spfile sid = '*';
    alter system reset "_smm_max_size" scope = spfile sid = '*';
    alter system reset "_smm_px_max_size" scope = spfile sid = '*';
    startup force

When we select back these parameters we get:

    pga_aggregate_target       1073741824  (  1 GB)   (target for all PGAs summed across all processes)
    _pga_max_size               209715200  (200 MB)   (target for PGA for a single session)
    _smm_max_size                   52428  ( 50 MB)   (limit for a single workarea operation in a single process)
    _smm_px_max_size               314571  (300 MB)   (limit for a single workarea operation summed across its PX slaves)
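
For anyone who wants to repeat the check: the hidden parameters don’t appear in v$parameter, so the values above came back from the x$ structures. A minimal sketch – it has to be run as SYS, and the column names are the commonly quoted ones:

    select
        i.ksppinm   name,
        v.ksppstvl  value
    from
        x$ksppi   i,
        x$ksppcv  v
    where
        v.indx = i.indx
    and i.ksppinm in (
            'pga_aggregate_target',
            '_pga_max_size',
            '_smm_max_size',
            '_smm_px_max_size'
        )
    order by
        i.ksppinm
    ;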

Before we do anything else, let’s set the pga_aggregate_target to 4GB and see what happens to the hidden parameters.

    alter system set pga_aggregate_target = 4g;
    startup force

After restarting the database we see:

    pga_aggregate_target       4294967296  (  4 GB)
    _pga_max_size               209715200  (200 MB)
    _smm_max_size                  102400  (100 MB)
    _smm_px_max_size              1258290  (1.2 GB .. ca. 1228 MB)

So let’s set the parameters according to the values indicated by Burleson, and see what we get after restarting the database:

    alter system set pga_aggregate_target = 4g;
    alter system set "_pga_max_size" = 300m;
    alter system set "_smm_px_max_size" = 340992;           -- 333MB expressed as KB
    startup force

After restarting, this is what we see:

    pga_aggregate_target       4294967296  (  4 GB)
    _pga_max_size               314572800  (300 MB)
    _smm_max_size                  153600  (150 MB)
    _smm_px_max_size               340992  (333 MB)

So, compared to the default values from the 1 GB setting, the effect of the Burleson tuning advice is to increase the _smm_max_size from 50MB to 150MB (not the 200MB that he was expecting), and to increase the _smm_px_max_size from 300 MB to 333 MB - which is not the “4x large size increase for parallel queries and sorts”. By setting this hidden parameter after increasing the pga_aggregate_target, Burleson has choked the parallel execution slaves, not allowed them to ‘super-size’ – which was the idea suggested in the title of the original article (“Undocumented Secrets for super-sizing your PGA”).

The bit that is nearly correct is that each PX slave in a parallel query of degree 4 should be able to get 83 MB for its share of a workarea operation – but that didn’t need any hidden parameters set; it became a possibility the moment the _smm_max_size went over 83 MB, and that only needed the pga_aggregate_target to exceed a value of about 1,660 MB (83 MB being 5% of 1,660 MB).

Test! Test! Test!

I’ve made a couple of claims about sizing – I really ought to show you something of the tests I did to check that my claims were true. I’ll show two things – first that with the Burleson parameters, the maximum workarea for a serial sort stops at 150MB; then I’ll show that with a pga_aggregate_target of (just over) 1660 MB and no hidden parameters set, I can get four parallel slaves using a workarea of 83MB each.

The following tests were run through a dedicated server on 9.2.0.6, with an 8K block size, and workarea_size_policy = auto.

Serial limits

In my previous review of Burleson’s article, I showed you a script that I used to generate some data. In that example I created a table of 10 million rows with a column of varchar2(6). Change the script to generate just 8 million rows instead of 10 million.
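
If you don’t have the original script to hand, something like the following will build an equivalent table. This is only a sketch – the table and column names (t1, sortcode) are chosen to match the queries below, and the 2,000 x 4,000 cartesian join is just an arbitrary way of getting 8 million rows:

    create table t1
    nologging
    as
    select
        cast(dbms_random.string('U', 6) as varchar2(6)) sortcode
    from
        (select rownum r from all_objects where rownum <= 2000) v1,
        (select rownum r from all_objects where rownum <= 4000) v2   -- assumes all_objects holds at least 4,000 rows
    ;

Then do the following: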

    set pause on

    select sortcode
    from    t1
    order by sortcode
    ;

While this query runs, repeat the following from another session – which may have to be connected INTERNAL on your system:

    select
        operation_type,
        work_area_size,
        max_mem_used,
        tempseg_size
    from
        v$sql_workarea_active
    ;

Since I was the only user on the machine, this picked up just the one workarea relating to the sort. You should see the work_area_size grow to something a bit short of 150 MB (in my case it peaked at 133,992,448) then drop back to a few megabytes as a dump to the temporary segment starts. The final output from the query (after the original sort had produced its first array of results and paused) was:

    OPERATION_TYPE       WORK_AREA_SIZE MAX_MEM_USED TEMPSEG_SIZE
    -------------------- -------------- ------------ ------------
    SORT                        5079040    133996544    101711872

The fact that the memory allocation peaked at 128 MB was an interesting little puzzle – but my initial guess is that as the memory allocation grows, the increment that the session uses also grows (you may spot this pattern as you repeatedly query v$sql_workarea_active). When I allowed the session to acquire 200MB for the work area, my queries against v$sql_workarea_active showed the work_area_size jumping from 128 MB to 154 MB then to 184 MB (at which point the sort completed in memory).  Perhaps it was the fact that the ‘next step’ was 154MB that made my first test stop at 128 MB.

The reason I suggested 8 million rows for the table was that this would allow me to get an in-memory sort (i.e. workarea execution – optimal, as the statistics describe it) by increasing the _pga_max_size to 400 MB, hence allowing the _smm_max_size to reach 200 MB. Bear in mind you need to examine problems from at least two directions (“if I’m right what should happen” and “what else could happen that would make me look right even when I’m wrong”) – I wanted to make sure that I had a data size that could use a 200 MB memory allocation if it had been available.
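
For the record, that check needed just one change (a sketch, following the pattern of the earlier command blocks – don’t forget the restart):

    alter system set "_pga_max_size" = 400m scope = spfile sid = '*';
    startup force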

Parallel limits

For my second test, I want to show that a pga_aggregate_target of about 1,660 megabytes will be enough to get a memory allocation of 83 MB in a query running parallel degree four. (You may recall that Burleson wanted to use a target of 4GB – on a machine which he explained had only one gigabyte available for such activity: I would prefer to tell Oracle the truth if possible, otherwise it might try to use the memory you’ve promised it, and push your system into heavy disk activity as swapping becomes necessary).

As a baseline, we leave the Burleson settings in place, re-create my data set but take it up to 14 million rows, and run the following query:

    set pause on

    select /*+ parallel (t1 4) */
        sortcode
    from
        t1
    order by sortcode
    ;

Again, we monitor v$sql_workarea_active to see what happens. The following results came out as the sort started to spill to disc:

    OPERATION_TYPE       WORK_AREA_SIZE MAX_MEM_USED TEMPSEG_SIZE
    -------------------- -------------- ------------ ------------
    SORT                     75,617,280   64,331,776
    SORT                     75,617,280   69,672,960
    SORT                     75,617,280   68,935,680
    SORT                     75,617,280   75,620,352   19,922,944

And this is what things looked like as the sort completed:

    OPERATION_TYPE       WORK_AREA_SIZE MAX_MEM_USED TEMPSEG_SIZE
    -------------------- -------------- ------------ ------------
    SORT                      2,031,616   75,620,352   41,943,040
    SORT                      2,031,616   75,620,352   45,088,768
    SORT                      2,031,616   75,620,352   44,040,192
    SORT                      2,031,616   75,620,352   48,234,496

We didn’t actually get the 83 MB predicted by Burleson – again (according to my guess) the next expected increment in memory allocation would have taken us over the limit: the previous allocation was 62,486,528, so my guess was that the next would probably have been a little over 89,000,000. (Having made the guess, I had to check it, of course, so I ran a separate test with the _smm_px_max_size = 400M, and found that the next allocation would have been 91,505,664.)

To do the test that justifies my claim – I reset all the hidden parameters, and set the pga_aggregate_target to 1,750 MB (after seeing that 91,505,664 was the next sort increment in the experiment above, I assumed that I needed just over 20 times this – the serial limit being 5% of the target – to allow the slaves in my test to reach the target).
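
The commands were of the same form as before – a sketch:

    alter system reset "_pga_max_size" scope = spfile sid = '*';
    alter system reset "_smm_max_size" scope = spfile sid = '*';
    alter system reset "_smm_px_max_size" scope = spfile sid = '*';
    alter system set pga_aggregate_target = 1750m;
    startup force

These are the results I got from v$sql_workarea_active as my parallel query ran: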

Just as the sort starts to spill to disk

    OPERATION_TYPE       WORK_AREA_SIZE MAX_MEM_USED TEMPSEG_SIZE
    -------------------- -------------- ------------ ------------
    SORT                     91,505,664   83,427,328
    SORT                     91,505,664   77,840,384
    SORT                     91,505,664   84,295,680
    SORT                     91,505,664   91,504,640    5,242,880

Then as the sort completes:

    OPERATION_TYPE       WORK_AREA_SIZE MAX_MEM_USED TEMPSEG_SIZE
    -------------------- -------------- ------------ ------------
    SORT                     83,779,584   83,779,584
    SORT                     78,168,064   78,168,064
    SORT                     84,656,128   84,656,128
    SORT                      2,031,616   91,504,640   48,234,496

Coincidentally, three of the slaves managed to do an optimal sort and then free a little memory, and just one slave had to spill to disk and switch to a one-pass sort, releasing a lot of memory as it did so.

So there you have it: Don Burleson was wrong in his conclusions about how his tweaking of the hidden parameters would affect the serial memory operation; and he didn’t need to tweak the hidden parameters at all or push his pga_aggregate_target to such an over-committed level to get the parallel effect he claimed. And in either case, he clearly hadn’t tested his suggestions.


Conclusion 2

I still haven’t finished with the article, so I still haven’t come to a conclusion – there’s still a lot to say about the errors in this article. But I’m not going to bother pursuing the errors any more. I think most readers will have got the idea by now that Don Burleson may know less about PGA memory management than the DBAs he condemns (incorrectly) in his opening paragraphs. I will, however, examine the “success stories”, and what you can learn from them, in a future article.

Just one last word about how to correct errors, though:

My first complaint about the article was due to the inflammatory, and defamatory, comment:

For example, the following set of parameters indicates that the Oracle DBA does not understand PGA management.

·         sort_area_size=1048576 <-- sort_area_size is ignored when pga_aggregate_target is set and when workarea_size_policy =auto.

The “corrected” article addresses this casual condemnation of DBAs who might be using shared servers (MTS) by introducing a section of “Important Caveats” which includes the comment:

These PGA rules no not apply to shared server environments using Oracle multi-threaded server (MTS).  However, the vast majority of Oracle shops do not use the MTS.

It seems that Burleson thinks that it’s okay to make massively misleading comments that could cause strife and problems so long as he is only insulting or damaging the reputation of people in a minority group. Would it really have been so much effort to change the opening paragraphs – or was a dramatic flourish so much more important than the truth?


Addendum (7th Sept)

Amazing! The article has been re-written again or, to use Burleson’s own words:

“I have enhanced this tip again, including more clarifications and a 230 page artificial test case to show that my original tip observations were substantially correct.”

In this context, I believe the word “enhanced” has been used with its special meaning of “corrected”. I do agree, however, that some of the observations in the original article were “substantially correct” – I would go so far as to say “absolutely correct” – specifically: there are some undocumented parameters that few people understand, and changing the _pga_max_size, _smm_max_size, and _smm_px_max_size can affect the memory available for workarea operations. The devil (as the saying goes) is in the detail. (Oct 2005 – Since I was adding a note to this page, I thought I would also point out at the same time that this paragraph is an example of irony – apparently some people didn’t realise that.)

You may want to read the 230-page test case – it’s just a highly verbose report of Mike Ault’s implementation of the tests I described above. You can probably skip pages 18 – 88; that bit is just a listing of the database parameters. The listing of 1,961 rows of output to prove that a parallel execution slave can acquire a certain amount of memory is also a little verbose – that’s what the max_mem_used column is for, and you only have to capture it once by freezing the sort at a critical moment. The report is really an 8-page report, padded with a huge volume of massively redundant material.

The good thing about the state of the article is that a number of the errors have been corrected. In particular, the arbitrary condemnation of DBAs not understanding PGA management because of some parameter settings has been replaced with the much milder comment:

“For example, the following set of parameters may be mutually-exclusive:”

·         sort_area_size=1048576 <-- sort_area_size is ignored when pga_aggregate_target is set and when workarea_size_policy =auto, unless you are using a specialized feature such as the MTS. If dedicated server connections are used, the sort_area_size parameter is ignored.

·         pga_aggregate_target = 500m <-- The maximum default allowed value is 200 megabytes, this limits sorts to 25 megabytes (5% of 500m).

·         mts_servers<>0 <-- If Multi-threaded server is being used, the pga_aggregate_target setting would be ignored in all versions except Oracle10g.

Not only is the comment less contentious, we even get the explanation that the sort_area_size is still relevant (prior to 10g) when using shared servers (MTS). Of course, we do still see the strangely meaningless note beside the pga_aggregate_target line. Moreover, the most significant reason for setting both the sort_area_size and the pga_aggregate_target is exactly when you have mts_servers <> 0 – so this set of parameters is almost certainly NOT going to be mutually exclusive.

[Side-note: since MTS is a deprecated term in 9i, we really ought to refer to shared servers and set the shared_servers parameter rather than the old mts_servers parameter – but old habits die hard.]
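
To make the point about co-existence concrete, here is a sketch of a 9i init.ora fragment in which setting both the sort_area_size and the pga_aggregate_target is deliberate and sensible – the specific values are purely illustrative:

    # Shared server (MTS) sessions make their large memory allocations
    # in the SGA, so the old *_area_size parameters still apply to them
    # (prior to 10g); dedicated sessions use the automatic mechanism.
    shared_servers        = 5
    dispatchers           = "(PROTOCOL=TCP)"
    workarea_size_policy  = auto
    pga_aggregate_target  = 500M
    sort_area_size        = 1048576
    hash_area_size        = 2097152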

Alas, we also see the comment that the pga_aggregate_target setting is ignored if shared servers (MTS) are being used. This is sloppy thinking – and a common error.

·         First – a system is not exclusively configured for shared server (MTS) or dedicated server; you can have sessions connecting through either technology.

·         Secondly – even when a session connects through a shared server, the instance will still monitor its PGA usage and account for it against the pga_aggregate_target, and the Cost Based Optimizer (CBO) will still base its calculations on the limits dictated by the PGA-related parameters – even though those limits are not relevant when the query runs.

The casual implication that an entire instance is either dedicated or shared – but not both – is something to be avoided.

Reading through the rest of the ‘tip’ you will find an amazing number of corrections and added details – in fact, most of the comments I made in the earlier parts of this critique seem to have occurred to Burleson as things that needed “enhancement”.

However, despite Mike’s valiant efforts to re-implement my test cases, Burleson still manages to say this:

For example, by default a DEGREE=4 parallel query would have a maximum sort area value of 15 megabytes per session with a 200 megabyte pga_aggregate_target setting.

Now it may be a coincidence, it may just be bad luck, but I didn’t mention this particular error in my initial critiques – it was just one of several little errors that I didn’t bother to point out. But for some reason, it hasn’t been “enhanced”.

If the pga_aggregate_target is 200 MB then the _smm_max_size is 10 MB and the _smm_px_max_size is 60 MB. Burleson makes his 15 MB comment based on dividing 60 MB by 4 slaves – but 15 is greater than 10, so the slaves only get 10 MB. As I said in the first critique of Burleson’s article:

The trivial error that Don has made here is in assuming that the _smm_px_max_size is the upper limit on the sum across the slaves of the memory for a single work area operation. Clearly he hasn’t tested this or even thought about the associated parameters. Each slave is limited by the _smm_max_size – the 5% of the pga_aggregate_target that Don mentions elsewhere in his article. The limiting effect of the _smm_px_max_size doesn’t kick in until you start running at a degree of parallelism greater than six. (The 30% parallel limit allows for 6 times the 5% serial limit ).

Unfortunately, of course, Mike’s test case was incomplete. He didn’t set up a parallel test where the _smm_px_max_size / degree was less than the _smm_max_size, so he missed a critical detail. Looks like 230 pages just wasn’t enough :(


Criticism requires care (17th Sept 2005)

If you are going to condemn the work of other people in public, it is important to be extremely careful with the facts. If you fail to exercise extreme care, you are likely to end up with egg on your face.

Consider, for example, this comment made on the AskTom forum by someone claiming to be Harry Conway, employee of Burleson Computing and impending author for Rampant Techpress:

What is really irritating is that the rigged “proof” offered by Jonathan Lewis that uses shared servers, a fact hidden way-down into the text.  C’mon, who uses the MTS on a PC?  I’ve been a DBA for over 10 years and hardly anybody uses MTS anymore.

Two lines of text, three exhibitions of sloppy thinking. Let’s take it from the bottom up:

I’ve been a DBA for over 10 years and hardly anybody uses MTS anymore

Is there a logical connection between the two halves of that statement? No, not really. Compare it with this: “I’ve been driving a cab for the last 10 years, and hardly anyone uses LPG.” It sounds as if there may be some connection, some privileged information – but there isn’t. I knew of one person who spent 14 years sitting in an office looking after one IMS database for one company – if he had said “I’ve been using IMS for 14 years and hardly anybody does {feature X} with IMS anymore” I would have laughed. He didn’t go to conferences, he didn’t go to user group meetings, he didn’t even have anyone else in the office to talk to about IMS – all he ever did was read novels, and fix programs when they crashed.  (Of course, he wouldn’t have made such a silly ‘proof by false appeal to authority’, he just got on with the job and wasn’t worried about trying to impress people).

Now here’s a statement that (a) is logically sound, and (b) carries some weight: “I presented a seminar about implementing Oracle three weeks ago, and asked the audience how many of them were using shared servers (formerly MTS) in production systems, and 10% of them put their hands up.” (Personally, I wouldn’t claim that as a statistically significant sample, as I restrict the audiences for the seminar, and there were only 20 attendees).

               C’mon, who uses the MTS on a PC? 

It’s a funny thing how you can shoot yourself in the foot when waving a gun around too freely. In this case, twice. My original critique said: “Funnily enough, I happen to be running a 9.2.0.6 instance with exactly these settings, plus workarea_size_policy = auto.”  It doesn’t say what hardware the instance was running on. Now I have a licence for Oracle Enterprise Edition, with upgrade support, and a MetaLink account – and when I ask for upgrades on CD, Oracle sends me binaries for HP-UX – they don’t run on a PC.

Why should Harry Conway assume that any database I run has to be on a PC? “Don’t judge others by yourself” is an old adage. Has Harry assumed that I only have a PC to run Oracle because that’s all he (and the company he works for) uses? Does he only get his hands on anything else when it belongs to someone else? I can’t imagine why else he would assume that my instance has got to be running on a PC.

But there’s more. Even if I did only have a PC for testing Oracle, does Harry Conway think that Oracle on a PC is completely different from Oracle on a Unix box? Does MTS (or shared server, to give it the correct name) use completely different strategies for memory on PCs for some strange reason? Might have to raise an iTAR on that one if that’s what the evidence suggests.

What is really irritating is that the rigged “proof” offered by Jonathan Lewis that uses shared servers, a fact hidden way-down into the text.

Poor, poor Harry – he’s made a Don mistake. If you make a claim, someone’s going to check it. Here’s the original set of observations:

Here are a few results. Note especially that I started a new session for each test to avoid clouding the figures.

    Sort_area_size    Approximate change in    Approximate change in
                      UGA max memory           PGA max memory
    --------------    ---------------------    ---------------------
    1 MB              1 MB                     Nil
    2 MB              2 MB                     Nil
    4 MB              4 MB                     Nil

So the sort_area_size does have some relevance – even when all the new pga mechanisms are correctly enabled.

Maybe you’re not ignorant, maybe you do understand PGA management, maybe you (like me) are running with shared servers (formerly multi-threaded servers) and know that for Oracle 9i, the older parameters are still used to limit the memory that gets allocated in the SGA (which is where the UGA is going to be held when using shared servers).

Note where the comment about running shared servers (formerly multi-threaded servers) appears: seven lines after the test; two lines after the test results; one line after the claim that sort_area_size does have some relevance. And Harry thinks that’s ‘way-down’ into the text. Is that a short attention span, or is it something completely different?

Let’s be generous and count from the line which reads: “So let’s build some sample data, and see what happens if I play around with the sort_area_size:”. On my screen, that’s a massive 41 lines before you hit the critical line – if I print it, it’s just one sheet (A4 size, 15mm borders). Let’s bend over backwards to be nice, and count from the very top line of the article. On my screen, the critical line now appears 61 lines down – and that’s allowing for the heading, two separator lines, and two sub-headings. It fits on just two sheets of paper.

Sad news for Mike Ault – it doesn’t look like Harry Conway is going to get very far through that 230-page proof when 61 lines is a long read.
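
(For completeness – since the table above shows only results – the UGA and PGA figures came from the standard session statistics. A sketch of the sort of query involved, run from the session doing the sorting:)

    select
        sn.name, st.value
    from
        v$statname  sn,
        v$mystat    st
    where
        st.statistic# = sn.statistic#
    and sn.name in ('session uga memory max', 'session pga memory max')
    ;

Take a snapshot before and after the sort, and the difference between the two sets of figures is what the table reports.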


Note 1: I haven’t bothered to create a link to the AskTom thread where this comment appears. Most of the technical content is already available in this critique and the previous one; most of the rest of the thread comes from a number of individuals (largely anonymous) who persistently declined to address the technical questions, and seemed unable to understand the answers to the non-technical questions that they kept repeating.

Note 2: I haven’t commented on the grammatical errors in the extract from Harry Conway’s posting as I appreciate that it is easy to make errors in spelling, grammar and continuity when typing real-time responses on a forum.  I believe I have interpreted his statements in the way he intended them to be read.


Addendum  22nd October 2005

I’ve been sent an email that made me realise that the following comment (quoted from Undocumented Secrets of Untested Fairytales) needed some clarification:

The limiting effect of the _smm_px_max_size doesn’t kick in until you start running at a degree of parallelism greater than six. (The 30% parallel limit allows for 6 times the 5% serial limit ).

The author of the email interpreted this comment to mean that for operations running with a degree of six or less, the _smm_px_max_size would be ignored, and the _smm_max_size would be applicable. This is not the case. The text would probably have been less ambiguous if I had said:

“Assuming you haven’t been fiddling with hidden parameters, you will not be able to see the effect of the _smm_px_max_size until you run a query with a degree of parallelism greater than six, because the limit set by _smm_max_size will override the limit set by _smm_px_max_size. The limit set for each slave in the operation is least(_smm_max_size, _smm_px_max_size / degree of parallelism).”

For example – assume pga_aggregate_target = 1000MB, the default _smm_max_size is therefore 50MB, and the default _smm_px_max_size is 300MB. So we get the table (with thanks to a sharp-eyed reader who spotted a silly typo in the original):

    Degree of parallelism    _smm_max_size    _smm_px_max_size / degree    Limit per slave
    ---------------------    -------------    -------------------------    ---------------
    4                        50 MB            75 MB                        50 MB
    6                        50 MB            50 MB                        50 MB
    7                        50 MB            42.8 MB                      42.8 MB

As you can see, it is only when _smm_px_max_size/degree is less than _smm_max_size that you can see the effect of the PX parameter. Which, for the default settings, means the degree of parallelism has to be greater than six.

I have copied this comment into Undocumented Secrets of Untested Fairytales, where the original statement came from.


Addendum  21st January 2006

I’ve been sent a note by Jeff Moss (blog site: http://oramossoracle.blogspot.com). He had been doing some research for a presentation he was preparing for the UKOUG Business Intelligence SIG on performance tuning for data warehouses, and in reading this article he had noticed that the previous addendum was still incomplete and could mislead.

The point I had overlooked, of course, was that there is a hidden limit built in to _pga_max_size (200MB for 9.2) which imposes a limit on _smm_max_size (100MB in 9.2) but doesn’t impose a limit on the _smm_px_max_size.

For example, if you set your pga_aggregate_target to 4,000MB in a 9.2 instance, then your _smm_px_max_size would be 1,200 MB (30% of pga_aggregate_target) but your _smm_max_size would be only 100MB (smaller of 5%, and _pga_max_size/2).

Consequently, the limiting effect of the _smm_px_max_size would only become visible (in this example) once your degree of parallelism exceeded 12 – until that point, every slave would be allowed to get up to 100MB for a single operation; go above that level, and the maximum for a slave would be 1,200/degree of parallelism.
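
If you prefer to see that arithmetic as code, here is a sketch of the 9.2 example above, with all values in MB and the ratios as just described:

    select
        least(
            least(0.05 * 4000, 200 / 2),    -- _smm_max_size: 5% of target, capped at _pga_max_size / 2 = 100 MB
            (0.30 * 4000) / 12              -- _smm_px_max_size / degree: 1,200 MB / 12 = 100 MB
        )   limit_per_slave_mb
    from
        dual
    ;

At a degree of parallelism of 12 the two limits cross at 100 MB; at any higher degree, the 1,200 MB divided by the degree is the smaller figure, and the PX limit becomes visible.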

I have copied this comment into Undocumented Secrets of Untested Fairytales, where the same addendum appears.

