Parallelism, plans, and testing
Addendum (7th April 2006)
I was recently sent a note about a blog entry which made the following comments:
One of the tables in a SQL involved in a match against values passed into a global table temporary table had a PARALLEL setting of 4, but the instance had a parallel min servers setting of 0 so there was no way that a parallel scan could be performed.
Using the command line explain plan you get a perfectly reasonable plan using an index scan on the permanent table and a full scan on the global temp. However, when you check on this SQL in the V$SQL_PLAN table it is attempted a parallel query with a single query slave doing a full table scan on both the permanent and temporary table!
The developer, using the PL/SQL Developer tool, was getting the bad explain plan and couldn’t see the problem. We changed the PARALLEL to 1 on the table and the plans then matched between the online explain plan and the V$SQL_PLAN explain plan.
This was followed with two pieces of advice to Oracle Corporation:
If the PARALLEL is set on a table and the MIN_PARALLEL_SERVERS is set to 0, completely disregard the PARALLEL setting. In testing a setting of 0 resulted in no P000 processes starting so even with the resulting parallel plan the query was serialized.
The plan generated by the EXPLAIN PLAN and AUTOTRACE commands should match the one generated into the V$SQL_PLAN table.
I considered sending an email to the blogger suggesting he remove this item, as it contained so many errors; but decided that it would be more useful to review the item. It’s an interesting piece of psychology, but many people find it easier to understand by having errors explained, rather than hearing a description of what goes on when something goes right. Anyway, the author presents himself as an Oracle expert so he should be robust enough to cope with a review.
So, taking the first paragraph of each section::
parallel min servers setting of 0 so there was no way that a parallel scan could be performed
If the PARALLEL is set on a table and the MIN_PARALLEL_SERVERS is set to 0, completely disregard the PARALLEL setting.
But parallel_min_servers dictates the minimum number of parallel execution slaves per instance that should be kept alive irrespective of how long they have been idle. There is a similar parameter parallel_max_servers which dictates the maximum number of slaves that may be started on an instance – and if this number is non-zero then any query involving a parallel-defined object may generate a parallel plan.
As a quick check – I created a table with degree 4 on my 184.108.40.206 system, and ran an explain plan for “select * from tab” against it. With parallel_max_servers set to 24, the plan included a parallel component. With parallel_max_servers set to 0, the plan was a serial plan.
Looking at the more subtle complaint, then, I tried to emulate the blogger’s requirements. A copy of the execution plans would have been more informative than the descriptions given, but I think the following may be the sort of thing the blogger was complaining about:
create global temporary table gtt1
on commit preserve rows
select * from all_objects
where 1 = 2
create table t1
where rownum <= 10000
create index t1_i1 on t1(owner);
-- Collect statistics here, including a histogram on column owner.
-- Put a little data into the GTT (optional step)
insert into gtt1
where rownum <= 100
set autotrace traceonly
from t1, gtt1
where t1.owner = 'SYS'
and t1.data_object_id = gtt1.data_object_id
from t1, gtt1
where t1.owner = 'FRED'
and t1.data_object_id = gtt1.data_object_id
set autotrace off
I ran this little script under 220.127.116.11, using a locally managed tablespace, with freelist management, 8KB block size, db_file_multiblock_read_count set to 8, system statistics disabled, and no funny settings for optimizer_index_cost_adj and optimizer_index_caching. These are the two execution plans I got (and you can check that they are the plans that actually were used by checking the 10046 trace files and v$pq_tqstat and sundry other bits of information).
For the query involving “SYS” as the target owner
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=4690 Bytes=1013040)
1 0 HASH JOIN* (Cost=37 Card=4690 Bytes=1013040) :Q439002
2 1 TABLE ACCESS* (FULL) OF 'T1' (Cost=20 Card=477 Bytes=41976) :Q439001
3 1 TABLE ACCESS* (FULL) OF 'GTT1' (Cost=17 Card=8168 Bytes=1045504) :Q439000
SELECT /*+ ORDERED NO_EXPAND USE_HASH(A2) */ A1.C0,A2.C0,A1.C1,A1.C2,A1.C3,A1.C4,A1.C5,A1.
A2.C9,A2.C10,A2.C11,A2.C12 FROM :Q439001 A1,:Q439000 A2 WHERE A1.C0=A2.C0
SELECT /*+ NO_EXPAND ROWID(A1) */ A1."DATA_OBJECT_ID" C0,A1."OWNER" C1,A1."OBJECT_NAME" C2
,A1."SUBOBJECT_NAME" C3,A1."OBJECT_ID" C4,A1."SECONDARY" C5,A1."OBJECT_TYPE" C6,A1."CREATE
D" C7,A1."LAST_DDL_TIME" C8,A1."TIMESTAMP" C9,A1."STATUS" C10,A1."TEMPORARY" C11,A1."GENER
ATED" C12 FROM "T1" PX_GRANULE(0, BLOCK_RANGE, DYNAMIC) A1 WHERE A1."OWNER"='SYS' AND A1.
"DATA_OBJECT_ID" IS NOT NULL
For the query involving “FRED” as the target owner
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=20 Card=1 Bytes=216)
1 0 HASH JOIN (Cost=20 Card=1 Bytes=216)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=88)
3 2 INDEX (RANGE SCAN) OF 'T1_I1' (NON-UNIQUE) (Cost=1 Card=1)
4 1 TABLE ACCESS (FULL) OF 'GTT1' (Cost=17 Card=8168 Bytes=1045504)
Note – simply by changing one literal input, we get two different plans – one parallel, one serial. The serial plan seems to match the description the blogger gave for serial execution “an index scan on the permanent table and a full scan on the global temp” and the parallel plan seems to meet the description the blogger gave for the parallel execution “it is attempted a parallel query with a single query slave doing a full table scan on both the permanent and temporary table” (I’m not sure how you get any information about the “single query slave” from v$sql_plan, though).
The blogger states that the developer got the serial plan, and production got the parallel plan, and that Oracle Corp should sort things out so that the plans from autotrace and explain plan match the plans from v$sql_plan. But I got two different plans from (effectively) the same query – in this case it’s the data that changed the plan, not the tools. Possibly the data set the developer was using was totally different from the data set the production system was using – and that’s something that makes life difficult with cost based optimisation. (In passing, if you’ve ever heard the comment that you only re-analyze the data to make the plans change – it’s not true. Sometimes you re-analyze the data to make the plans stay the same because the data has changed).
Bear in mind, just to make things harder for developers to predict production plans, 9i does bind variable peeking – the first set of bound values dictates the execution plan. On a good day production might run with a serial plan for hours before losing that plan to a shared pool flush, then switch to the parallel plan on the next execution, because of an unlucky set of bind values.
Then 10g goes one further – you don’t have statistics on global temporary tables (in general), and 10g is set, by default, to sample statistics on any object that doesn’t have statistics. Maybe the developer didn’t have any data in his GTT when trying to determine the plan, and maybe the production GTT had thousands of rows at the first execution.
One final thought – Just because one table in a query is declared parallel, that doesn’t mean the query HAS to run parallel. It means that a plan that does a full tablescan on that specific table will operate as a parallel tablescan (assuming that some parallel slaves are supposed to be available). This may mean that other components of the plan go through a serial_to_parallel manipulation as well, but that’s not guaranteed.
If you think that Oracle is doing something completely stupid, it probably isn’t. It’s worth trying to strip the problem back to it’s bare essentials and building a test case. A simpler example might allow you to spot the detail that you’ve overlooked on the production system.
Remember that explain plan (and autotrace, which is just explain plan in disguise) can lie – for many different reasons. If you mix global temporary tables, parallelism, bind variables, and dynamic sampling you are very likely to see explain plan disagreeing with v$sql_plan; you are also likely to see different run-time plans depending on actual data patterns at the moment the query is optimized..
One of the surprising benefits of reviewing and dissecting published material is that it seems to prompt far more email from people wanting to discuss the issues (rather than email of the type: “my code doesn’t work please fix it for me”.) Following the comments above, I got an email referencing this item on AskTom which says:
Parallel query is not effective against a global temporary table since the parallel execution servers run in separate sessions and would not be able to 'see' the data you put into the global temporary table ...
Now parallel query and fine-grained access control used to have exactly this problem in early releases. But since I had been running parallel queries against GTTs for the demonstrations in this note I was a little surprised to see this comment.
Reading on, however, I came to Tom’s demonstration code which was supposed to show a difference in behavior between an ordinary table and a global temporary table. Both behaved the same way – until I ran the test on an 8i system.
Good things, test cases; they let you find out that problems have gone away (or appeared) on an upgrade.
Of course sometimes it’s easy to be fooled by test cases. The example given by Tom Kyte suggested watching v$px_sesstat for the parallel activity – but if the query runs too quickly the stats could disappear before you can see them so you might want to query v$pq_tqstat from the coordinator session as well to see the volumes of data moved.
Here’s a simple and critical demonstration, though:
create global temporary table gtt1 (n1 number);
alter table gtt1 parallel (degree 4);
Try this with 8i, and you get Oracle error “ORA-14451: unsupported feature with temporary table”. The error doesn’t appear with my 18.104.22.168.
Things do change from one version of Oracle to the next. That’s why it’s a good idea to construct sample code supporting your theories. It means you can repeat your tests easily or even allow other people to repeat them for you so that you can keep your knowledge up to date.
Funnily enough, if you check the SQL Reference for 9.2 (A96540-1) you even find the following under the Create Table command (p15-24):
Parallel DML and parallel queries are not supported for temporary tables. (Parallel hints are ignored. Specification of the parallel_clause returns an error.)
Simple test cases can even show that the manuals are wrong.