ࡱ> 8:7Z#` 0Dbjbj\.\. ]>D>D<>>>>>>>Z>Z>Z>Z>n>DO0>>>>>???1G3G3G3G,_GJN$NPhRN>@??@@N>>>>N\F\F\F@.>>>>1G\F@1G\F\FF[>>G>> teZ>,DF%G N0OFSFFSGS>G ?0?"\F@@???NNFF???O@@@@"">>>>>> The Parallel Query Option in Real Life (Spring 1997) In the previous article in this series, I introduced the basic mechanism of using the Parallel Query Option. This article moves on to discuss when to use it, benefits, features, monitoring and problems. Any examples in this article are based on Oracle 7.3.2.3 running on HP-UX 10.20 Using the Parallel Query Option. With facing any special feature of the Oracle RDBMS, there are always four basic questions to ask yourself: When should I use it ? How do I design for it ? How do I monitor it ? What makes it break ? In this article I hope to give you some answers for these questions. When to use it: To recap from the previous article: The Parallel Query Option operates by rewriting an end-user query in such a way that several copies of the code can run simultaneously to produce the required final result. For maximum benefit the copies should not interfere with each other. The starting point of the query is (usually) a tablescan. Given these conditions, there are basically 3 scenarios which you need to review as possible candidates for using the Parallel Query Option Simple tablescans Small parent scan to large child Object Creation Simple Tablescan This can be used to address two types of problem. In one case typical queries may retrieve only a small number of rows, yet the nature of the queries is such that indexing does not help - (you may find that bitmapped indexes, available in later versions of 7.3, may be worth investigating for this type of problem). In the other case, typical queries may need to retrieve and crunch a large number of rows to return a small result set. The Parallel Query Option can be very effective in reducing the time for such queries - on the plus side you have the options of increasing parallel throughput on disks and breaking up the sort requirements to avoid sorts to disk - on the minus side you need to consider the risk of unsuitable data distributions that result in excessive communication and sort costs. Small Parent Scan to Large Child: The obvious use of this option is with the recently introduced HASH JOIN facility where two tables are scanned and the smaller is hashed to be used as a target for the larger and most of the benefit comes from the ease of separating the I/O on tablescanning. However another option (particularly of use in conjunction with partition views) is in scanning a small table and using indexed accesses into a large table where the physical data distribution is suitably packed. Object Creation: When running in parallel, particularly with the unrecoverable option, you may find that the time to create an object (table or index) is significantly reduced. But there are a couple of side-effects (see further down) to worry about when chasing this benefit. How to design for it: Remember that the parallel query option is likely to introduce a level of concurrent processing your system has never seen before. The single most critical feature of improved performance on parallel operations is that these concurrent processes should not interfere with each other at the disk level. So the first point of design is to get lots of small discs rather than a few very large discs. Once you have plenty of discs you have to get the data onto them in ways that will minimise interference, maximise individual throughput, and not be a maintenance nightmare. The first step would be to earmark a few discs for special use, e.g. redo, archives, temp, transient tables, after which the rest of the disks should be treated in a fairly uniform manner. My preference is to use logical volume managers and stripe a number of discs (typically 5 to 10 in a set) at the level of the operating system. The stripe size should be in the 1M region (i.e. 1Mb on each disc, not 1MB spread across the set) though this would vary depending on the size and nature of the largest objects to be created. Depending on the features of the LVM, I may stripe different parts of the system differently, but keep the number of such variations small. The purpose of the exercise is to ensure that when a parallel query starts each slave will hit a different disc and run several 64K multi-block reads before jumping on to the next disc - i.e. the highly desired maximum throughput per disk with minimum interference between disks. Monitoring in Development: In pre-production monitoring the most useful tools are the sql_trace facility, and the dynamic view v$pq_tqstat. The former tells you what the slaves have done, the latter tells you how much they are talking to each other. Since parallel query slaves are allocated dynamically and can be created on demand I find that leaving sql_trace (and timed_statistics) true in the init.ora file is the most sensible thing. Once I have acquired any useful information from a l query slave I am happy to kill it as it will be respawned automatically if needed. The contents of v$pq_tqstat can best be described through a picture. Fig .1 shows the way that Oracle handles: select /*+ parallel (t1,1,3) */ nice_column, count(*) from pq_good t1 group by nice_column; The first layer of slaves reads the discs, doing whatever sorting and summing they can before the n-squared (in our case 9) lines of communication pass the results on to the second layer of slaves, which do further sorting and summing before passing the results to the query co-ordinator.  EMBED Word.Picture.6  fig 1 Parallel Slave Activity In Oracle-speak, the entire process is called a data-flow operation (DFO), and each layer of communication uses a set of table-queues (TQs). Each time a session runs a parallel query it creates a new set of entries in v$pq_tqstat recording under a new DFO number the statistics of the way each server participated at each level of table queues. Fig. 2 shows a query to run against v$pq_tqstat. break on dfo_number on tq_id skip 1 on server_type skip 1 select dfo_number,tq_id,server_type,instance,process,num_rows,bytes from v$pq_tqstat order by dfo_number,tq_id,server_type desc, instance, process ; fig 2 Query against v$pq_tqstat Fig. 3 shows the output corresponding to fig.1. The output shows a single DFO built from two TQs. The lower TQ consists of 3 tables, the upper of a single table. At the first level the slaves p000, p001 and p002 produce 72 rows and 2,331 bytes which are consumed by slaves p003, p004 and p005. At the second level, the 3 consumers become producers, reducing the data slightly to pass on 59 rows and 638 bytes to the query co-ordinator. DFO_NUMBER TQ_ID SERVER_TYP INSTANCE PROCESS NUM_ROWS BYTES ---------- ---------- ---------- ---------- ---------- ---------- ---------- 1 0 Producer 0 P000 32 1011 0 P001 23 742 0 P002 17 578 Consumer 0 P003 23 741 0 P004 24 794 0 P005 25 796 1 Producer 0 P003 22 236 0 P004 16 176 0 P005 21 226 Consumer 0 QC 59 638 fig 3 Output from v$pq_tqstat For users of the Parallel Server, this query also lists the instance on which each parallel slave ran so that you can get an idea of how much cross-instance communication took place. To identify a bad query, watch out for a large number of rows being passed from a producer to a consumer. There are three reasons why this is bad: First is the cost of communication (particularly for users of the Parallel Server), second is the degree of contention between producers as they all try to fill the consumers queues, and third is the risk that so much data is passed to consumers they have to direct their sorting operations to disk. It is not obvious from the results above but the data distribution in this (very small) example was designed to be appropriate for Parallel Query. The first indication comes from the marginal change in data volume between the first and second layers of processing, confirmation comes from examining trace files. Looking at fig. 4, which is the tkprof output from the trace file for query slave P000, we can see that the code generated by the Query Coordinator has been executed 3 times (which is the typical minimum for a PQ slave) and has returned a total of 31 rows (it always seems to be one less that the figure given in v$pq_tqstat) Since the end result of the original query was 59 rows the 3 executions in this trace file could have returned 177 rows (3 x 59). The difference between the actual and theoretical volume of data returned is a good indication that the first layer of processing has been effective and that the query (or rather the data distribution) is appropriate for parallelism. SELECT /*+ PIV_GB */ A1.C0 C0,COUNT(*) C1 FROM (SELECT /*+ ROWID(A2) */ A2."NICE-COLUMN" C0 FROM "PQ_GOOD" A2 WHERE ROWID BETWEEN :1 AND :2) A1 GROUP BY A1.C0 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 3 0.00 0.00 0 0 0 0 Fetch 4 0.00 0.00 0 908 12 31 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 8 0.00 0.00 0 908 12 31 Rows Execution Plan ------- --------------------------------------------------- 0 SELECT STATEMENT GOAL: FIRST_ROWS 38907 SORT (GROUP BY) 38907 FILTER 38907 TABLE ACCESS GOAL: ANALYZED (BY ROWID RANGE) OF 'PQ_GOOD' fig 4: trace file of slave p000 Monitoring in Production: In production, the most important thing is to identify which slave belongs to which front-end session. Once this is possible, all the usual methods of identifying which session is doing how much work can be applied. The clue to tracking down the Parallel Query slaves is in v$lock and v$session. To acquire a slave, the Query Co-ordinator must take out locks of type PS, which can be seen in v$lock. select sid, id1,id2,ctime from v$lock where type = PS and lmode = 6; sid is the session of the QC id1 is the instance the slave is on id2 is the slave number ctime is the number of seconds the QC has owned the slave. As a slave starts working for the Query Co-ordinator, it acquires an entry in v$session, but copies the audsid, and several other columns from the Query Co-ordinators v$session entry. So, if you get a SID from v$lock, you can pick up the corresponding audsid, and track down all sessions which are using the same audsid. This approach requires you to assume that the PNNN that is the name of a slave can always be matched to the numeric NNN that appears as the id2 in v$lock. Relational purists (and the more cautious DBAs) may object to this not-quite-relational approach. An alternative approach is to re-write the SQL used in the definition of the dynamic view v$pq_slave, which is a simple view on the x$kxfpdp internal table. Column x$kxfpdp.kxfpdppro corresponds to the addr column in v$process, so a view definition that exposes this column makes for a much more elegant and solid way to track from PQ slaves back to sessions. You might also want to modify the v$pq_slave view to give the various busy/idle/CPU times in seconds rather than minutes. The view is otherwise quite useful in production, although unfortunately the most useful column (CPU used in the current session) is not updated in real time. Features of PQO: There are a number of features of PQO that may take you by surprise. Object Creation: each query slave involved thinks it is unique - so it uses the storage parameters of your SQL in complete ignorance of all the other slaves. As an extreme example of the problems this can cause: guess what happens when you kick off 10 slaves to create a table with minextents set to 20 on a system is running with a blocksize of 2K. Eventually the Query Coordinator tries to coalesce 200 extents into one segment - but hits a hard limit of 121 !! On a more realistic front, there are two issues - first, every single PQ slave that has to sort to disc will demand its own sort space in the TEMPORARY tablespace; secondly, when the Query-Coordinator coalesces the separate segments created by slaves it trims off unused space, which means that you may calculate an object to need exactly 24 extents of 10MB each, and find that a 6-way parallel creation has produced 21 extents of 10MB, 6 of random sizes, and 6 holes in your tablespace. Another possible problem with parallel object creation lies in its effect on data distribution. We have already seen how physical distribution of data can have a big effect on the suitability and efficiency of the Parallel Query option. If you use parallel creation to generate large, permanent, summary tables you may find that the beneficial clustering of your detail data gets lost by the load balancing that takes place across the parallel query slaves. Profiles: You may create a profile that limits a user to 30 CPU seconds per call. If the user is cunning and decides to run a query in parallel each slave gets its own 30 CPU second limit. Make sure that you include a suitable sessions_per_user value in the profile. Bugs: There are a number of bugs in parallel query execution which require patches or backports from 7.3.3 and 7.3.4. Until you are running on one of these newer versions make sure you test business-critical classes of query with real volumes of data, and then you may still want to review the tkprof and trace files very carefully. Some examples of problems I have had with parallel execution are: Parallel queries on views of the form: select col1,col2,0,0 from table1 union all select 0,0,col3,col4 from table2; fail because columns get lost. Some classes of hash-joins and sort-merge joins running in parallel join on the wrong columns - sometimes this fails, sometimes it returns totally meaningless answers, in the worst case it returns answers that are wrong but look about right. A query which makes the Query Co-ordinator generate an SQL statement with more than 255 bind variables, or longer than 64K fails. (Be wary of date-oriented queries). Nested loop joins into partition views fail to eliminate redundant partitions. Conclusion: Where it works properly, the Parallel Query Option can be an incredible tool for getting vast amounts of work done very quickly. However it requires some careful design and coding strategy, and currently it has a number of bugs that can waste an awful lot of your time. Jonathan Lewis is a freelance consultant with 11 years experience of Oracle. He specialises in physical database design and the strategic use of the Oracle database engine, but spends most of his time resolving performance issues. He can be contacted on 0973-188785, or e-mailed at jonathan@jlcomp.demon.co.uk Spring 1997 &45S T u   /it'LMdefghbmh)n)**/ 0"1)1-161112222.373\3b333c4i4*545T5\5u55Ż h56 hj$6 hU(j$6 hOJQJUVmHnHujhU h5 h>* h5>* h6ho$hE56R S T u  ( > ? i j w x  & Fgdo$D   ]^ & Fgdo$0XY'(KLgP ^``PQY   :!;!!!!"e"""L#M###4$$a$`4$5$6$$$%%k%l%%%%C&D&&& ( (((F)G)H)**+++),$a$),.,,,,,E---8.....0/]/w///////000111`$a$11111 202I222A3B3334496:6T7U7f77799i;j;6= 0^`0`55555\6f6U7e7778=A=F>K>-?9?YBdBwCDDD ho$6 h6 h>* h5>* h5 h56h6=7=8=E>F>?????!@+@M@l@m@_A`ABBWBYBeBBBvCwCxCDD`3 0@PBP. A!"#8$8%88 7 0@PBP. A!"#8$8%88 P 3 0@PBP. A!"#8$8%88 7 0@PBP. A!"#8$8%88 P 3 0@PBP. A!"#8$8%88 7 0@PBP. A!"#8$8%88 P 3 0@PBP. A!"#8$8%88 7 0@PBP. A!"#8$8%88 P 3 0@PBP. A!"#8$8%88   !"#$%&'()*+,-.01234569IU=>?@ABCDEFGHWKLMNOPQRSTVXY\]^_`abcdefghijklmnopqrstuvwxyz{|}~Root Entry  F;e;Data /WordDocument ]ObjectPoolte;e_921347620 FtetePIC LMETA <tCompObjh  "L&!  FMicrosoft Word Picture MSWordDocWord.Picture.69qOh+'0  D P \ ht|& 6  &WordMicrosoft Word  System    -Times New Romank~wWw  - -- A--<Times New Romank~wWw -2 SQL*Plus" -'-- --- 2 OracleSID (QC)!  "-'-- 1--,-2  Ora_p003_SID!-'-- ---2  Ora_p004_SID!-'-- j--n-2 y Ora_p005_SID!-'-- 1\--,`-2 b Ora_p000_SID!-'-- \--`-2 b Ora_p001_SID!-'-- \j--`n-2 by Ora_p002_SID!-'->}>--- $%W>^--->------>>>-->3-->I-->---- --> --  -- -->>------>z--z--z-->>------>j--jT--jW-->0--0C--0F--l>O>--- $-ZOZ>0---l|4--- $%F 0---l4{--- $F%\0--&R &"--M"----"---.--.""- & -.||--|q--|- & & R &--M-------.--.- & -.>>-->3-->I- & & R &--M[----[---.ZZ--.- & -.---- - & -ObjInfo WordDocumentJSummaryInformation( DocumentSummaryInformation8 ܥhc eojjjj  M1# # /~X0MMjj.j(=jjjj SQL*Plus Ora_p001_SID Ora_p002_SID Ora_p000_SID Ora_p005_SID Ora_p004_SID Ora_p003_SID OracleSID (QC) NN|,!20("0!!(K200&@&"PK20- .aaa$a!K2$@a!K2&`A@ ((K2&A@ ((K2^&A@ )(K20$ .aaa$a!K2$@a!K2&`A@ ((K2&A@ ((K2^&A@ )(K20  .aaa$a!K2$@a!K2&`A@ ((K2&A@ ((K2^&A@ )(K200&&&! (*(K200&&! (*(K200&@&&! (*(K200&0!0*  )(K200&`0*  )(K200&0*QP)(K200&`'0*QP)(K20 0&@&0*QP)(K20 0&`0*QP)(K20 0&.0*)(K20 0&@&0*)(K20 0&`0*)(K220(p#p,!(K220(+p,!(K220(p,!(K220(+(!(K220(p#(!(K220((!(K220(" %!(K2!#/1=?KMY[giuw uU uDa"#01>?LMZ[hivwzzzzzzK@Normala "A@"Default Paragraph Font &4BP^nq    &VFv 6 h    0 b !w@OfficejetLPT1:winspoolOfficejetOfficejetpB A4l\DRIVERS\W32X ''''OfficejetpB A4amicb '''' 1Times New Roman Symbol &Arial"ph&&!0Jonathan LewisJonathan LewisormifiJonathan Lewisorm&tyNormalJonathan Lewis2Microsoft Word for Windows 95 @@@s=@7u=՜.+,0HPpx  JL Computer Consultancy@ Oh+'00l      Parallel Query (JLComp)Jonathan LewisDOracle parallel query option tuning performance trouble-shooting4Notes on using the parallel query option in Oracle Normal.dotJonathan1Table[SSummaryInformation(DDocumentSummaryInformation8TCompObj!q Lewis5Microsoft Office Word@G@\E>@CVM@Z^e 3՜.+,0$ px  JL Computer Consultancyn< Parallel Query (JLComp) Title  FMicrosoft Office Word Document MSWordDocWord.Document.89q``` Normal5$7$8$9DH$(CJOJQJ^J_HaJmH nHsH tHJ`J Heading 1$./@&a$ 5>*\<`< Heading 2@& 5>*\8`8 Heading 3@&5\DA@D Default Paragraph FontVi@V  Table Normal :V 44 la (k@(No List F@F Normal Indent0^`0<+`<  Endnote TextCJaJ4@4 Header  o#D`"D  Footnote Text ^5\>O2> Double Indent ^2oB2 Action 5>*\>ob> programCJOJQJ^JaJT#'w;<\5\n\\\]J]]]56RSTu(>?ijwx       ] ^ 0XY'(KLgPQY :;eLM456klCD F!G!H!""###)$.$$$$$E%%%8&&&&&0']'w'''''''((())))))) *0*I***A+B+++,,9.:.T/U/f///11i3j3657585E6F677777!8+8M8l8m8_9`9::W:Y:e:::v;w;x;<<0000h0000 0 0 0 000000000000000 0 0 000000000000000000h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h00000000000000000000000000000000000h000056RSTu(>?ijwx       ] ^ 0XY'(KLgPQY :;eLM456klCD F!G!H!""###)$.$$$$$E%%%8&&&&&0']'w'''''''((())))))) *0*I***A+B+++,,9.:.T/U/f///11i3j3657585E6F677777!8+8M8l8m8_9`9::W:Y:e:::v;w;x;<< 00000000 0 0 0 000000000000000 0 0 0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h0h000h00000000000000h000000h000000000000000000000h000000000000000h000000000000000000000000h0000000000000000000000000000000000000000000000000000000000h00005D#, P4$),16=D$&'()*+-D%Ldg<:8@0(  B S  ?; H ; Kb; C`'''<'''<=*urn:schemas-microsoft-com:office:smarttags PlaceType=*urn:schemas-microsoft-com:office:smarttags PlaceName9*urn:schemas-microsoft-com:office:smarttagsplace XM  ktx5>it %bm)-2=HY mp~h!n!""$$d%h%}%~%%%%%%%%%%%%%& &&&")()-)6)))))))))))I*N*****.+7+\+b+++++c,i,,,*-4-T-\-u------\.f.00016.666m7s7<Tt #'6=hkQWcflo !!$!$;$A$$$$$&&6'?''')))))))))) **0*3*I*N*y1178!8&8+818M8Q8<33333333333333333333333333333333333333333333<<<ZL/*;M