JL Computer Consultancy

Assessing the effectiveness of your last parallel query

February 1999


The design aim of the parallel query option (PQO) is to get better throughput on large queries by reducing the time spent on numerous small I/Os, at the cost of using extra CPU. Sometimes, however, the expected performance benefit does not appear, and you need some simple tools to identify why not.

The attached script (with sample output and notes on the output at the end of the page) is one such tool. Oracle keeps a log of recent inter-process communication in the session's local memory, and the v$pq_tqstat performance view makes this log visible. The log 'pushes down' the stats from previous queries, so the easiest way to use the script is to connect as a new session, run your query, then execute this report.
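
A typical run might look like the following sketch (the user, table name and degree of parallelism here are hypothetical, chosen only to match the six slaves in the sample output):

connect scott/tiger

select  /*+ full(t) parallel(t,6) */
        count(*)
from    big_table t
;

@pq_tq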

Interpreting the report takes some experience, especially for complex queries with multiple parallel branches; even synchronising the table queues (tq values) with the queue-ids listed under the node column of explain plan can be difficult. However, for very simple queries, there are a few general points to note:

Whatever Explain Plan tells you, it may be lying: if this report returns no rows then your query did not do any parallel execution.

You may find layers in the report (individual table queues) where the workload is severely imbalanced. Perhaps four producers in one table queue process several thousand rows each while the rest process none. In this case the effective parallelism is lower than the actual parallelism, possibly because of unlucky data distribution.
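
One way to put a number on the skew (a sketch against the same view, not part of the main script) is to compare the minimum, maximum and average rows handled per slave in each queue:

select
        dfo_number,
        tq_id,
        server_type,
        min(num_rows)           min_rows,
        max(num_rows)           max_rows,
        round(avg(num_rows))    avg_rows
from
        v$pq_tqstat
group by
        dfo_number, tq_id, server_type
order by
        dfo_number, tq_id, server_type desc
;

A large gap between min_rows and max_rows within a single queue is the signature of the imbalance.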

Sometimes you will find that the number of bytes per row returned is very high. The parallel query option actually passes aggregate values around using a full 22 bytes rather than the minimum possible, and there are various bugs (or design strategies) in all versions of the parallel query option that result in excessive repetition of aggregate columns.

In one amazing case I found a query passing rows of 1,200 bytes round between slaves when the required output could have been achieved in about 70 bytes per row; the problem was largely due to a column 'count(distinct area_code)' appearing seventeen (17) times in the slave SQL. When this occurs you need to look at the query rewrites produced by the PQO to see if you can restate your basic query in a way that will reduce the PQO overhead.
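
A simple first check for this symptom is to have the view report the ratio of the two columns directly; the following sketch does just that (the greatest() call merely guards against a divide by zero on an empty queue):

select
        dfo_number,
        tq_id,
        server_type,
        num_rows,
        bytes,
        round(bytes / greatest(num_rows,1))     bytes_per_row
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc
;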

The PQO also has a problem in many versions with 'aggregate pushdown', a feature whereby sum(), count() and other aggregate functions are supposed to be executed as close to the data as possible but are instead postponed until a higher layer in the DFO (data flow operation). This can result in a huge volume of message passing and a massive waste of CPU. If this is happening then the total number of rows in the bottom layer producer set is likely to be about the same as the number of rows in your table. Again the only solution is to see if you can rewrite your basic query to bypass the problem. (Although Oracle 8 does allow you to increase the size of the messages that pass between slaves from the default of ca. 2KB (the parallel_execution_message_size parameter in recent versions), and this can help a bit.)
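
As a rough check for the symptom, total the rows produced in the bottom layer of the most recent query and compare the result with the number of rows you expect the table to supply; a sketch (dfo_number = 1 assumes the most recent query is the one of interest, as noted in the script header below):

select
        sum(num_rows)           bottom_layer_rows
from
        v$pq_tqstat
where
        dfo_number  = 1
and     tq_id       = 0
and     server_type = 'Producer'
;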



rem
rem     Script:        pq_tq.sql
rem     Author:        Jonathan Lewis
rem     Purpose:       Reports parallel query table queue stats 
rem                    for the current session
rem
rem     Usage:
rem            Execute a parallel query, then run this script.
rem            The inter-process volumes and identity of slaves used
rem            will be dumped in a file called pq_tq.lst
rem            The most recent query will be reported as the first 
rem            set of stats (dfo_number = 1)
rem
clear columns
clear breaks
set pause off
set linesize 80
set pagesize 55
set newpage 0
set trimspool on
column dfo_number      format       999 heading "dfo"
column tq_id           format       999 heading "tq"
column server_type     format       a10 heading "Serv Type"
column num_rows        format 9,999,999 heading "Rows"
column bytes           format 9,999,999 heading "Bytes"
column waits           format    99,999 heading "Waits"
column instance        format       999 heading "Instance"
column process         format       a12 heading "PQ Slave"
break on dfo_number skip 1 on tq_id skip 1 on report
spool pq_tq
select 
        dfo_number,
        tq_id,
        server_type,
        num_rows,
        bytes,
        waits,
        instance,
        process
from 
        v$pq_tqstat
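--      server_type desc lists Producers ahead of Consumers in each queue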
order by 
        dfo_number,
        tq_id,
        server_type desc,
        instance,
        process
;
spool off



Notes on the Sample Output

The bottom layer producers (tq = 0) have all done about the same amount of work and distributed their results fairly evenly to the bottom layer of consumers, so the effective degree of parallelism is close to the actual degree of parallelism. There is a little skew at both the producer and consumer levels, though, so the query is not going to scale perfectly.

The number of bytes passed per row is about 20 at all levels (e.g. 287,844 bytes / 14,402 rows on the first producer line), so we don't have a problem with multiple copies of aggregate columns etc.

The number of rows passed from the bottom layer producers to their consumers is quite high, but when the consumers turn around to become producers in the second table queue (tq = 1) the drop in the number of rows produced is dramatic: 90,574 rows in, 483 rows out, a ratio of ca. 190 : 1.

Is this a case of aggregate push-down not working? We have to go back to the table to answer this question: if the table held only 90,000 rows then, since the bottom layer producers pass 90,000 rows, we have a problem; if, on the other hand, the table was supposed to supply a couple of million rows, we could be quite happy that the producers were already doing a reasonable job of aggregating results as they read them from the table.



Sample output

 
 dfo   tq Serv Type        Rows      Bytes   Waits   Instance PQ Slave
---- ---- ---------- ---------- ---------- ------- ---------- ----------
   1    0 Producer       14,402    287,844       2         35 P000
          Producer       15,874    334,833       5         36 P000
          Producer       15,947    336,105       6         37 P000
          Producer       14,120    297,413       2         38 P000
          Producer       16,774    353,905       7         41 P000
          Producer       13,457    284,588       3         42 P000
          Consumer       15,862    331,399      43         35 P002
          Consumer       14,952    312,333      48         36 P002
          Consumer       12,954    270,682      36         37 P001
          Consumer       13,223    275,648      53         38 P001
          Consumer       16,321    343,649       2         41 P001
          Consumer       17,262    360,977       5         42 P001

        1 Producer           84      1,566      43         35 P002
          Producer           83      1,541      48         36 P002
          Producer           69      1,293      36         37 P001
          Producer           78      1,454      53         38 P001
          Producer           83      1,556       2         41 P001
          Producer           86      1,614       5         42 P001
          Consumer          483      9,024       9          7 QC
