JL Computer Consultancy

Identifying PQ slaves that won't die (Oracle 7.3).

February 1999

When configuring for parallel query execution under Oracle 7.3, there are 3 main parameters to set -


The maximum number of Parallel Query slaves the instance can start.


The number of Parallel Query slaves the instance should try to keep alive.


How long (in minutes) a Parallel Query Slave can stay alive if it is idle and there are more live slaves than the number specified in parallel_min_servers..

There are all sorts of interesting quirks in the Parallel Query Option which vary across different versions of Oracle. However there are two bugs which, individually, are relatively painless but combine to cause real problems.

Bug number 1 - massive memory consumption - there have been numerous bugs which left Oracle consuming vast amounts of memory in parsing, particularly parsing a partion view using bitmap indexes. According to the latest patch notes these bugs have all been fixed from around 7.3.3 onwards). However even after all the acknowledged bugs were fixed, I still saw processes running up to 80MB (over their baseline requirement) whilst parsing - and I have seen a query hit 400 Mb in parsing before the machine ran out of virtual memory. There is a simple, generally viable, workaround to the problem, just set the idle time to 1 minute, and the min servers to 0 and the PQ slaves die off very quickly and release the memory back to the operating system. This can disclose bug number 2 . . . .

Bug number 2 - parallel query slaves not dying after their idle time was up. Worse, they would stay alive, but Oracle would think they were dead, refuse to use them, refuse to free their slots in the v$pq_slave structure, and run up more parallel query slaves with the same name. The suggested work-around to this problem from Oracle was to set parallel_max_servers equal to parallel_min_servers, which puts you straight back in the path of bug number 1.

My compromise solution to this pair of problems was to set parallel_min_servers to zero because I definitely wanted all the slaves to die off when idle; I then set an idle time of 5 minutes so that I idn't get too many inaccessible slaves. To make sure they disappear as soon as possible I then scheduled a script based on the one below. As you can see it checks the value of the idle_time by looking at v$parameter, and then lists all slaves which have exceeded their idle time - but any slave over the idle time should be dead.

I actually use a modified version of v$pq_slave which reports time in seconds and the v$process address of the query slave, and then have a script that produces a file holding a string of 'kill -9 pid' commands, executes it, and follows it up with a pmon wakeup call.

By running this every 15 minutes (you could use cron, or a simple shell 'while true, sleep, loop' script) we keep the system fairly clear of defunct but expensive parallel query slaves.

Back to Main Index of Topics

rem     Script:        bad_slaves.sql
rem     Author:        Jonathan Lewis
rem     Purpose:       Identify PQ slaves past their idle time
rem     Notes:
rem     The user has to have select rights on v$parameter
rem     to find out the idle time set for PQ slaves
rem     Slaves in excess of parallel_min_servers, with an idle
rem     time (in minutes) greater than parallel_server_idle_time
rem     should be shut down automatically.  (This is only true if
rem     parallel_min_servers = 0.  If some slaves are allowed to be
rem     permanently idle you have to come up with a diferent algorithm
clear columns
clear breaks
set pause off
set termout on
set linesize 80
set pagesize 22
set newpage 0
set trimspool on
column value new_value m_idle_time
select value from v$parameter
where name = 'parallel_server_idle_time'
set termout on
        idle_time_cur > &m_idle_time
order by slave_name

Back to Main Index of Topics