Identifying PQ slaves that won't die (Oracle 7.3).
When configuring for parallel query execution under Oracle 7.3, there are 3 main parameters to set -
The maximum number of Parallel Query slaves the instance can start.
The number of Parallel Query slaves the instance should try to keep alive.
How long (in minutes) a Parallel Query Slave can stay alive if it is idle and there are more live slaves than the number specified in parallel_min_servers..
There are all sorts of interesting quirks in the Parallel Query Option which vary across different versions of Oracle. However there are two bugs which, individually, are relatively painless but combine to cause real problems.
Bug number 1 - massive memory consumption - there have been numerous bugs which left Oracle consuming vast amounts of memory in parsing, particularly parsing a partion view using bitmap indexes. According to the latest patch notes these bugs have all been fixed from around 7.3.3 onwards). However even after all the acknowledged bugs were fixed, I still saw processes running up to 80MB (over their baseline requirement) whilst parsing - and I have seen a query hit 400 Mb in parsing before the machine ran out of virtual memory. There is a simple, generally viable, workaround to the problem, just set the idle time to 1 minute, and the min servers to 0 and the PQ slaves die off very quickly and release the memory back to the operating system. This can disclose bug number 2 . . . .
Bug number 2 - parallel query slaves not dying after their idle time was up. Worse, they would stay alive, but Oracle would think they were dead, refuse to use them, refuse to free their slots in the v$pq_slave structure, and run up more parallel query slaves with the same name. The suggested work-around to this problem from Oracle was to set parallel_max_servers equal to parallel_min_servers, which puts you straight back in the path of bug number 1.
My compromise solution to this pair of problems was to set parallel_min_servers to zero because I definitely wanted all the slaves to die off when idle; I then set an idle time of 5 minutes so that I idn't get too many inaccessible slaves. To make sure they disappear as soon as possible I then scheduled a script based on the one below. As you can see it checks the value of the idle_time by looking at v$parameter, and then lists all slaves which have exceeded their idle time - but any slave over the idle time should be dead.
I actually use a modified version of v$pq_slave which reports time in seconds and the v$process address of the query slave, and then have a script that produces a file holding a string of 'kill -9 pid' commands, executes it, and follows it up with a pmon wakeup call.
By running this every 15 minutes (you could use cron, or a simple shell 'while true, sleep, loop' script) we keep the system fairly clear of defunct but expensive parallel query slaves.
rem Script: bad_slaves.sql
rem Author: Jonathan Lewis
rem Purpose: Identify PQ slaves past their idle time
rem The user has to have select rights on v$parameter
rem to find out the idle time set for PQ slaves
rem Slaves in excess of parallel_min_servers, with an idle
rem time (in minutes) greater than parallel_server_idle_time
rem should be shut down automatically. (This is only true if
rem parallel_min_servers = 0. If some slaves are allowed to be
rem permanently idle you have to come up with a diferent algorithm
set pause off
set termout on
set linesize 80
set pagesize 22
set newpage 0
set trimspool on
column value new_value m_idle_time
select value from v$parameter
where name = 'parallel_server_idle_time'
set termout on
idle_time_cur > &m_idle_time
order by slave_name