JL Computer Consultancy

A slightly unusual use of pipeline functions

July 2006


A large part of my job is to find ways to make databases “go faster” without making any significant changes to structure or code. “Significant” is a bit vague, of course, but typically means things like “invisible to the application” as far as the structure is concerned, or “very localized” if code is involved, and always means “minimum risk, minimum side-effects”.

I have come across two examples recently where short data-loading scripts could be modified very locally to reduce the workload resources used – particularly CPU and disk activity, quite dramatically. It isn’t a mechanism that will be universally appropriate – but for the right data pattern it’s certainly worth considering.


The requirement is quite simple.  We had data loads where rows belong to ‘groups’, need to be given some form of ‘group id’ as they get into the database. There are many ways to implement this type of requirement, with various costs involved. One option is to load the data, create a summary (or parent) table for the loaded data, then use the summary table to generate a sequence-based id. Another option would be to play around with analytic functions, partitioning on the “group” columns, and using a mechanism to bring in a “group id” on the first row of each “group”.

Every option introduces its own drawbacks – the first example above, for example, may require you to do too much work populating and updating real (or perhaps global temporary) tables; the second option may do excessive amounts of sorting as a side effect of using analytic functions if the rows are wide rows.

SO, as an extra weapon in the armoury, here’s a method that uses pipeline functions to minimise the I/O and CPU costs of sorting or using tables. The example is given as a script with in-line comments.  The notional task is to add a sequence number to rows, based on grouping values by multiple columns.


rem

rem   Script:           pipe_sequence.sql

rem   Author:           Jonathan Lewis

rem   Dated:            June 2006

rem   Purpose:    Getting a sequence number onto grouped data.

rem

rem   Last tested

rem         10.2.0.1

rem         9.2.0.6

rem   Not tested

rem         10.1.0.4

rem   Not relevant

rem         8.1.7.4

rem

rem   Notes:

rem   There are various ways you might group data, and then

rem   add a sequence number (or other unique identifier) to

rem   rows that belong to the same group.

rem

rem   This case was based on a 'store card' number used to group

rem   together purchases that occured within a short time period

rem   of each other.  (The real system defined as visit as all

rem   purchases within 10 minutes of each other - which required

rem   playing games with rounding on a time interval).

rem

rem   Other options

rem         use of analytic functions

rem         creating global temporary tables

rem         calling functions that return either nextval or currval

rem

rem   Enhancements:

rem         Change the function to use pl/sql array fetches (bulk collect)

rem

 

 

drop table t1;

 

drop sequence s1;

drop type jpl_table_type;

drop type jpl_scalar_type;

 

 

rem

rem   We will be collating rows with the same value for columns

rem   group_1 and group_2, and giving them a single sequence value

rem

 

create sequence s1;

 

create table t1 as

select

      rownum                              id,

      mod(rownum-1,10)                    group_1,

      lpad(mod(rownum-1,10),10,'0')       group_2,

      trunc((rownum-1)/10)                data1,

      rpad('x',100)                       padding

from

      all_Objects

where

      rownum <= 100

;

 

rem

rem   A pipeline function needs an array type as its return,

rem   a corresponding scalar type for each row created

rem

 

 

create type jpl_scalar_type as object(

      group_1     number,

      group_2     varchar2(10),

      seq_id      number

);

/

 

create type jpl_table_type as table of jpl_scalar_type;

/

 

rem

rem   We select distinct the columns that are the identifying

rem   columns – by hiding this query in a pipeline function we

rem   a) can use the nextval construct

rem   b) avoid creating an intermediate table

rem   c) minimise the width of the row being sorted

rem

 

create or replace function pipe_fun

return jpl_table_type

pipelined

as

begin

      for r in (

            select group_1, group_2, s1.nextval seq_id

            from (

                  select distinct group_1, group_2

                  from t1

                  order by group_1, group_2

            )

      )

      loop

            pipe row (jpl_scalar_type( r.group_1, r.group_2, r.seq_id));

      end loop;

      return;

end;

.

/

 

rem

rem   To demonstrate how to use the mechanism we join the

rem   the pipeline function to the base table on the grouping

rem   columns – which allows us to append the sequence value

rem   to the end of the row in (apparently) one step

rem

 

break on seq_id skip 1

column padding noprint

 

select

      /*+ ordered use_hash (t) */

      v.seq_id,

      t.group_1,

      t.group_2,

      t.id,

      t.data1,

      t.padding

from 

      table(pipe_fun)   v,

      t1          t

where

      t.group_1 = v.group_1

and   t.group_2 = v.group_2

order by

      v.seq_id, t.group_1, t.group_2, t.id

;

 

rem

rem   Note – there is an “implementation detail” in 9i

rem   which means you hit three rows in the dictionary

rem   cache every time you fetch a row from the pipeline

rem   function. This is a concurrency threat that has been

rem   eliminated in 10g.

rem


Back to Index of Topics