JL Computer Consultancy

Partitioning in 2 dimensions. (Oracle 8.1.5)

June 1999


Some time ago I published a note pointing out that one feature - multi-dimensional partitioning - gave partition views in 7.3 an edge over partitioned tables in 8.0. This defect has been addressed to some extent in Oracle 8.1 with the introduction of hash sub-partitioning, known as Composite partitioning. This note gives an example of the technique.

Addendum Jun 23rd - How to predict hashing positions for strings

Assume I have a large set of sales data which I wish to break down into more manageable units. To allow for ease of archiving (or eliminating) very old data I choose to partition the data by date with a month as the unit of granularity. However, to break the data down further, I also choose to split my (UK) sales into 4 areas, 'England','Ireland','Scotland', and 'Wales'.

Under simple range partitioning in 8.0 I had a dilemma, described in the earlier article, about whether to create a partitioning key of (date, country) or (country, date). In 8.1 I can bypass this dilemma by introducing a second level of partitioning within the date-based partitions. The sub-partitioning mechanism is a hashing mechanism, and you are advised to use a number of sub-partitions per partition that is a power of 2. In my case I used the following code to create a table which breaks each partition into 4 sub-partitions.

You might note the standard trick with date-based partitioning - the top partition should always be one which you don't plan to put data into just yet, and its boundary should be MAXVALUE - so that you can split it quickly and painlessly when the time comes.


rem
rem     Script:        c_tab.sql
rem     Author:        Jonathan Lewis
rem     Dated:         June 1999
rem     Purpose:       Create table with subpartitions to examine 2-dimensional partitioning.
rem
rem     We are aiming to partition by date in one direction, and by a very small
rem     list of countries (or any fixed, discrete, list of values) in the other
rem
rem     The subpartition names will be of the form SYS_SUBPnnn (surprisingly the numeric
rem     'nnn' bit is not zero-padded to several digits to give fixed-length names).
rem
rem     Sub-partitions nnn+1, nnn+2, nnn+3, nnn+4 will be Jan
rem     Sub-partitions nnn+5, nnn+6, nnn+7, nnn+8 will be Feb etc.
rem
rem     The subpartitions for a country will be:
rem            nnn+1, nnn+5, nnn+9, nnn+13
rem
rem     We want to see if a query for a single country accesses only
rem     the smallest set of 5 sub-partitions, and not the whole table.
rem
CREATE TABLE sales (
        country        varchar2(10),
        sale_date      date,
        product        number(6), 
        qty            number(4),
        value          number(8,2),
        store          number(4)
)
        PARTITION BY RANGE (sale_date)
        SUBPARTITION BY HASH(country) SUBPARTITIONS 4
        (
               PARTITION p_1999_jan VALUES LESS THAN (TO_DATE('01-Feb-1999', 'dd-mon-yyyy')),
               PARTITION p_1999_feb VALUES LESS THAN (TO_DATE('01-Mar-1999', 'dd-mon-yyyy')),
               PARTITION p_1999_mar VALUES LESS THAN (TO_DATE('01-Apr-1999', 'dd-mon-yyyy')),
               PARTITION p_1999_apr VALUES LESS THAN (TO_DATE('01-May-1999', 'dd-mon-yyyy')),
               PARTITION p_1999_may VALUES LESS THAN (maxvalue)
        );
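When June arrives, the empty MAXVALUE partition can be split to give May a real upper boundary. A minimal sketch (the new partition name and boundary date are my own illustrative choices):

```sql
rem     Sketch only - split the (empty) top partition so that May gets
rem     a real upper boundary and a new empty MAXVALUE partition takes
rem     its place.  Because no June data exists yet, the split is quick.
ALTER TABLE sales
        SPLIT PARTITION p_1999_may
        AT (TO_DATE('01-Jun-1999', 'dd-mon-yyyy'))
        INTO (PARTITION p_1999_may, PARTITION p_1999_jun);
```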

In this example I have kept the definition as simple as possible. I don't have any indexes, I don't try to give names to the sub-partitions, I don't allow different partitions to have different numbers of sub-partitions, and I don't try to allocate sub-partitions to different tablespaces. Nor do I include any of the esoteric features of partitioned tables that became available in 8.1 - e.g. LOBs in partitions, partitioned IOTs.

The code above generates 20 segments - one for each sub-partition. Note: the partitions themselves do NOT have any independent physical existence in the database; instead the data dictionary records the fact that partition p_1999_jan (say) is made up of the 4 physical sub-partitions SYS_SUBP17, SYS_SUBP18, SYS_SUBP19, and SYS_SUBP20.
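You can confirm this from the data dictionary; a quick sanity check (a sketch - the exact columns available vary slightly by version):

```sql
rem     Each segment should report as a TABLE SUBPARTITION; the range
rem     partitions themselves own no segment of their own.
select
        segment_type, partition_name, blocks
from
        user_segments
where
        segment_name = 'SALES'
order by
        partition_name
;
```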

After creating the table I loaded some data into it and analyzed the table. One of my standard tricks for loading data is to call on the ALL_OBJECTS view, or ALL_SOURCE view and play silly games with the rownum - both views are reasonably large, so this is a quick way of generating a comfortable amount of data.


rem
rem     script:        c_data.sql
rem     Author:        Jonathan Lewis
rem     Dated:         Jun 1999
rem     Purpose:       Get data into 4 x 4(5) subpartitioned table
rem
rem     Notes:  
rem            There are ca. 10,500 items in my view ALL_OBJECTS.
rem            hence ca. 2500 end up in each country/month segment (we hope).
rem
rem            NB  No rows go into the MAY (MAXVALUE) partition
rem
insert into sales 
select
        'England',
        to_date(19990101,'YYYYMMDD') + mod(rownum,120),
        mod(rownum,40),        -- product
        mod(rownum,3),         -- quantity
        mod(rownum,99),        -- value
        mod(rownum,25)         -- store
from
        all_objects
;
insert into sales
select
        'Ireland',
        to_date(19990101,'YYYYMMDD') + mod(rownum,120),
        mod(rownum,40),        -- product
        mod(rownum,3),         -- quantity
        mod(rownum,99),        -- value
        mod(rownum,25)         -- store
from
        all_objects
;
insert into sales
select
        'Wales',
        to_date(19990101,'YYYYMMDD') + mod(rownum,120),
        mod(rownum,40),        -- product
        mod(rownum,3),         -- quantity
        mod(rownum,99),        -- value
        mod(rownum,25)         -- store
from
        all_objects
;
insert into sales
select
        'Scotland',
        to_date(19990101,'YYYYMMDD') + mod(rownum,120),
        mod(rownum,40),        -- product
        mod(rownum,3),         -- quantity
        mod(rownum,99),        -- value
        mod(rownum,25)         -- store
from
        all_objects
;
commit;
analyze table sales compute statistics;
 

The results of the ANALYZE command highlight one of the drawbacks to this method of 2-dimensional partitioning. As you will see from the figures below, two of the sets of data ended up hashing to the same value, and so ended up sharing sub-partitions. (England and Scotland were the culprits).

Unlike HASH CLUSTERS, where you can specify your own hash function or hash column, hash sub-partitioning does not (yet) allow the user to define how the hashing should be implemented.
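For comparison, this is roughly what the hash cluster option looks like - a sketch only, with illustrative names, showing the HASH IS clause that lets you nominate your own hash expression:

```sql
rem     Sketch only - a hash cluster using a user-defined hash expression.
rem     Nothing comparable exists (yet) for hash sub-partitioning.
CREATE CLUSTER sales_cluster (country_id number(2))
        HASHKEYS 4
        HASH IS country_id;
```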

For a very small list of values, like mine, this is a bit of a nuisance - it means that different queries will respond in different times depending on the number of countries that hash to a given sub-partition; it means that different sub-partition tablespaces have to be different sizes; and it means I have some empty sub-partitions which I can't really get rid of. All these are relatively small problems for which a work-around is possible (make the column upper case, for example, and see what happens to the hash function then). If you have a larger set of values (100 say), you might decide to create only 8 sub-partitions, and then you will probably find a much better balance of ca. 12(.5) values per sub-partition.
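The upper-case work-around can be checked without reloading any data, using the hashing function described in the addendum at the end of this note:

```sql
rem     Does forcing the value to upper case move it to a different one
rem     of the 4 hash slots?  (No output predicted here - try it.)
select
        dbms_utility.get_hash_value('England', 0, 4)    mixed_case,
        dbms_utility.get_hash_value('ENGLAND', 0, 4)    upper_case
from
        dual;
```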


SQL> 
  1  select
  2     partition_name, subpartition_name, num_rows, blocks
  3  from
  4     user_tab_subpartitions
  5  order by
  6     subpartition_name, partition_name
  7* 
SQL> /
PARTITION_NAME           SUBPARTITION_NAME         NUM_ROWS    BLOCKS
------------------------ ------------------------ --------- ---------
P_1999_JAN               SYS_SUBP17                       0         0
P_1999_JAN               SYS_SUBP18                    2603        49
P_1999_JAN               SYS_SUBP19                    5206        99
P_1999_JAN               SYS_SUBP20                    2603        49
 
P_1999_FEB               SYS_SUBP21                       0         0
P_1999_FEB               SYS_SUBP22                    2347        44
P_1999_FEB               SYS_SUBP23                    4694        89
P_1999_FEB               SYS_SUBP24                    2347        44
 
P_1999_MAR               SYS_SUBP25                       0         0
P_1999_MAR               SYS_SUBP26                    2573        49
P_1999_MAR               SYS_SUBP27                    5146        99
P_1999_MAR               SYS_SUBP28                    2573        49
 
P_1999_APR               SYS_SUBP29                       0         0
P_1999_APR               SYS_SUBP30                    2490        49
P_1999_APR               SYS_SUBP31                    4980        96
P_1999_APR               SYS_SUBP32                    2490        44
 
P_1999_MAY               SYS_SUBP33                       0         0
P_1999_MAY               SYS_SUBP34                       0         0
P_1999_MAY               SYS_SUBP35                       0         0
P_1999_MAY               SYS_SUBP36                       0         0
20 rows selected.
SQL> spool off

Having prepared a table with data (and rebooted my machine), the acid test was: what happens if we try to read all the data for a single country?

If we write a query for all the sales in January, we know from previous experience of range partitioning that the Feb, Mar, Apr, and May partitions will be completely ignored by the optimiser. The question is: if we put in a query for Wales only, with no date attached, will the optimiser use the hash-partitioning column to avoid checking any of the data in the non-Welsh sub-partitions?

The table below (a simple query on V$SQL) shows the results of querying each country in turn:


SQL> l
  1  select executions, disk_reads, buffer_gets, rows_processed, sql_text
  2  from v$sql where sql_text like '%count%sales%'
  3* and sql_text not like '%sql%'
SQL> /
EXECUTIONS DISK_READS BUFFER_GETS ROWS_PROCESSED 
---------- ---------- ----------- --------------
SQL_TEXT
-----------------------------------------------------
         1        393         444              1     
select count(*) from sales where country = 'England' 
 
         1        196         211              1     
select count(*) from sales where country = 'Ireland' 
 
         1          0         403              1     
select count(*) from sales where country = 'Scotland'
 
         1        191         206              1     
select count(*) from sales where country = 'Wales'   
 
SQL> spool off

As you can see, the work done by each query is indeed the minimum needed to scan ONLY the 4(5) sub-partitions which hold data for the country in question. Of course this report is not the only thing I based my conclusion on - a full trace at level 12 showing bind variables and waitstats (including file and block ids), repeated tests using parallel execution, etc. demonstrated that the optimizer was treating this table as a 2-dimensional array of segments where I had the option of selecting an entire row of segments (date range), or an entire column of segments (country), or a single segment by specifying a date range and a country.

The optimizer even behaved correctly when I set up a bind variable and fired off the query:

        select count(*) from sales where country = :b1

Obviously there are plenty more tests to do to find out when the optimiser gets over-stressed and doesn't get the best path - what happens, for example, if I supply the predicate:

               where
                       country in ('England', 'Ireland')
               and     sale_date in (to_date('01-Jan-1999'), to_date('01-Jan-2000'));

but the fact that it works at all makes it possible for a couple more clients to consider seriously the option of moving from Oracle 7 to Oracle 8.


Addendum 23rd June 1999 - Predicting hash subpartitions to be used.

My problem with the specific example was finding out too late that, with 4 countries and 4 sub-partitions, I was unlucky in the hash function's distribution: until I had created the data, I did not realise that England and Scotland would collide.

There is a way around this problem so that you can get an idea of how well string data will hash BEFORE you start loading it. Oracle supplies the packaged function dbms_utility.get_hash_value to hash a string to a number, and I believe that this is the function used internally to do the job for string-based subpartitions.

Definition:

        function dbms_utility.get_hash_value(
               name           varchar2,
               base           number,
               hash_size      number
        ) return number;

In the case of 4 subpartitions (which are internally numbered from 0 to 3, though externally numbered from 1 to 4), my base hash value is 0, and the size of the hash table I want to produce is 4, so I would call the function as follows:

        dbms_output.put_line(dbms_utility.get_hash_value('England',0,4));

and discover that England and Scotland hash to 2, Ireland hashes to 1, and Wales hashes to 3.
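To check a whole candidate list in one go, a small anonymous PL/SQL block will do; this sketch just loops over the four country names from the example:

```sql
rem     Print the hash slot (0-3) for each candidate partitioning value.
set serveroutput on
declare
        type t_list is table of varchar2(10);
        v_countries     t_list := t_list('England','Ireland','Scotland','Wales');
begin
        for i in 1 .. v_countries.count loop
                dbms_output.put_line(
                        rpad(v_countries(i), 12) ||
                        dbms_utility.get_hash_value(v_countries(i), 0, 4)
                );
        end loop;
end;
/
```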

With a little cunning, you may be able to 'adjust' your legal list of column values to improve the distribution of data before you let end-users on to the system.


Back to Main Index of Topics