JL Computer Consultancy

Dimensions and summary management in Oracle 8.1.5

October 2001


Addendum: 26th Oct 2001: Added a note from a reader about fixing the 'UNDEFINED' column in the XXX_OBJECTS view.

One of the exciting new features of Oracle 8.1.5 is the summary management capability. Snapshots have been assimilated into 'materialized views', and a 'query rewrite' facility exists that allows Oracle to determine that a query against a large table can be converted into a similar query against a smaller summarising snapshot or 'materialized view', so that when a user types:

        select department, year, sum(daily sales) 
        from enormous_daily_sales_tables 
        group by department, year;

Oracle may actually operate the equivalent of:

        select department, year, sum(monthly sales) 
        from small_monthly_sales_tables 
        group by department, year;

This article outlines the method using a small, but not entirely trivial, example. It starts with the table created for an earlier article on 2-dimensional partitions partly so that I do not have to include in this article a section on creating an initial 'large' table, and partly to demonstrate that partitioning and materialized views can be mixed.

In the various datawarehouse environments I have designed, I have found that there is sometimes a physical arrangement of data that can make the performance of a critical set of queries significantly faster- basically we can always consider physically sorting the data along one dimension to expedite queries in that dimension - consequently I am not keen on the idea of letting Oracle's built-in 'snapshot refresh' processes take control of rebuilding summary tables, so this example of using materialized views limits itself to referencing a pre-built table. Later articles will cover some of the features of refreshing, and the packages supplied by Oracle to add value and convenience to materialized views and datawarehouse management.

Preparing the groundwork:

My original sample table in the 2-d partitioning example was a daily sales table, with a key of (product, sales_date), in this article I am going to assume that products sum to product_groups, and product_groups sum to product_depts, so we have a hierarchy of: product_dept --- product_group --- product.

I will be demonstrating that query rewrite will work in two different scenarios - first if the heirarchy is denormalised into a single dimensional table, and secondly if the hierarchy is left in a perfectly normalised form. The attached script will create 4 separate tables (with constraints and indices):

        product_hierarchy      the denormalised hierarchy
        products               the table of products
        groups                 the table of product_group, parent to products
        departments            the table of product_dept, parent to product_group

Once you have created these tables, you then need to create a summary table at the product_group level by executing the following:

                    CREATE TABLE sales_sum
        unrecoverable
        PARTITION BY RANGE (sale_date)
        (
               PARTITION p_1999_jan VALUES LESS THAN (TO_DATE('01-Feb-1999', 'dd-mon-yyyy')),
               PARTITION p_1999_feb VALUES LESS THAN (TO_DATE('01-Mar-1999', 'dd-mon-yyyy')),
               PARTITION p_1999_mar VALUES LESS THAN (TO_DATE('01-Apr-1999', 'dd-mon-yyyy')),
               PARTITION p_1999_apr VALUES LESS THAN (TO_DATE('01-May-1999', 'dd-mon-yyyy')),
               PARTITION p_1999_may VALUES LESS THAN (maxvalue)
        )
        as
        select
               sal.country,
               sal.sale_date,
               hir.product_group,
               sum(sal.qty)           qty,
               sum(sal.value)         value,
               store
        from
               sales                  sal,
               product_hierarchy      hir
        where
               hir.product = sal.product
        group by
               sal.country,
               sal.sale_date,
               hir.product_group,
               store
        ;

The Materialized View and a simple Dimension:

At this point, we need to do two things. Tell Oracle that we want to use the table we have just created as a materialized view that may be used with the Query Rewrite facility, and tell Oracle how the product_hierarchy table (or normalised equivalent) may be used when joining to the materialized view. The two different approaches have to be handled separately, so I will take the simpler denormalised case first.

Before starting this section, though, you will need to ensure that you have the privileges:

        create dimension
        create materialized view
        query rewrite

Then execute the following script to create the materialized view

        rem     The materialized view name must match the table name.
        rem     The table could have columns not mentioned in this view
        rem     The order of the various clauses after the 'create' IS important.
        
        create materialized view sales_sum
        on prebuilt table              -- the table already exists
        with reduced precision         -- allow column precision mismatch
        never refresh                  -- we rebuild it ourselves
        enable query rewrite           -- the point of the exercise 
        as
        select
               sal.country,
               sal.sale_date,
               hir.product_group,
               sum(sal.qty)   qty,
               sum(sal.value) value,
               store
        from
               sales                  sal,
               product_hierarchy      hir
        where
               hir.product = sal.product
        group by
               sal.country,
               sal.sale_date,
               hir.product_group,
               store
        ;

One odd consequence of this statement (on 8.1.5) is that my schema ended up with two objects named SALES_SUM, one of them a table, the other of type UNDEFINED - presumably a minor error in the view definition that failed to cater for the object type code for MATERIALIZED VIEW. (Oct 2001 - I have had a note from Aditya.Srivastava@xansa.com pointing out that this is Oracle bug number 1188948. The solution is to change the definitions of the views, DBA_OBJECTS, ALL_OBJECTS, and USER_OBJECTS to include an extra line in the main DECODE() for: 42, 'MATERIALIZED VIEW' )

After the Materialized view, we create the dimension with this script-

        rem     Databases operate on meaningless ids, but users
        rem     operate on names, so we have to associated things like
        rem     product_dept (meaningless pk) with department_name
        
        create dimension product_dim 
               --  first bit lists the levels
               level   product        is product_hierarchy.product
               level   prd_group      is product_hierarchy.product_group
               level   department     is product_hierarchy.product_dept
               -- second bit describes the relationships in one possible hierarchy
               hierarchy product_roll_up(
                       product        child of
                       prd_group      child of
                       department
               )
               -- third bit identifies extra columns in a given level
               -- the 'attribute' is actually the name of a level
               attribute product determines (prod_name)
               attribute prd_group determines (group_name)
               attribute department determines (dept_name)
        ;

Then to test the effect of our efforts so far we execute a query against the original (product-only table). However, before we can use query rewrite, we have to set some init.ora parameters (in my case at the session level) to allow it work.

        alter session set query_rewrite_enabled = true;
        alter session set query_rewrite_integrity = trusted;

The first line allows query rewrite to work, the second tells Oracle to assume that a user-maintained view - which Oracle has not been keeping up to date with its own refresh mechanisms itself - will really be okay. If we then execute and trace the following statement we get a pleasantly quick and cheap result -

        select 
               hir.group_name,
               sum(sal.value) value
        from
               sales                  sal,
               product_hierarchy      hir
        where
               hir.product = sal.product
        and     hir.product_group = 2
        group by
               hir.group_name
        ;

in the absence of the materilized view, this query required several thousand logical I/Os, and accessed the tables as listed, but with the query rewrite available, the actual run-time resource usage was 121 logical I/Os, and the plan turned into:

        SELECT STATEMENT Optimizer=CHOOSE (Cost=43 Card=4 Bytes=332)
          SORT (GROUP BY) (Cost=43 Card=4 Bytes=332)
            MERGE JOIN (CARTESIAN) (Cost=43 Card=1465 Bytes=121595)
              VIEW (Cost=3 Card=5 Bytes=260)
                SORT (UNIQUE) (Cost=3 Card=5 Bytes=190)
                  TABLE ACCESS (FULL) OF 'PRODUCT_HIERARCHY' (Cost=1 Card=5 Bytes=190)
              SORT (JOIN) (Cost=43 Card=293 Bytes=9083)
                PARTITION RANGE (ALL)
                  TABLE ACCESS (FULL) OF 'SALES_SUM' (Cost=8 Card=293 Bytes=9083)

Oracle rewrote the query to find uniquely the list of required product_groups (in this case just the one) and group_names from the product_hierarchy table, then joined this intermediate result to the materialized view to produce the final result.

The Materialized View and a normalized Dimension:

A more complex example appears when we do NOT normalise the hierarchy. There seem to be some limitations in this approach - the examples I got to work required me to generate a more complex materialized view than should have been necessary. (Don't forget to drop the objects from the previous experiment first).

        create materialized view sales_sum
        on prebuilt table              -- the table already exists
        with reduced precision         -- allow column precision mismatch
        never refresh                  -- we rebuild it ourselves
        enable query rewrite           -- the point of the exercise 
        as
        select
               sal.country,
               sal.sale_date,
               grp.product_group,
               sum(sal.qty)   qty,
               sum(sal.value) value,
               store
        from
               sales                  sal,
               products               prd,
               groups                 grp
        where
               prd.product = sal.product
        and     grp.product_group = prd.product_group
        group by
               sal.country,
               sal.sale_date,
               grp.product_group,
               sal.store
        ;

The corresponding Dimension is also more complex, including some JOIN KEY clauses:

rem
rem     It may appear a little confusing that the JOIN KEY clause is:
rem            join key {child table.column list} references {parent level}
rem
create dimension product_dim
        level   product        is products.product
        level   prd_group      is groups.product_group
        level   department     is departments.product_dept
        hierarchy product_roll_up(
               product        child of
               prd_group      child of
               department
               --      join columns have to be in a child table of the reference level
               join key products.product_group references prd_group
               join key groups.product_dept references department
        )
        --      listed columns have to be in the table defined in the attribute/level
        attribute product determines (products.prod_name)
        attribute prd_group determines (groups.group_name)
        attribute department determines (departments.dept_name)
;

The sample SQL I ran looked like this:

        alter session set query_rewrite_enabled = true;
        alter session set query_rewrite_integrity = trusted;
        
        select  
               dep.dept_name,
               grp.group_name,
               sum(sal.value) value
        from
               sales                  sal,
               products               prd,
               groups                 grp,
               departments            dep
        where
               prd.product = sal.product
        and     grp.product_group = prd.product_group
        and     dep.product_dept = grp.product_dept
        group by
               dep.dept_name,
               grp.group_name
        ;

The results were very pleasing: without rewrite, the total cost was some 33,000 logical I/Os, 22,000 physicals and took about 49 seconds to complete. with rewrite the query took 180 logicals, 48 physicals, and completed in 4.5 seconds.

My only concern was that I had to include the GROUPS table in the materialized view in order to make this work. I felt that it should have been sufficient to include only the PRODUCTS table, since the dimension declared products.product_group to be equivalent to groups.product_group. I still need to spend more time tracing the internal execution to figure out why this is not working.


Back to Main Index of Topics