JL Computer Consultancy

Handling massive updates ( ? A Y2K trick on Oracle 7 ?)

May 1999


Given the type of work I do I never expected to have anything to do with a Year 2000 project. However I have to sit somewhere when I visit a client and recently I found myself sitting opposite someone who was busy writing code in preparation for updating a large Oracle system that had used a character string of the form YYMMDD to hold a date.

Since I was there he described the problems he had been having and asked if I had any solution. This is what I came up with.

The Problem:

A table of ca. 250,000,000 rows with a varchar2() column that needs to be changed from YYMMDD to 19YYMMDD. There is space in the table (PCTFREE) to allow this change to take place without causing row migration.

There is not enough space in the database to allow rollback segments to grow large enough for a single update with a single commit to correct all the rows.

There is not enough space in the database to do a 'create as select unrecoverable'.

Updates which use a loop to update some rows and commit are too slow for various reasons, such as reduced concurrency, increased I/O, single row calls etc.

One Solution:

To get maximum concurrency, minimum I/O, and minimum contention between parallel processes I emulated the Oracle Parallel Query strategy with statements like:

update 
        /*+ rowid */ 
        tableX
        set char_date = '19' || char_date
        where   nvl(length(char_date),0) = 6
        and     rowid between  chartorowid('000927.0000.0008') 
                         and   chartorowid('000A3F.FFFF.0008');

The/*+ rowid */ hint makes Oracle use a special addressing mode that ensures only a block range including the required start and stop rowids is visited (without the hint, and prior to 7.2-ish the query would have resulted in a full tablescan).

The big trick then is to produce a set of rowid ranges that completely cover the table without any overlaps. Once this is done, any number of concurrent processes can be run, each with a separate rowid range; I/O will be kept to a minimum; there will be no (base table) contention between processes; finally, with sufficient rollback segments declared there will be minimum rollback contention. The only outstanding problem will be the redo log contention.

The Strategy:

There are two key elements to the strategy - the first is generating a set of rowid ranges, the second is a method for using a list of controlling values to run a scalable set of concurrent processes. The rest of this note deals with the first problem, the second problem will be the subject of a separate article.

To generate a list of rowid ranges (and remember that this is for Oracle 7 only), you start with the list of DBA_EXTENTS for an object, and derive a pair of hexadecimal rowid ranges for each extent.

A row from DBA_EXTENTS includes the columns: FILE_ID and BLOCK_ID, which tell you the starting block for the extent, and BLOCKS which tells you the length of the extent.

A rowid looks like: block_id.row_index.file_id, e.g. 00001F23.0003.0006, so to get from DBA_EXTENTS to a rowid range you need to calculate the block_id for the last block in the extent (which will be BLOCK_ID + BLOCKS - 1), convert the file id and block id to a hexadecimal form, and then insert a 'row number 0' for the first block, and a 'row number 0x7FFF' for the trailing row in the last block. (Oracle is not bothered by missing rows in the range). As an added refinement, you could choose to break very large extents into a series of consecutive rowid ranges.

A PL/SQL function to convert decimal to hex is described in a separate note.

Once you can produce a list of rowid ranges, the next step is to write them into a table so that a another process (or multiple concurrent copies of another process) can acquire a control row from this table, and use the content to update the base table. The detail of this 'control table' strategy that will be given in a later note

Back to Main Index of Topics .


The package:

rem
rem     Script:        y2k_tools.sql
rem     Author:        Jonathan Lewis
rem     Dated:         Feb-1999
rem     Purpose:       Generate list of rowid ranges for an extent
rem     
rem     Notes:
rem     This is STRICTLY version 7
rem
rem     The owner of the procedure has to have access to view
rem     DBA_EXTENTS as a directly granted privilege.
rem
rem     The code converts object names to capitals, so will fail
rem     for objects which have managed to get their names into
rem     lower case.
rem
rem     There is no type checking for objects;  you should ensure
rem     that you supply a table name that is NOT a clustered table.
rem
create table rowid_control (
        owner_name     varchar2(32),
        segment_name   varchar2(32),
        rowid_start    rowid,
        rowid_end      rowid,
        status         char(1)
);
create or replace package y2k_tools as
        procedure generate_rowid_ranges (
               i_owner_name   in      varchar2,
               i_segment_name in      varchar2,
               i_chunk_size   in      number  default 160
        );
end;
.
/
create or replace package body y2k_tools as
--
--      Private function to turn a start block and block count
--      into starting and ending rowids for that range
--
procedure generate_range (
        i_file_id      in     number,
        i_block_id     in     number,
        i_blocks       in     number,
        o_start           out rowid,
        o_end             out rowid
) is
begin
        o_start :=  chartorowid(
                lpad(jpl_utils.decimal_to_hex(i_block_id),8,'0') || 
                '.0000.' || 
               lpad(jpl_utils.decimal_to_hex(i_file_id),4,'0') 
        );
        o_end := chartorowid(
                lpad(jpl_utils.decimal_to_hex(
                                i_block_id + i_blocks - 1
                     ),8,'0'
                ) || 
                '.7FFF.' || 
               lpad(jpl_utils.decimal_to_hex(i_file_id),4,'0')
        );
end generate_range;
procedure generate_rowid_ranges (
        i_owner_name   in      varchar2,
        i_segment_name in      varchar2,
        i_chunk_size   in      number  default 160
) is
        cursor c1 (v_owner_name varchar2, v_segment_name varchar2) is
        select 
               block_id, file_id, blocks
        from 
               dba_extents
        where
               segment_name = upper(v_segment_name)
        and     owner = upper(v_owner_name)
        ;
        m_tmp   c1%rowtype;
        m_rowid_start  rowid;
        m_rowid_end    rowid;
begin
        for r1 in c1(i_owner_name, i_segment_name) loop
               m_tmp := r1;
               while (m_tmp.blocks >= i_chunk_size ) loop
                       generate_range(
                               i_file_id  => m_tmp.file_id,
                               i_block_id => m_tmp.block_id,
                               i_blocks   => i_chunk_size,
                               o_start    => m_rowid_start,
                               o_end      => m_rowid_end
                       );
                       dbms_output.put_line(
                               m_rowid_start  || ' - ' || m_rowid_end
                       );
                       insert into rowid_control(
                               owner_name,
                               segment_name,
                               rowid_start,
                               rowid_end,
                               status
                       )
                       values (
                               i_owner_name,  
                               i_segment_name,
                               m_rowid_start,
                               m_rowid_end,
                               'N'
                       );
                       m_tmp.blocks := m_tmp.blocks - i_chunk_size;
                       m_tmp.block_id := m_tmp.block_id + i_chunk_size;
               end loop ;
               if (m_tmp.blocks != 0) then -- pick up the last bit
                       generate_range(
                               i_file_id  => m_tmp.file_id,
                               i_block_id => m_tmp.block_id,
                               i_blocks   => m_tmp.blocks,
                               o_start    => m_rowid_start,
                               o_end      => m_rowid_end
                       );
                       dbms_output.put_line(
                               m_rowid_start  || ' - ' || m_rowid_end
                       );
                       insert into rowid_control(
                               owner_name,
                               segment_name,
                               rowid_start,
                               rowid_end,
                               status
                       )
                       values (
                               i_owner_name,  
                               i_segment_name,
                               m_rowid_start,
                               m_rowid_end,
                               'N'
                       );
               end if;
        end loop;
end generate_rowid_ranges;
end y2k_tools;
.
/

Back to Main Index of Topics


Example showing the use of the package:

set serveroutput on size 100000
execute y2k_tools.generate_rowid_ranges('jpl','big_junk',320);

0001D6BB.0000.0022 - 0001D7BE.7FFF.0022
0001D7BF.0000.0022 - 0001D8C2.7FFF.0022
0001D8C3.0000.0022 - 0001D9C6.7FFF.0022
0001D9C7.0000.0022 - 0001DACA.7FFF.0022
0001DACB.0000.0022 - 0001DBCE.7FFF.0022
0001DBCF.0000.0022 - 0001DCD2.7FFF.0022
0001DCD3.0000.0022 - 0001DDD6.7FFF.0022
0001DDD7.0000.0022 - 0001DEDA.7FFF.0022
0001DEDB.0000.0022 - 0001DFDE.7FFF.0022
0001DFDF.0000.0022 - 0001E0E2.7FFF.0022
0001E0E3.0000.0022 - 0001E1E6.7FFF.0022
0001E1E7.0000.0022 - 0001E2EA.7FFF.0022
0001E2EB.0000.0022 - 0001E3EE.7FFF.0022
0001E3EF.0000.0022 - 0001E4F2.7FFF.0022
0001E4F3.0000.0022 - 0001E5F6.7FFF.0022

At the same time, the rows above were also inserted into the rowid_control table. To demonstrate that the code has produced a range of rowids that covers the table completely without overlap, I then ran the script below to show that I could use the list of rowid ranges to count the rows in the table

Whilst I have only done a 'select count(*)' in the script below, you could equally well have written an update statement. In a further article I will demonstrate a general purpose table-based parallel allocation mechanism that you can use to control concurrent processes. This will use the rowid_control table to identify sections of the table to update..


rem
rem     get_test.sql
rem
select count(*) from jpl.big_junk;
set serveroutput on
declare
        cursor c1 is
        select rowid_start, rowid_end
        from rowid_control
        where owner_name = 'JPL'
        and segment_name = 'BIG_JUNK';
        m_temp  number := 0;
        m_count number := 0;
begin
        m_count := 0;
        for r1 in c1 loop
               select /*+ rowid */
               count(*) into m_temp
               from jpl.big_junk
               where rowid between r1.rowid_start and r1.rowid_end
               ;
               m_count := m_count + m_temp;
        
        end loop;
        dbms_output.put_line(m_count);
end;
.
/

Back to Main Index of Topics