|Author's name: Andrew Deighton
Author's Email: firstname.lastname@example.org
|Date written: May 2003
Oracle version(s): All
|The time taken to load a given number of rows into a table can increase as the size of the table grows. This behaviour has been seen in all versions of the Oracle server.|
Back to index of questions
The best way to describe what is happening here is with an example. First create two tables and populate them as follows:
create table SOURCE (KEY number); begin for I in 1 .. 100000 loop insert into SOURCE (KEY) values (I); end loop; commit; end; / create table DEST (KEY number);
We can now time the insert of 20 million rows into the DEST table using the following PL/SQL block:
declare offset number; start_time date; time_taken number; begin offset:= 0; for I in 1 .. 200 loop start_time:= sysdate; insert into DESTINATION (KEY) select KEY+offset from SOURCE; time_taken:= (sysdate-start_time)*24*60*60; dbms_output.put_line('Loop '||to_char(I)||' ending at '||to_char(sysdate,'HH24:MI:SS')|| ' after '||to_char(time_taken)||' seconds.'); end loop; end; /
This loop produces output that shows how long it takes to load
100000 rows into the table. Ideally we would like the last 100000
rows to be inserted in roughly the same amount of time as the
first 1000000 rows. In this case, this is exactly what we find -
that is, there is no slow down as more rows are loaded.
Perhaps this is because we are inserting known data values in order? As a second attempt, we can modify the insert statement in line 9 of the above procedure to read:
insert into DESTINATION (KEY) select dbms_random.random from SOURCE;
Looking at the times taken for each 100000 row insert we notice that although the time is slightly longer on average than it wasfor the first test, there is still no noticeable increase in time through the iterations in the loop. The extra time taken being the overhead of the call to dbms_random.
So, maybe we need an index on the DESTINATION table in order to see the effect that inserting many rows into the table has on performance. Create an index on the table as follows:
create index IX1 on DESTINATION (KEY);
and then rerun the PL/SQL block that we ran earlier. Perhaps surprisingly in this case, there is still no noticeable increase in the amount of time taken to load 100000 rows at the end of the procedure compared to at the beginning.
As a last resort, we can run the second PL/SQL block, inserting a random number into the table. At last, we come to a point where it takes longer to insert the last 100000 rows than it did to insert the first 100000 rows. (In my test, an increase in time of over 5000%!) This indicates that the presence of an index on a column that contains values in a random order is likely to be the culprit that is slowing the inserts.
The reason for this slow down can be explained if the test is modified to provide statistics on the number of reads and writes that are performed on the table and index during the load. As the table is initially empty when the load starts, there is no need to read from the disk - new blocks are simply created internally and mapped onto the data file. There are, however, disk writes as the database writer flushes dirty buffers to the disk, making space for new blocks to be created. Initially this is true for the index as well. However, after about 1 million rows have been inserted, the number of reads from the index starts to increase. In my tests, the insert of the last 100000 rows into the table required over 80000 disk reads from the index segment! (And 3 from the table.)
The reason for the slow down is therefore clear. Up to about a million rows, the index is completely held within the buffer cache. In order to insert a new entry into the index, Oracle can scan a bunch of blocks that are in memory and insert the row. Once the index grows beyond the size available in the block buffers, some of the blocks needed for the index maintenance may not be in memory, and will need to be read. However, in order to red them, Oracle first needs to create space for them, by clearing out old blocks, which in turn may be needed for a later insert. This thrashing of blocks in and out of memory is the source of the performance issue.
This also explains why the index did not show the same characteristics when the rows were inserted in order - once a leaf block had been written it would not be required again, and so the reads from the index were not required.
In order to try to relieve the problem, the insert statement can be modified to read:
insert into DESTINATION (KEY) select dbms_random.random from SOURCE order by 1;
This forces each 100000 rows to be inserted in numeric order, progressing through the index from start to finish. Although this does not completely get rid of the problem, on my (rather slow) test system, the time to load 20million rows was reduced by over 28 hours (a saving of 29%)!
The ideal solution is of course to drop the indexes on a table before running a large load and recreate them after the load has completed.
It is worth noting that this characteristic is only observed when inserting large numbers of rows into the table using a batch system when there is little other activity on the database. Inserting a single row into a table with 20 million rows on a busy database will not take significantly more time than inserting the same row into an empty table, as the chances of the necessary blocks being in the buffer cache are equally random depending on the use that has been made of the database in the recent past.
Further reading: n/a
Back to top
Back to index of questions