JL Computer Consultancy

Log file parallel write waits

Sept 2005

Some time ago, I wrote a note about the difficulties of using a system-level snapshot to determine the cause of performance problems. In that note I described how the event log file sync could look like a threat when it wasn’t really a threat, and pointed out that in general you ought to check whether the log file sync waits were really just a consequence of having multiple processes all waiting on the same log file parallel write.
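For example, a query along the following lines against v$session_wait (a rough sketch only) will show whether the sessions currently reporting log file sync are all queued up behind the log writer’s current log file parallel write:

--	Who is currently waiting on the two events ?
select	sid, event, state, seconds_in_wait
from	v$session_wait
where	event in ('log file sync', 'log file parallel write')
order by
	event, sid
;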

I recently came across four cases where I was happy to trot out this advice: one on AskTom, one on the comp.databases.oracle.server newsgroup, and two at client sites. However, at the client sites I had high-level access to the database, and neither of the other common reasons for excessive log file sync wait times seemed to apply.

Luckily, client number two called me whilst I was still thinking about the problem that I had seen at client number one. When you’ve got two people asking the same question, and everything you know tells you that what you’re seeing can’t be happening, this is a clear warning that there’s something you don’t know. (When the facts contradict the theory, it’s the theory that’s wrong).

So I went right back to square one and built a totally simplistic model of their most critical process on a PC (Hyperthreading, XP Pro). The problem instantly resolved itself: it was an Oracle bug, and the log writer (lgwr) was concealing the truth.

Here’s my test case:

drop table t1;

create table t1 (
	n1	number(5),
	v1	varchar2(100)
)
pctused 1
pctfree 99
;

insert into t1 values (1, rpad('x',100,'x'));
commit;

alter system switch logfile;
execute dbms_lock.sleep(3)

execute sys.dbms_system.kcfrms

execute snap_system_events.start_snap
execute snap_events.start_snap

update t1 set v1 = upper(v1) where n1 = 1;
commit;

--	update and commit repeated for a total of 2,000 updates

update t1 set v1 = upper(v1) where n1 = 1;
commit;

execute snap_events.end_snap
execute snap_system_events.end_snap

The package snap_events is a little utility I use to take a snapshot of v$session_event for my session, and find differences between a starting and ending snapshot. The package snap_system_events does the same for v$system_event.
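The packages themselves aren’t reproduced here; the query they wrap is nothing more sophisticated than a sketch like the following against v$session_event for the current session, with start_snap storing one copy of the result and end_snap subtracting it from a fresh copy:

--	Current session's wait events (the packages diff two copies of this)
select	ev.event, ev.total_waits, ev.total_timeouts,
	ev.time_waited, ev.max_wait
from	v$session_event		ev
where	ev.sid = (select sid from v$mystat where rownum = 1)
order by
	ev.time_waited desc
;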

The log file switch, with a three-second pause, is there to ensure that I don’t hit a log file switch in the middle of the test and confuse the results; the pause was enough to make sure that the database writer had finished clearing the resulting checkpoint.

The call to dbms_system.kcfrms resets the max_wait_time values on the session events, the system events (where it’s a hidden x$ column) and the file statistics. It’s a very convenient facility, but be careful about using it on systems with a very large number of users, as it has a lot of memory locations to update.
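As a rough sketch of the usage pattern (restricted here to the session-level events), you reset the high-water marks just before the test and report the recorded maxima afterwards; the log writer’s own session supplies the figure for log file parallel write:

execute sys.dbms_system.kcfrms

--	... run the workload of interest ...

--	Maximum single-wait times recorded per session since the reset
select	sid, event, max_wait
from	v$session_event
where	event in ('log file sync', 'log file parallel write')
order by
	event, sid
;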

Here’s a typical sample of the two critical lines of output (relating to v$system_event) from 8i, 9.2, and 10g running on the same machine. Spot the critical figures before you read on.


8i
Interval:- 9 seconds

Event                           Waits   Time_outs    Csec   Avg Csec   Max Csec
-----                           -----   ---------    ----   --------   --------
log file parallel write         2,000           0      78       .039          3
log file sync                   2,000           0     175       .088          3


9.2
Interval:- 9 seconds

Event                           Waits   Time_outs    Csec   Avg Csec   Max Csec
-----                           -----   ---------    ----   --------   --------
log file parallel write         2,024           0      19       .009          0
log file sync                   2,000           0      85       .043          2


10g
Interval:- 8 seconds

Event                           Waits   Time_outs    Csec   Avg Csec   Max Csec
-----                           -----   ---------    ----   --------   --------
log file parallel write         4,000           0      87       .022          4
log file sync                   2,000           0      89       .044          4

The giveaway is in the column “Max Csec”: the maximum time in centiseconds for a single wait in the interval. Luckily I was running a laptop, on batteries, with a slow disc drive, or this anomaly might not have been so obvious. Given that I was the only person using the machine, there was no way that the log file sync wait times could be seriously affected by anything other than log file parallel write times. In 8i and 10g, the maximum wait time for a log file sync exactly mirrors the maximum wait time for a log file parallel write. In 9i, I saw (and kept seeing) maximum log file sync times that were dramatically larger than the maximum log file parallel write times. The figures for the average wait times are a little suspect, but the figures for the maximum wait times are very suggestive. The log writer seems to be lying in 9.2.

(Note: the double count on the 10g writes may be associated with having two shared redo threads. The system statistics recorded only 2,000 redo writes, the redo log files weren’t duplexed, and I don’t yet know why else I would see 4,000 waits; possibly the log writer does something with both threads when a sync write request is received, and records two waits even though only one write took place.)
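The redo write figure comes from v$sysstat, so a query like the following sketch is enough to compare the number of physical redo writes with the number of log file parallel write waits reported:

--	Physical redo writes since instance startup
select	name, value
from	v$sysstat
where	name in ('redo writes', 'redo blocks written')
;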

Up until this point, I had only been looking for issues with log file sync times, as I “knew” that my problem had nothing to do with the log writer. Now I had a clue that pushed me in a different direction, a very strong hint that the log writer was lying, so I did a quick search on Metalink, using the advanced search with the exact phrase “log file parallel write” and checking the Bug Database only. The item that came out at the top of the list was this one:

Bug No.                     2669566
Product                     Oracle Server - Enterprise Edition
Product Version
Platform                    Linux x86
Platform Version            2.1 AS
Database Version
Affects Platforms
Severity                    Severe Loss of Service
Status                      Closed, Verified by Filer
Base Bug
Fixed in Product Version

Problem statement:

There is a known bug in 9.2: the log writer does not report its wait time correctly. The bug was filed in November 2002, and is fixed in 10g.

If you’ve ever been in the habit of discounting the log writer as a source of performance problems because the log writer times look good, think again. Any timings from the log writer are not to be trusted in 9.2. A high average wait time on log file sync may be the only clue you get that there is an I/O problem in the log writer.
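As a rough sanity check (no more than a sketch, and not conclusive on its own), you can line the two events up at the system level; if the average log file sync time is much larger than the average log file parallel write time on a system that isn’t starved of CPU, treat the reported write times with suspicion on 9.2:

--	Average wait (centiseconds) for the two events since instance startup
select	event, total_waits, time_waited,
	round(time_waited / decode(total_waits, 0, null, total_waits), 3)	avg_csec
from	v$system_event
where	event in ('log file sync', 'log file parallel write')
;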

When I found this bug, I did wonder if it was out of date. Sometimes bugs get marked as fixed prematurely and the “Fixed in Product Version:” markers on the database don’t reflect the true status, so I sent in an iTAR asking whether the bug was still live, whether it really applied to all platforms, and whether there were any back-ports of the 10g fix.

I was very impressed with the turn-around - I got a reply in less than two hours.

        Yes, the problem still exists in 9.2. It's fixed in 10g.

        The <Bug 2669566> is generic, so it's applicable to any platform.

        There is no backport patch for this bug on top of 9.2. A while ago there was a request to create a backport fix on top of 9.2 for Solaris, but after Development's consideration it was rejected on the grounds that it might introduce instability in one of the core database modules.

I have a great deal of sympathy with the justification in the third part of the reply, but that doesn’t make me any happier about the fact that I don’t know what the log writer is doing, and in general I have no cost-effective, low-impact, safe way of finding out.
