2014-10-07

enhanced SQL trace

As a huge friend of Oracles SQL trace I also see it's limitations as well. One of these limitations is visible in it's underlying infrastructure: the wait interface. It's only about waits.
There is another source of information as well, you could call it the DO interface, but the common term is session and system statistics. there Oracle counts what it is doing.
Unfortunately I do not know a simple way to show both realities in a common way.

At least for consistent gets I now can show (most of) them together with wait events.

The magic is done by event 10200 - Dump Consistent Reads.

I create a trace file with both, 10046 and 10200 event:


SET AUTOTRACE ON STATISTICS

ALTER SESSION SET EVENTS '10200 TRACE NAME CONTEXT FOREVER, LEVEL 1';
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';

select * from berx.t1;

Statistics
----------------------------------------------------------
         80  recursive calls
          0  db block gets
        118  consistent gets
          0  physical reads
          0  redo size
        654  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         15  sorts (memory)
          0  sorts (disk)
          3  rows processed

Of course the additional event gives additional lines which will not be processed by those tools to parse SQL traces. I will not go into any detail of those additional lines - especially as I don't know about them to much. If you are interested I'd recommend Charles Hoopers blog.

I just grab lines which contain the string completed for block and rework their values so it looks similar to something the SQL trace parsers know. This is a very inexperienced piece of code!

#!/usr/bin/perl -w
#10200_preproc.pl
use strict;

my $cursor       = 0 ;
my $timestamp    = 0 ;
my $is_processed = 0 ;
my $tablespace;
my $block;
my $object_id;

while (<>) {
  my $line = $_;
  if ($line =~ m/.*#(\d+)[: ].*/) {
    ($cursor = $line) =~ s/.*#(\d+)[: ].*/$1/ ;
  }
  $cursor =~ tr/\n//d;

  if ($line =~ m/.*tim=(\d+).*/) {
    ($timestamp = $line) =~ s/.*#(\d+)[: ].*/$1/ ;
  }
  $timestamp =~ tr/\n//d;

  if ($line =~ m/completed for block/) {
    ($tablespace = $line) =~ s/.*\<([abcdefx\d]+) : .*/$1/ ;
    ($block      = $line) =~ s/.* : ([abcdefx\d]+).*/$1/;
    ($object_id  = $line) =~ s/.*objd: ([abcdefx\d]+).*/$1/;
    $tablespace =~ tr/\n//d;
    $block      =~ tr/\n//d;
    $object_id  =~ tr/\n//d;
    $tablespace = hex($tablespace) +1;
    $block      = hex($block);
    $object_id  = hex($object_id);
# WAIT #391723720: nam='db file scattered read' ela= 2876 file#=5 block#=129 blocks=127 obj#=71123 tim=90982740602
    print "WAIT #"; # as the hash is somethign special ...
    print "$cursor: nam='LIO - completed for block' ela=0 tablespace=$tablespace block#=$block obj=$object_id obj#=$object_id tim=$timestamp\n";
    $is_processed = 1;

  }
  if ($is_processed == 0) {
    print $line;
  }
  else {
    $is_processed = 0
  }

}

This code should really be seen as an example, not a solution!
Out of a line like
ktrgcm(): completed for block  <0x0000 : 0x0041560c> objd: 0x0000003e
it creates
WAIT #47572859450312: nam='LIO - completed for block' ela=0 tablespace=1 block#=4281868 obj=62 obj#=62 tim=47572859450312

Probably the only magic is the Cursor and timing info which I just copy from the last SQL trace line above. I'd as well like not to use the keyword WAIT as for sure the process is not waiting in this case, but I have to still trick the tools.

With my modified tracefile from the statement above I create this report:

CALL-NAME                    DURATION       %  CALLS      MEAN       MIN       MAX
---------------------------  --------  ------  -----  --------  --------  --------
SQL*Net message from client  0.671976   95.8%      3  0.223992  0.000165  0.671529
PARSE                        0.025995    3.7%      1  0.025995  0.025995  0.025995
row cache lock               0.001640    0.2%     12  0.000137  0.000079  0.000497
FETCH                        0.001000    0.1%      2  0.000500  0.000000  0.001000
library cache pin            0.000471    0.1%      2  0.000236  0.000214  0.000257
library cache lock           0.000443    0.1%      2  0.000222  0.000211  0.000232
Disk file operations I/O     0.000047    0.0%      1  0.000047  0.000047  0.000047
SQL*Net message to client    0.000002    0.0%      3  0.000001  0.000000  0.000001
LIO - completed for block    0.000000    0.0%    117  0.000000  0.000000  0.000000
CLOSE                        0.000000    0.0%      3  0.000000  0.000000  0.000000
EXEC                         0.000000    0.0%      1  0.000000  0.000000  0.000000
XCTEND                       0.000000    0.0%      1  0.000000  0.000000  0.000000
---------------------------  --------  ------  -----  --------  --------  --------
TOTAL (12)                   0.701574  100.0%    148  0.004740  0.000000  0.671529

You see 117 LIO - completed for block events here, which is quite close to 118 consistent gets.

There are probably a lot more things to do, still it helps me to show what a statement is doing (in terms of LIO) beside the time it spends waiting.