Oracle 10 added the awesome procedure dbms_xplan.display_cursor but unfortunately the documentation of the package is a bit lacking and the options and output can be confusing, so here are few clarifications.
The procedure display_cursor gives the *real* execution plan instead of an estimated execution plan which is a huge relief after “explain plan”, “set autotrace on” and “dbms_xplan.display” which all output the expected execution plan. Of the the three, “set autotrace on” is the most peculiar as we can actually run the query and yet the output is the expect plan not the actually executed plan. Pheww – what a relief with dbms_xplan.display_cursor.
But, hold on to the seat of your pants, because that’s not all folks. The coolest thing is that display_cursor will output the actual execution statistics for each row source line in the real execution plan. Now that’s cool. But to access these statistics, one has to enable the collection of the statistics.
To use, run a query in SQL*Plus and include the hint
/*+ gather_plan_statistics */
then, immediately following the query execution run:
select * from table(dbms_xplan.display_cursor(null,null,’ALLSTATS’));
Then you will see the following columns in the output (minus the color coding)
A few confusing things about the output. Some of the columns are estimated statistics from the optimizer. Some of the columns are actual statistics from executing the query. And some of the columns are not per row statistics but statistics that include the statistics of the child rows.
The columns E-Rows and A-Rows can be used for Tuning by Cardinality Feedback (TCF). TCF refers to the process of comparing “E-Row” and “A-Rows” to determine whether the optimizer is optimizing correctly. The two columns “E-Rows” and “A-Rows” which represent estimated rows and actual rows can be compared to highlight how close or how far off the optimizers predictions where. If the optimizers predictions are far of, it’s an indication, though not a proof, that the query has been inadequately optimized. To get the optimizer to create an optimal execution path, one can look for ways of bringing the optimizers estimated statistics in line with the actual statistics. See Tuning by Cardinality feedback by Wolfgang Breitling.
In V$sql_plan_statistics these “should” be equal
cardinality ~= output_rows/starts
starts*cardinality ~= output_rows
And in the output above
E-Rows=cardinality
A-Rows=output_rows
Thus in order to compare E-Rows to A-Rows, we have to multiply E-Rows by starts.
Other display_cursor 3rd argument options
There are other options besides specifying “ALLSTATS” in the display_cursor. Here are a list of the options and the fields they show:
By default statistics are total for all executions of the cursor but “LAST” can be appended to the 3rd argument to get the stats for the last execution. NOTE: the last four columns on memory and temp space always show total or last as indicated above. Here is an example of using “LAST”:
select * from table ( dbms_xplan.display_cursor (null,null,’MEMSTATS LAST‘));
Notice that for memory consumption, some of the stats are bytes and some are kilo-bytes.
The above tables are images. The following table is in text for searching and copy/paste.
Arguments giving only optimizer estimates RowsBytesTempSpc CostTimeBASIC null * * * *TYPICAL * * * * *SERIAL * * * * *ALL * * * *ADVANCED * * * *
Arguments that give actual row source executions statistics (the red asterisks are estimates)
(the following two tables show the same arguments, but are broken in two to fit the width) E-RowsstartsA-RowsBuffersReadsWritesA-TimeMEMSTATS * * * *ALLSTATS * * * * * * *RUNSTATS_LAST * LAST LAST LAST LAST LAST LASTRUNSTATS_TOT * TOT TOT TOT TOT TOT TOTIOSTATS * * * * * * *
0Mem1MEM0/1/MUsed-MemUsed-TmpMax-TmpMEMSTATS * * TOTLASTLASTTOTALLSTATS * *TOTLASTLASTTOTRUNSTATS_LASTRUNSTATS_TOTIOSTATS
Arguments for special cases TQIN-OUTPQ DistribpstartpstopInstancePARTITIONXXXPARALLELXXREMOTEX
List of arguments for non-statistical output predicate infonoteQuery Block NameColumn projection infooutlinebind varsBASICnullXXTYPICALXXSERIALXXALLXXXXADVANCEDXXXXXXPREDICATESXNOTEXALIASXPROJECTIONXOUTLINESXPEEK_BINDSX
Enabling extended rows source execution statistics gathering
There are three ways to gather row source executions stats:
/*+ gather_plan_statistics */
sql_trace=true;
Statistics_level=all
The first two are cheaper (sets _rowsource_statistics_sampfreq =128 ) but less accurate than the last option which though more accurate can consume signficantly more CPU (_rowsource_statistics_sampfreq =1).
Manually querying extended row source execution statistics
The data retrieved with display_cursor can be queried directly from
v$sql_plan_statistics
v$sql_plan_statistics_all
desc v$sql_plan_statistics
ADDRESS
HASH_VALUE
SQL_ID
PLAN_HASH_VALUE
CHILD_ADDRESS
CHILD_NUMBER
OPERATION_ID
EXECUTIONS
LAST_STARTS STARTS
LAST_OUTPUT_ROWS OUTPUT_ROWS
LAST_CR_BUFFER_GETS CR_BUFFER_GETS
LAST_CU_BUFFER_GETS CU_BUFFER_GETS
LAST_DISK_READS DISK_READS
LAST_DISK_WRITES DISK_WRITES
LAST_ELAPSED_TIME ELAPSED_TIME
and
desc v$sql_plan_statistics_all
ADDRESS
HASH_VALUE
SQL_ID PLAN_HASH_VALUE CHILD_ADDRESS CHILD_NUMBER
TIMESTAMP
OPERATION
OPTIONS
OBJECT_NODE OBJECT# OBJECT_OWNER OBJECT_NAME OBJECT_ALIAS OBJECT_TYPE
OPTIMIZER
ID PARENT_ID DEPTH POSITION
SEARCH_COLUMNS
COST CARDINALITY BYTES
OTHER_TAG
PARTITION_START PARTITION_STOP PARTITION_ID
OTHER
DISTRIBUTION
CPU_COST IO_COST TEMP_SPACE
ACCESS_PREDICATES
FILTER_PREDICATES
PROJECTION
TIME (seconds estimated by optimizer)
QBLOCK_NAME
REMARKS
OTHER_XML (bind vars and other info)
EXECUTIONS LAST_STARTS STARTS LAST_OUTPUT_ROWS OUTPUT_ROWS LAST_CR_BUFFER_GETS CR_BUFFER_GETS LAST_CU_BUFFER_GETS CU_BUFFER_GETS LAST_DISK_READS DISK_READS LAST_DISK_WRITES DISK_WRITES LAST_ELAPSED_TIME ELAPSED_TIME (microseconds)
POLICY
ESTIMATED_OPTIMAL_SIZE (bytes)
ESTIMATED_ONEPASS_SIZE (bytes)
LAST_MEMORY_USED (bytes)
LAST_EXECUTION (LAST whether work area was optimal, one pass or multi)
LAST_DEGREE
TOTAL_EXECUTIONS (number of times work area was active)
OPTIMAL_EXECUTIONS ONEPASS_EXECUTIONS MULTIPASSES_EXECUTIONS ACTIVE_TIME (centi-seconds, time work area is active)
MAX_TEMPSEG_SIZE (kb - mis-documented as bytes, though correctly bytes in v$sql_workarea )
LAST_TEMPSEG_SIZE (kb - "" , see http://www.antognini.ch/2009/05/wrong-information-about-temporary-space-usage/ ) )
so you can write a query to get the data directly such as
col operation for a45
SELECT
LPAD(' ',depth)||P.OPERATION||'_'||P.OPTIONS||' '||P.OBJECT_NAME operation
, last_starts * cardinality e_rows_x_starts
, last_output_rows a_rows
, LAST_CR_BUFFER_GETS bgets
, LAST_DISK_READS pread
, LAST_DISK_WRITES pwrites
, LAST_ELAPSED_TIME elapsed
, LAST_MEMORY_USED
, LAST_TEMPSEG_SIZE
, LAST_EXECUTION
FROM
V$SQL_PLAN_statistics_all P
WHERE
sql_id='&sql_id'
order by child_number,id
/
which makes the comparison between estimate and actual easier because I can include starts*cardinality to get the e-rows_x_starts which I can compare directly to a-rows, ie output_rows.
Putting it all together
We can take the TCF ideas farther by putting them together in a query such that the output is easier to read:
col cn format 99
col ratio format 99
col ratio1 format A6
--set pagesize 1000
set linesize 140
break on sql_id on cn
col lio_rw format 999
col "operation" format a60
col a_rows for 999,999,999
col e_rows for 999,999,999
col elapsed for 999,999,999
Def v_sql_id=&SQL_ID
select
-- sql_id,
--hv,
childn cn,
--ptime, stime,
case when stime - nvl(ptime ,0) > 0 then
stime - nvl(ptime ,0)
else 0 end as elapsed,
nvl(trunc((lio-nvl(plio,0))/nullif(a_rows,0)),0) lio_ratio,
--id,
--parent_id,
--starts,
--nvl(ratio,0) TCF_ratio,
' '||case when ratio > 0 then
rpad('-',ratio,'-')
else
rpad('+',ratio*-1 ,'+')
end as TCF_GRAPH,
starts*cardinality e_rows,
a_rows,
--nvl(lio,0) lio, nvl(plio,0) parent_lio,
"operation"
from (
SELECT
stats.LAST_ELAPSED_TIME stime,
p.elapsed ptime,
stats.sql_id sql_id
, stats.HASH_VALUE hv
, stats.CHILD_NUMBER childn
, to_char(stats.id,'990')
||decode(stats.access_predicates,null,null,'A')
||decode(stats.filter_predicates,null,null,'F') id
, stats.parent_id
, stats.CARDINALITY cardinality
, LPAD(' ',depth)||stats.OPERATION||' '||
stats.OPTIONS||' '||
stats.OBJECT_NAME||
DECODE(stats.PARTITION_START,NULL,' ',':')||
TRANSLATE(stats.PARTITION_START,'(NRUMBE','(NR')||
DECODE(stats.PARTITION_STOP,NULL,' ','-')||
TRANSLATE(stats.PARTITION_STOP,'(NRUMBE','(NR') "operation",
stats.last_starts starts,
stats.last_output_rows a_rows,
(stats.last_cu_buffer_gets+stats.last_cr_buffer_gets) lio,
p.lio plio,
trunc(log(10,nullif
(stats.last_starts*stats.cardinality/
nullif(stats.last_output_rows,0),0))) ratio
FROM
v$sql_plan_statistics_all stats
, (select sum(last_cu_buffer_gets + last_cr_buffer_gets) lio,
sum(LAST_ELAPSED_TIME) elapsed,
child_number,
parent_id,
sql_id
from v$sql_plan_statistics_all
group by child_number,sql_id, parent_id) p
WHERE
stats.sql_id='&v_sql_id' and
p.sql_id(+) = stats.sql_id and
p.child_number(+) = stats.child_number and
p.parent_id(+)=stats.id
)
order by sql_id, childn , id
/
which gives output like
Enter value for sql_id: g2w9n4gksyys6
old 59: stats.sql_id='&v_sql_id' and
new 59: stats.sql_id='g2w9n4gksyys6' and
CN ELAPSED LIO_RATIO TCF_GRAPH E_ROWS A_ROWS operation
--- ------------ --------- ------ ------------ ------------ ------------------------------------------------------------
0 0 0 1 SELECT STATEMENT
5,720,456 0 1 1 HASH GROUP BY
29,711 0 1,909 NESTED LOOPS
0 0 +++ 1 1,909 NESTED LOOPS
1,969,304 0 +++ 1 1,909 NESTED LOOPS
0 0 +++ 1 2,027 NESTED LOOPS
7,939,649 0 +++ 1 1,656 NESTED LOOPS
716,054 0 +++ 1 1,657 NESTED LOOPS
270,201 0 ++ 39 23,171 HASH JOIN
23 0 5 1 JOIN FILTER CREATE :BF0000
31 1 5 1 TABLE ACCESS BY INDEX ROWID PS_PAY_CALENDAR
14 2 5 1 INDEX RANGE SCAN PS0PAY_CALENDAR
141,467 0 18,503 23,171 VIEW VW_SQ_1
3,032,120 0 18,503 23,171 HASH GROUP BY
152,564 0 163,420 33,020 JOIN FILTER USE :BF0000
407,746 0 163,420 33,020 MERGE JOIN
55 0 5 1 SORT JOIN
12 2 5 1 INDEX RANGE SCAN PS0PAY_CALENDAR
79,435 0 40,000 33,020 SORT JOIN
119,852 0 40,000 40,000 INDEX FAST FULL SCAN WB_JOB
2,959,031 13 - 23,171 1,657 TABLE ACCESS BY INDEX ROWID WB_JOB
944,887 1 23,171 23,174 INDEX RANGE SCAN WB_JOB
102,650 0 1,657 1,656 VIEW PUSHED PREDICATE VW_SQ_2
73,769 0 1,657 1,657 SORT AGGREGATE
25,617 0 1,657 1,657 FIRST ROW
225,497 1 1,657 1,657 INDEX RANGE SCAN (MIN/MAX) WB_JOB
357,872 0 3,312 2,027 TABLE ACCESS BY INDEX ROWID WB_RETROPAY_EARNS
3,655,774 1 3,312 2,027 INDEX RANGE SCAN WB_RETROPAY_EARNS_IDX1
199,884 0 2,027 1,909 TABLE ACCESS BY INDEX ROWID PS_RETROPAY_RQST
317,793 1 2,027 1,909 INDEX RANGE SCAN PS_RETROPAY_RQST
71,534 0 1,909 1,909 INDEX RANGE SCAN PS#RETROPAYPGM_TBL
18,396 0 1,909 1,909 TABLE ACCESS BY INDEX ROWID PS_RETROPAYPGM_TBL
The 3 important parts of this query are
Elapsed is per row source, not cumulative of it’s children
LIO_RATIO
TCP_GRAPH
Elapsed time format has a huge drawback in the display_cursor output as each lines elapsed time includes the elapsed time of all the children which makes an execution plan difficult to scan and see where the time is being spent. In the above output the elapsed time represents the elapsed time of each row source line.
LIO_RATIO shows the number of buffers accessed per row returned. Ideally 1 buffer or less is accessed per row returned. When the number of buffers per row becomes large, it’s a good indication that there is a more optimal method to get the rows. The I/O stats include the stats of the child row source, so the query has to get the I/O from the childern and subtract from the parent, making the query a bit more complex.
TCP_GRAPH graphically shows the ratio of estimated rows to actual rows. The estimated rows used is cardinality* starts, not just cardinality. This value can be compared directly to actual_rows and the difference in order of magnitude is shown. Each ‘+’ represents and order of magnitude larger and each “-” represents an order of magnitude smaller. The more orders of magnitude, either way, the more the optimizers calculations are off and thus like more pointing to a possible plan that is suboptimal.
In the above output there are 5 lines where the optimizer only expect 1 row and the actual results were over 1000, ie 3 orders of magnitude difference. These are the three lines with “+++” There is one line with “-” where actual was an order of magnitude smaller. On that same line we see it’s one of the slower lines almost 3 seconds and that the were 13 lio’s per row returned, which is sign of inefficiency.
Comments