Scalar Subqueries

kyle Hailey

Feb 8, 20102 min read

Better formatted at http://tinyurl.com/yfrjbwx

Query 2 The VST diagram looks like

There are two interesting things about this diagram.

Every thing is OUTER JOINED to F_OUTER There are correlated subqueries in the select

There are two things to check – what is the effect of the OUTER JOINS. The OUTER JOINS can easily mean that all joins into F_OUTER don’t change the result set size. Let’s confirm by looking at the TTJ sizes:

The only thing that bounds the number of rows returned by F_OUTER is it’s self join (the line going in and out of F_OUTER) on the bottom left of F_OUTER. What this means is that it doesn’t matter what order we join tables into F_OUTER. Now we can turn to the other interesting thing about the query. There are 4 correlated subselects in the select clause. These queries in the select clause are called “scalar subqueries.” These scalar subqueries can be a bad idea or a good idea depending on how mainly on how many distinct values are used as input into the them. AFAIK, Oracle doesn’t merge subqueries in the select, and certainly not the ones in this query because they are embedded in cases statements – looks like I might have spoken too soon! more reseach to do but looks like Oracle can merge these scalar subqueries even with the case statement. I will try to run some more tests . To be continued) In the worst case scenario the scalar subqueries are going to be run 845,012 times – that is one heck of a lot of work which would be a BAD IDEA.

Looking at the above diagram, and the 4 scalar subqueries in the select clause, the top red number is how many times the scalar subquery will be run (based on the case statement in the select clause) and the orange highlight is how many distinct values will be used in the scalar subquery where clause. P3 will benefit for scalar subquery caching but F won’t because there are too many distinct values. On the other hand for P1 and P2 could if there are no collisions (see CBOF p216-217) and the scalar subquery caching is actually supports 1024 values. ( see http://www.oratechinfo.co.uk/scalar_subqueries.html#scalar3 for a great write up on analysis of scalar subquery caching – the analysis on this page seems to show that caching maxes out way before 1024 but that might be because of collisions)

The subqueries in the select clause look like

select CASE WHEN F.f1 IS NULL

THEN NULL

ELSE (SELECT X.f2

FROM X

WHERE code_vl = F.f1)

END AS f0

from F;

and could be merged into the query like:

select CASE WHEN F.f1 IS NULL

THEN NULL

ELSE ( X.f2)

END AS f0

from F , X

where code_vl(+) = F.f1;

( NOTE: the first query will break if the correlated sub query returns more than one value where as the second query will return the mulitple rows.)

The VST diagram can’t tell you that this is the solution, but they can point out that the place to spend your time is the subqueries in the select.

Scalar Subqueries

Recent Posts

Comments