Visual SQL Tuning

Optimizing problem SQL is probably the most difficul common DBA task. Finding problem SQL is pretty easy using Active Session History data and finding which SQL contribute the most to overall database load.

Once we have found the top SQL , then what do we do to optimize it? An easy issue is a missing index on a field that has a highly selective predicate filter.

But what if there is nothing obious? what if we are joining 10 to 20 tables? How do we know if the optimizer chose the optimal plan? There is no obvious easy answer to this question. There is on the other hand a methodology as outlined in Dan Tow's book "SQL Tuning" the TL;DR is look for the table that has the highest predicate fitler ratio, i.e. find the table in the SQL query with a where clause like "field=value" where that restriction filters out the biggest % of rows. For example a query might have predicate filters on serveral fields , but if there is one of those filters that filters out the largest percentage of rows, that is where we wan to to begin our execution plan of that query.

Let’s take an example query:

SELECT COUNT (*)
FROM   a,
       b,
       c
WHERE
       b.val2 = 100 AND
       a.val1 = b.id AND
       b.val1 = c.id;

There are indexes on b.id and c.id. Diagramming the query in DB Optimizer gives

The red lines with crows feet mean that as far as the definitions go, the relations could be many to many.

Question is “what is the optimal execution path for this query?”

One of the best execution plans is to

start at the most selective table filter
join to parent if possible
else join to children

There is one filter in the diagram, represented by the green F on table B. Table B has a filter criteria in the query “b.val2=100″.

Ok, table B is where we start the query. Now where do we go from B? Who is the parent and who is the child? It’s not defined in the constraints nor indexes on these tables so it’s hard for us to know. Guess what ? It’s also hard for Oracle to figure it out. Well, what does Oracle decide to do? This is where the cool part of DB Optimizer comes in.

The super cool thing with DB Optimizer is we can overlay the diagram with the actual execution path (I think this is so awesome)

For the digram we can see Oracle starts with B and joins to A. The result if this is joined to C. Is this the optimal path?

Well, let’s keep the same indexes and just add some constraints:

alter table c add constraint c_pk_con unique (id);
alter table b add constraint b_pk_con unique (id);

Now let’s diagram the query with DB Optimizer:

We can now see who the parent and child is, so we can determine the optimal query path which is to start at B, the only filter and join to the child C then to the parent A. Now what does Oracle do with the added constraint info:

Guess what? The execution plan has now changed with the addition of constraints and now Oracle’s execution path goes from a suboptimal plan to the optimal path. Moral of the story is to make sure and define constraint information because it helps the optimizer, but what I wanted to show here was the explain plan overlay on the diagram which makes comparing execution plans much easier. Putting the queries VST diagrams side by side along with the overlay of execution path we can clearly and quickly see the differences:

The dark green nodes are the starting nodes and the red nods are the ending nodes. Inbetween nodes would be light green, but there are none in this diagram.

Here is an example from an article by Jonathan Lewis

http://www.simple-talk.com/sql/performance/designing-efficient-sql-a-visual-approach/

The query Jonathan discusses is

SELECT order_line_data
FROM
         customers cus
         INNER JOIN
         orders ord
         ON ord.id_customer = cus.id
         INNER JOIN
         order_lines orl
         ON orl.id_order = ord.id
         INNER JOIN
         products prd1
         ON prd1.id = orl.id_product
         INNER JOIN
         suppliers sup1
         ON sup1.id = prd1.id_supplier
   WHERE
         cus.location = 'LONDON' AND
         ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND
         sup1.location = 'LEEDS' AND
    EXISTS (SELECT NULL
            FROM
                 alternatives alt
                 INNER JOIN
                 products prd2
                 ON prd2.id = alt.id_product_sub
                 INNER JOIN
                 suppliers sup2
                 ON sup2.id = prd2.id_supplier
           WHERE
                  alt.id_product = prd1.id AND
                  sup2.location != 'LEEDS'

which diagrammed looks like

There are multiple filters, so we need to know which one is the most selective to know where to start, so we ask DB Optimizer to display the statistics as well (blue below a table is the filter %, green above is # of rows in table and numbers on join lines are rows returned by a join of just those two tables)

Now that we can determine a candidate for best optimization path, does Oracle take it?

Can you find the optimization error? (Hint the not exists subquery is sub-optimially accessed)

Dark green is where execution starts. There are two starts: one for the main query body and one for the subquery.

The red is where query execution ends

Oracle correctly makes the most important decision which is to start the execution plan on "Customers" because the predicate filter is the strongest. The fillter only returns the 0.5% of the rows form the table. The only mistake Oracle makes, which is minor, is on the subquery joining ALTERNATIVES, PRODUCTS and SUPPLIERS. Oracle starts on ALTERNATIVES and PRODUCTS when it should start on SUPPLIERS which has a predicat fitler (50%) joining to PRODUCTS. IT's not a big mistake but does make a difference, and the only way I would be able to see this error is with the VST diagram which shows all the predicate fitlers and shows where Oracle starts the execution plan. This execution plan is more difficult than most as the there are two start points. There is the start point for the main part of the query starting on CUSTOMERS joined to ORDERS. There is a second starting point in the subquery joining ALTERNATIVES, PRODUCTS and SUPPLIERS. IN this subquery we should have started on SUPPLIERS which has a predicate filter instead of starting at ALTERNATIVES and PRODUCTS neither if which have a predicate filter.

UPDATE December 2024 One of the time consuming parts of making a VST diagram is extracting all the two table joins and predicate filters. Ideally one can run each of the two table joins to see if the join actually filters out rows, i.e. not every row in the parent is in the child . THis is most important in many to many relationships. IN many to many the resulting rows in a join can be more than in either table which "blows up" the running row set and is something to be avoided.

Also one of the most important things is calculating the predicate filter ratio for any tables that have predicat filters. The most important step in a sql execution path is picking the first table and the first table to start with is the one with the highest filter ratio.

It now looks like ChatGPT can do all these extractions. This is the first time since I started testing in over the past two years that it got all of these extractions correct: can you extra a list of all possible two table joins not including cartesian joins. The joins have to be on a field in each of the two tables. Here is the query: SELECT order_line_data FROM customers cus INNER JOIN orders ord ON ord.id_customer = cus.id INNER JOIN order_lines orl ON orl.id_order = ord.id INNER JOIN products prd1 ON prd1.id = orl.id_product INNER JOIN suppliers sup1 ON sup1.id = prd1.id_supplier WHERE cus.location = 'LONDON' AND ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND sup1.location = 'LEEDS' AND EXISTS (SELECT NULL FROM alternatives alt INNER JOIN products prd2 ON prd2.id = alt.id_product_sub INNER JOIN suppliers sup2 ON sup2.id = prd2.id_supplier WHERE alt.id_product = prd1.id AND sup2.location != 'LEEDS'

ChatGPT said:

ChatGPT

To extract a list of all possible two-table joins from the query, not including Cartesian joins, we need to focus on the ON clauses where there is a relationship defined between fields in two tables. Each INNER JOIN defines such a relationship.

Here is the list of all the joins from the query, including their joining conditions:

List of Two-Table Joins

customers and orders
- Join condition: orders.id_customer = customers.id
orders and order_lines
- Join condition: order_lines.id_order = orders.id
order_lines and products (prd1)
- Join condition: products.id = order_lines.id_product
products (prd1) and suppliers (sup1)
- Join condition: suppliers.id = products.id_supplier
products (prd1) and alternatives
- Join condition: alternatives.id_product = products.id
alternatives and products (prd2)
- Join condition: products.id = alternatives.id_product_sub
products (prd2) and suppliers (sup2)
- Join condition: suppliers.id = products.id_supplier

Additional Notes

Each join specifies a relationship between two tables using their keys or foreign keys, which ensures no Cartesian join (a join without a condition).
Some tables (like products and suppliers) are joined multiple times under different aliases (e.g., prd1 and prd2, sup1 and sup2).

Formatted Table of Joins

Table 1	Table 2	Join Condition
customers	orders	orders.id_customer = customers.id
orders	order_lines	order_lines.id_order = orders.id
order_lines	products	products.id = order_lines.id_product
products	suppliers	suppliers.id = products.id_supplier
products	alternatives	alternatives.id_product = products.id
alternatives	products	products.id = alternatives.id_product_sub
products	suppliers	suppliers.id = products.id_supplier

This structure provides all two-table joins from your query based on their ON conditions. Let me know if you'd like further details or analysis!

You said:

can you extract actual SQL statements to do the two table joins, and include any predicate filters on either of the two tables in the two table joins: SELECT order_line_data FROM customers cus INNER JOIN orders ord ON ord.id_customer = cus.id INNER JOIN order_lines orl ON orl.id_order = ord.id INNER JOIN products prd1 ON prd1.id = orl.id_product INNER JOIN suppliers sup1 ON sup1.id = prd1.id_supplier WHERE cus.location = 'LONDON' AND ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND sup1.location = 'LEEDS' AND EXISTS (SELECT NULL FROM alternatives alt INNER JOIN products prd2 ON prd2.id = alt.id_product_sub INNER JOIN suppliers sup2 ON sup2.id = prd2.id_supplier WHERE alt.id_product = prd1.id AND sup2.location != 'LEEDS'

ChatGPT said:

ChatGPT

Here are the extracted two-table join SQL statements from the provided query, including any relevant predicate filters applied to the tables in the WHERE clause:

1. customers and orders

SELECT cus.id AS customer_id, ord.id AS order_id FROM customers cus INNER JOIN orders ord ON ord.id_customer = cus.id WHERE cus.location = 'LONDON' AND ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10';

Predicates:
- cus.location = 'LONDON'
- ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10'

2. orders and order_lines

SELECT ord.id AS order_id, orl.id AS order_line_id FROM orders ord INNER JOIN order_lines orl ON orl.id_order = ord.id WHERE ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10';

Predicates:
- ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10'

3. order_lines and products (prd1)

SELECT orl.id AS order_line_id, prd1.id AS product_id FROM order_lines orl INNER JOIN products prd1 ON prd1.id = orl.id_product;

No specific predicates for these two tables, but downstream filters on prd1 may apply when joined with suppliers.

4. products (prd1) and suppliers (sup1)

SELECT prd1.id AS product_id, sup1.id AS supplier_id FROM products prd1 INNER JOIN suppliers sup1 ON sup1.id = prd1.id_supplier WHERE sup1.location = 'LEEDS';

Predicate:
- sup1.location = 'LEEDS'

5. products (prd1) and alternatives

SELECT prd1.id AS product_id, alt.id_product_sub AS alternative_product_id FROM products prd1 INNER JOIN alternatives alt ON alt.id_product = prd1.id;

No direct predicates, but this join supports the nested query.

6. alternatives and products (prd2)

SELECT alt.id_product_sub AS alternative_product_id, prd2.id AS product_id FROM alternatives alt INNER JOIN products prd2 ON prd2.id = alt.id_product_sub;

No specific predicates, but downstream filters apply when joined with suppliers.

7. products (prd2) and suppliers (sup2)

SELECT prd2.id AS product_id, sup2.id AS supplier_id FROM products prd2 INNER JOIN suppliers sup2 ON sup2.id = prd2.id_supplier WHERE sup2.location != 'LEEDS';

Predicate:
- sup2.location != 'LEEDS'

Key Notes

Each join includes relevant filters from the WHERE clause.
The filters are applied to ensure only the necessary rows are processed at each stage of the join.
These extracted SQL statements represent the relationships and filtering logic used in the original query.

Let me know if you need further clarification or adjustments!

You said:

can you extract a set of each "select count(*) from table with field=value" , i.e. extract out SQL queries to count the # of rows after applying any predicate filters . If a table has multiple predicate filters, include all of them in the select statement from that table: SELECT order_line_data FROM customers cus INNER JOIN orders ord ON ord.id_customer = cus.id INNER JOIN order_lines orl ON orl.id_order = ord.id INNER JOIN products prd1 ON prd1.id = orl.id_product INNER JOIN suppliers sup1 ON sup1.id = prd1.id_supplier WHERE cus.location = 'LONDON' AND ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND sup1.location = 'LEEDS' AND EXISTS (SELECT NULL FROM alternatives alt INNER JOIN products prd2 ON prd2.id = alt.id_product_sub INNER JOIN suppliers sup2 ON sup2.id = prd2.id_supplier WHERE alt.id_product = prd1.id AND sup2.location != 'LEEDS'

ChatGPT said:

ChatGPT

Here is the set of SELECT COUNT(*) queries for each table in the provided SQL query, including all applicable predicate filters:

1. customers

SELECT COUNT(*) FROM customers cus WHERE cus.location = 'LONDON';

Predicate: cus.location = 'LONDON'

2. orders

SELECT COUNT(*) FROM orders ord WHERE ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10';

Predicate: ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10'

3. order_lines

SELECT COUNT(*) FROM order_lines orl;

No specific predicate filters on order_lines.

4. products (prd1)

SELECT COUNT(*) FROM products prd1;

No specific predicate filters on products in the main query.

5. suppliers (sup1)

SELECT COUNT(*) FROM suppliers sup1 WHERE sup1.location = 'LEEDS';

Predicate: sup1.location = 'LEEDS'

6. alternatives

SELECT COUNT(*) FROM alternatives alt;

No specific predicate filters on alternatives.

7. products (prd2)`

SELECT COUNT(*) FROM products prd2;

No specific predicate filters on products in the subquery.

8. suppliers (sup2)

SELECT COUNT(*) FROM suppliers sup2 WHERE sup2.location != 'LEEDS';

Predicate: sup2.location != 'LEEDS'

Explanation

Each SELECT COUNT(*) includes any filters that apply directly to the table as defined in the WHERE clause of the query.
If no filters are directly applied to a table, the query simply counts all rows in that table.

can you list ever set of parent - child relationships i.e. list table pairs and who is the parent and who is the child:
ChatGPT said:
ChatGPT
To identify the parent-child relationships in your query, we examine the ON conditions in the INNER JOIN clauses. A parent table typically has a primary key that is referenced as a foreign key in the child table.
Here’s the list of parent-child relationships from your query:
1. customers → orders
- Parent: customers
- Child: orders
- Relationship: orders.id_customer = customers.id
2. orders → order_lines
- Parent: orders
- Child: order_lines
- Relationship: order_lines.id_order = orders.id
3. order_lines → products (prd1)
- Parent: products (prd1)
- Child: order_lines
- Relationship: order_lines.id_product = products.id
4. products (prd1) → suppliers (sup1)
- Parent: suppliers (sup1)
- Child: products (prd1)
- Relationship: products.id_supplier = suppliers.id
5. products (prd1) → alternatives
- Parent: products (prd1)
- Child: alternatives
- Relationship: alternatives.id_product = products.id
6. alternatives → products (prd2)
- Parent: products (prd2)
- Child: alternatives
- Relationship: alternatives.id_product_sub = products.id
7. products (prd2) → suppliers (sup2)
- Parent: suppliers (sup2)
- Child: products (prd2)
- Relationship: products.id_supplier = suppliers.id
Summary of Parent-Child Relationships

Parent Table	Child Table	Relationship
customers	orders	orders.id_customer = customers.id
orders	order_lines	order_lines.id_order = orders.id
products (prd1)	order_lines	order_lines.id_product = products.id
suppliers (sup1)	products (prd1)	products.id_supplier = suppliers.id
products (prd1)	alternatives	alternatives.id_product = products.id
products (prd2)	alternatives	alternatives.id_product_sub = products.id
suppliers (sup2)	products (prd2)	products.id_supplier = suppliers.id

Visual SQL Tuning

ChatGPT said:

List of Two-Table Joins

Additional Notes

Formatted Table of Joins

You said:

ChatGPT said:

1. customers and orders

2. orders and order_lines

3. order_lines and products (prd1)

4. products (prd1) and suppliers (sup1)

5. products (prd1) and alternatives

6. alternatives and products (prd2)

7. products (prd2) and suppliers (sup2)

Key Notes

You said:

ChatGPT said:

1. customers

2. orders

3. order_lines

4. products (prd1)

5. suppliers (sup1)

6. alternatives

7. products (prd2)`

8. suppliers (sup2)

Explanation

ChatGPT said:

1. customers → orders

2. orders → order_lines

3. order_lines → products (prd1)

4. products (prd1) → suppliers (sup1)

5. products (prd1) → alternatives

6. alternatives → products (prd2)

7. products (prd2) → suppliers (sup2)

Summary of Parent-Child Relationships

Recent Posts

Comments