Optimizing problem SQL is probably the most difficul common DBA task. Finding problem SQL is pretty easy using Active Session History data and finding which SQL contribute the most to overall database load.
Once we have found the top SQL , then what do we do to optimize it? An easy issue is a missing index on a field that has a highly selective predicate filter.
But what if there is nothing obious? what if we are joining 10 to 20 tables? How do we know if the optimizer chose the optimal plan? There is no obvious easy answer to this question. There is on the other hand a methodology as outlined in Dan Tow's book "SQL Tuning" the TL;DR is look for the table that has the highest predicate fitler ratio, i.e. find the table in the SQL query with a where clause like "field=value" where that restriction filters out the biggest % of rows. For example a query might have predicate filters on serveral fields , but if there is one of those filters that filters out the largest percentage of rows, that is where we wan to to begin our execution plan of that query.
Let’s take an example query:
SELECT COUNT (*)
FROM a,
b,
c
WHERE
b.val2 = 100 AND
a.val1 = b.id AND
b.val1 = c.id;
The red lines with crows feet mean that as far as the definitions go, the relations could be many to many.
Question is “what is the optimal execution path for this query?”
One of the best execution plans is to
start at the most selective table filter
join to parent if possible
else join to children
There is one filter in the diagram, represented by the green F on table B. Table B has a filter criteria in the query “b.val2=100″.
Ok, table B is where we start the query. Now where do we go from B? Who is the parent and who is the child? It’s not defined in the constraints nor indexes on these tables so it’s hard for us to know. Guess what ? It’s also hard for Oracle to figure it out. Well, what does Oracle decide to do? This is where the cool part of DB Optimizer comes in.
The super cool thing with DB Optimizer is we can overlay the diagram with the actual execution path (I think this is so awesome)
For the digram we can see Oracle starts with B and joins to A. The result if this is joined to C. Is this the optimal path?
Well, let’s keep the same indexes and just add some constraints:
alter table c add constraint c_pk_con unique (id);
alter table b add constraint b_pk_con unique (id);
Now let’s diagram the query with DB Optimizer:
We can now see who the parent and child is, so we can determine the optimal query path which is to start at B, the only filter and join to the child C then to the parent A. Now what does Oracle do with the added constraint info:
Guess what? The execution plan has now changed with the addition of constraints and now Oracle’s execution path goes from a suboptimal plan to the optimal path. Moral of the story is to make sure and define constraint information because it helps the optimizer, but what I wanted to show here was the explain plan overlay on the diagram which makes comparing execution plans much easier. Putting the queries VST diagrams side by side along with the overlay of execution path we can clearly and quickly see the differences:
The dark green nodes are the starting nodes and the red nods are the ending nodes. Inbetween nodes would be light green, but there are none in this diagram.
Here is an example from an article by Jonathan Lewis
The query Jonathan discusses is
SELECT order_line_data
FROM
customers cus
INNER JOIN
orders ord
ON ord.id_customer = cus.id
INNER JOIN
order_lines orl
ON orl.id_order = ord.id
INNER JOIN
products prd1
ON prd1.id = orl.id_product
INNER JOIN
suppliers sup1
ON sup1.id = prd1.id_supplier
WHERE
cus.location = 'LONDON' AND
ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND
sup1.location = 'LEEDS' AND
EXISTS (SELECT NULL
FROM
alternatives alt
INNER JOIN
products prd2
ON prd2.id = alt.id_product_sub
INNER JOIN
suppliers sup2
ON sup2.id = prd2.id_supplier
WHERE
alt.id_product = prd1.id AND
sup2.location != 'LEEDS'
which diagrammed looks like
There are multiple filters, so we need to know which one is the most selective to know where to start, so we ask DB Optimizer to display the statistics as well (blue below a table is the filter %, green above is # of rows in table and numbers on join lines are rows returned by a join of just those two tables)
Now that we can determine a candidate for best optimization path, does Oracle take it?
Can you find the optimization error? (Hint the not exists subquery is sub-optimially accessed)
Dark green is where execution starts. There are two starts: one for the main query body and one for the subquery.
The red is where query execution ends
Oracle correctly makes the most important decision which is to start the execution plan on "Customers" because the predicate filter is the strongest. The fillter only returns the 0.5% of the rows form the table. The only mistake Oracle makes, which is minor, is on the subquery joining ALTERNATIVES, PRODUCTS and SUPPLIERS. Oracle starts on ALTERNATIVES and PRODUCTS when it should start on SUPPLIERS which has a predicat fitler (50%) joining to PRODUCTS. IT's not a big mistake but does make a difference, and the only way I would be able to see this error is with the VST diagram which shows all the predicate fitlers and shows where Oracle starts the execution plan. This execution plan is more difficult than most as the there are two start points. There is the start point for the main part of the query starting on CUSTOMERS joined to ORDERS. There is a second starting point in the subquery joining ALTERNATIVES, PRODUCTS and SUPPLIERS. IN this subquery we should have started on SUPPLIERS which has a predicate filter instead of starting at ALTERNATIVES and PRODUCTS neither if which have a predicate filter.
UPDATE December 2024 One of the time consuming parts of making a VST diagram is extracting all the two table joins and predicate filters. Ideally one can run each of the two table joins to see if the join actually filters out rows, i.e. not every row in the parent is in the child . THis is most important in many to many relationships. IN many to many the resulting rows in a join can be more than in either table which "blows up" the running row set and is something to be avoided.
Also one of the most important things is calculating the predicate filter ratio for any tables that have predicat filters. The most important step in a sql execution path is picking the first table and the first table to start with is the one with the highest filter ratio.
It now looks like ChatGPT can do all these extractions. This is the first time since I started testing in over the past two years that it got all of these extractions correct: can you extra a list of all possible two table joins not including cartesian joins. The joins have to be on a field in each of the two tables. Here is the query: SELECT order_line_data FROM customers cus INNER JOIN orders ord ON ord.id_customer = cus.id INNER JOIN order_lines orl ON orl.id_order = ord.id INNER JOIN products prd1 ON prd1.id = orl.id_product INNER JOIN suppliers sup1 ON sup1.id = prd1.id_supplier WHERE cus.location = 'LONDON' AND ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND sup1.location = 'LEEDS' AND EXISTS (SELECT NULL FROM alternatives alt INNER JOIN products prd2 ON prd2.id = alt.id_product_sub INNER JOIN suppliers sup2 ON sup2.id = prd2.id_supplier WHERE alt.id_product = prd1.id AND sup2.location != 'LEEDS'
ChatGPT said:
ChatGPT
To extract a list of all possible two-table joins from the query, not including Cartesian joins, we need to focus on the ON clauses where there is a relationship defined between fields in two tables. Each INNER JOIN defines such a relationship.
Here is the list of all the joins from the query, including their joining conditions:
List of Two-Table Joins
customers and orders
Join condition: orders.id_customer = customers.id
orders and order_lines
Join condition: order_lines.id_order = orders.id
order_lines and products (prd1)
Join condition: products.id = order_lines.id_product
products (prd1) and suppliers (sup1)
Join condition: suppliers.id = products.id_supplier
products (prd1) and alternatives
Join condition: alternatives.id_product = products.id
alternatives and products (prd2)
Join condition: products.id = alternatives.id_product_sub
products (prd2) and suppliers (sup2)
Join condition: suppliers.id = products.id_supplier
Additional Notes
Each join specifies a relationship between two tables using their keys or foreign keys, which ensures no Cartesian join (a join without a condition).
Some tables (like products and suppliers) are joined multiple times under different aliases (e.g., prd1 and prd2, sup1 and sup2).
Formatted Table of Joins
Table 1 | Table 2 | Join Condition |
customers | orders | orders.id_customer = customers.id |
orders | order_lines | order_lines.id_order = orders.id |
order_lines | products | products.id = order_lines.id_product |
products | suppliers | suppliers.id = products.id_supplier |
products | alternatives | alternatives.id_product = products.id |
alternatives | products | products.id = alternatives.id_product_sub |
products | suppliers | suppliers.id = products.id_supplier |
This structure provides all two-table joins from your query based on their ON conditions. Let me know if you'd like further details or analysis!
4o
You said:
can you extract actual SQL statements to do the two table joins, and include any predicate filters on either of the two tables in the two table joins: SELECT order_line_data FROM customers cus INNER JOIN orders ord ON ord.id_customer = cus.id INNER JOIN order_lines orl ON orl.id_order = ord.id INNER JOIN products prd1 ON prd1.id = orl.id_product INNER JOIN suppliers sup1 ON sup1.id = prd1.id_supplier WHERE cus.location = 'LONDON' AND ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND sup1.location = 'LEEDS' AND EXISTS (SELECT NULL FROM alternatives alt INNER JOIN products prd2 ON prd2.id = alt.id_product_sub INNER JOIN suppliers sup2 ON sup2.id = prd2.id_supplier WHERE alt.id_product = prd1.id AND sup2.location != 'LEEDS'
ChatGPT said:
ChatGPT
Here are the extracted two-table join SQL statements from the provided query, including any relevant predicate filters applied to the tables in the WHERE clause:
1. customers and orders
SELECT cus.id AS customer_id, ord.id AS order_id FROM customers cus INNER JOIN orders ord ON ord.id_customer = cus.id WHERE cus.location = 'LONDON' AND ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10';
Predicates:
cus.location = 'LONDON'
ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10'
2. orders and order_lines
SELECT ord.id AS order_id, orl.id AS order_line_id FROM orders ord INNER JOIN order_lines orl ON orl.id_order = ord.id WHERE ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10';
Predicates:
ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10'
3. order_lines and products (prd1)
SELECT orl.id AS order_line_id, prd1.id AS product_id FROM order_lines orl INNER JOIN products prd1 ON prd1.id = orl.id_product;
No specific predicates for these two tables, but downstream filters on prd1 may apply when joined with suppliers.
4. products (prd1) and suppliers (sup1)
SELECT prd1.id AS product_id, sup1.id AS supplier_id FROM products prd1 INNER JOIN suppliers sup1 ON sup1.id = prd1.id_supplier WHERE sup1.location = 'LEEDS';
Predicate:
sup1.location = 'LEEDS'
5. products (prd1) and alternatives
SELECT prd1.id AS product_id, alt.id_product_sub AS alternative_product_id FROM products prd1 INNER JOIN alternatives alt ON alt.id_product = prd1.id;
No direct predicates, but this join supports the nested query.
6. alternatives and products (prd2)
SELECT alt.id_product_sub AS alternative_product_id, prd2.id AS product_id FROM alternatives alt INNER JOIN products prd2 ON prd2.id = alt.id_product_sub;
No specific predicates, but downstream filters apply when joined with suppliers.
7. products (prd2) and suppliers (sup2)
SELECT prd2.id AS product_id, sup2.id AS supplier_id FROM products prd2 INNER JOIN suppliers sup2 ON sup2.id = prd2.id_supplier WHERE sup2.location != 'LEEDS';
Predicate:
sup2.location != 'LEEDS'
Key Notes
Each join includes relevant filters from the WHERE clause.
The filters are applied to ensure only the necessary rows are processed at each stage of the join.
These extracted SQL statements represent the relationships and filtering logic used in the original query.
Let me know if you need further clarification or adjustments!
You said:
can you extract a set of each "select count(*) from table with field=value" , i.e. extract out SQL queries to count the # of rows after applying any predicate filters . If a table has multiple predicate filters, include all of them in the select statement from that table: SELECT order_line_data FROM customers cus INNER JOIN orders ord ON ord.id_customer = cus.id INNER JOIN order_lines orl ON orl.id_order = ord.id INNER JOIN products prd1 ON prd1.id = orl.id_product INNER JOIN suppliers sup1 ON sup1.id = prd1.id_supplier WHERE cus.location = 'LONDON' AND ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND sup1.location = 'LEEDS' AND EXISTS (SELECT NULL FROM alternatives alt INNER JOIN products prd2 ON prd2.id = alt.id_product_sub INNER JOIN suppliers sup2 ON sup2.id = prd2.id_supplier WHERE alt.id_product = prd1.id AND sup2.location != 'LEEDS'
ChatGPT said:
ChatGPT
Here is the set of SELECT COUNT(*) queries for each table in the provided SQL query, including all applicable predicate filters:
1. customers
SELECT COUNT(*) FROM customers cus WHERE cus.location = 'LONDON';
Predicate: cus.location = 'LONDON'
2. orders
SELECT COUNT(*) FROM orders ord WHERE ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10';
Predicate: ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10'
3. order_lines
SELECT COUNT(*) FROM order_lines orl;
No specific predicate filters on order_lines.
4. products (prd1)
SELECT COUNT(*) FROM products prd1;
No specific predicate filters on products in the main query.
5. suppliers (sup1)
SELECT COUNT(*) FROM suppliers sup1 WHERE sup1.location = 'LEEDS';
Predicate: sup1.location = 'LEEDS'
6. alternatives
SELECT COUNT(*) FROM alternatives alt;
No specific predicate filters on alternatives.
7. products (prd2)`
SELECT COUNT(*) FROM products prd2;
No specific predicate filters on products in the subquery.
8. suppliers (sup2)
SELECT COUNT(*) FROM suppliers sup2 WHERE sup2.location != 'LEEDS';
Predicate: sup2.location != 'LEEDS'
Explanation
Each SELECT COUNT(*) includes any filters that apply directly to the table as defined in the WHERE clause of the query.
If no filters are directly applied to a table, the query simply counts all rows in that table.
can you list ever set of parent - child relationships i.e. list table pairs and who is the parent and who is the child:
ChatGPT said:
ChatGPT
To identify the parent-child relationships in your query, we examine the ON conditions in the INNER JOIN clauses. A parent table typically has a primary key that is referenced as a foreign key in the child table.
Here’s the list of parent-child relationships from your query:
1. customers → orders
Parent: customers
Child: orders
Relationship: orders.id_customer = customers.id
2. orders → order_lines
Parent: orders
Child: order_lines
Relationship: order_lines.id_order = orders.id
3. order_lines → products (prd1)
Parent: products (prd1)
Child: order_lines
Relationship: order_lines.id_product = products.id
4. products (prd1) → suppliers (sup1)
Parent: suppliers (sup1)
Child: products (prd1)
Relationship: products.id_supplier = suppliers.id
5. products (prd1) → alternatives
Parent: products (prd1)
Child: alternatives
Relationship: alternatives.id_product = products.id
6. alternatives → products (prd2)
Parent: products (prd2)
Child: alternatives
Relationship: alternatives.id_product_sub = products.id
7. products (prd2) → suppliers (sup2)
Parent: suppliers (sup2)
Child: products (prd2)
Relationship: products.id_supplier = suppliers.id
Summary of Parent-Child Relationships
Parent Table | Child Table | Relationship |
customers | orders | orders.id_customer = customers.id |
orders | order_lines | order_lines.id_order = orders.id |
products (prd1) | order_lines | order_lines.id_product = products.id |
suppliers (sup1) | products (prd1) | products.id_supplier = suppliers.id |
products (prd1) | alternatives | alternatives.id_product = products.id |
products (prd2) | alternatives | alternatives.id_product_sub = products.id |
suppliers (sup2) | products (prd2) | products.id_supplier = suppliers.id |
Comentários