You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Filtering on a partition key in a SQL query does not always produce a plan with a table scan constrained to scan the minimum number of partitions required for the query.
Example using the orders fixture:
WITH cte AS (
SELECT*, CAST(orderkey ASvarchar) as orderkey_string
FROM orders
)
SELECT*FROM cte
WHERE (orderstatus ='F'OR orderstatus ='P') AND orderkey_string ='2'
Notice from bolded line in the plan, all the possible values (F, O, P) of orderstatus are chosen in the ScanFilterProject node. Then, a Filter node above it does the actual filtering on orderstatus specified in the query (just F and P).
@nayeemzen and I have been digging into why this occurs and it seems related to the way predicates are pushed down from a Project. The following 3 conditions seem to trigger the suboptimal partition selection in the ScanFilterProject node.
Query involves a subquery (ex: CTE or view)
The subquery contains a non-identity expression that is referenced in the main query (ex: CAST(orderkey AS varchar) as orderkey_string)
More than 1 reference to the same partition key within a clause (ex: (orderstatus = 'F' OR orderstatus = 'P')
From tracing through how the plan is optimized for the example query, we've seen that the PredicatePushDown optimizer identifies the (orderstatus = 'F' OR orderstatus = 'P') clause as a non-inlining candidate, which excludes it from being pushed down to the table scan. Instead an additional filter node is created for the (orderstatus = 'F' OR orderstatus = 'P') clause.
@MichelleArk : A similar issue has been discussed in #11265 . Here is the conclusion:
This behavior, in fact, conforms to the SQL Standard
However, our current behavior on subquery is not consistent (some subquery can still have predicate pushdown). We need to have consistent behavior, and probably have this controlled by a session property.
This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.
Filtering on a partition key in a SQL query does not always produce a plan with a table scan constrained to scan the minimum number of partitions required for the query.
Example using the
orders
fixture:Produces the following plan:
Notice from bolded line in the plan, all the possible values (
F
,O
,P
) oforderstatus
are chosen in theScanFilterProject
node. Then, aFilter
node above it does the actual filtering onorderstatus
specified in the query (justF
andP
).@nayeemzen and I have been digging into why this occurs and it seems related to the way predicates are pushed down from a Project. The following 3 conditions seem to trigger the suboptimal partition selection in the
ScanFilterProject
node.CAST(orderkey AS varchar) as orderkey_string
)(orderstatus = 'F' OR orderstatus = 'P')
From tracing through how the plan is optimized for the example query, we've seen that the
PredicatePushDown
optimizer identifies the(orderstatus = 'F' OR orderstatus = 'P')
clause as a non-inlining candidate, which excludes it from being pushed down to the table scan. Instead an additional filter node is created for the(orderstatus = 'F' OR orderstatus = 'P')
clause.This issue seems related to the this change: #10860.
Specifically, constraining the number of references to a symbol within a given clause to be 1 is what flags the clause in the example query as non-inlining candidate.
The text was updated successfully, but these errors were encountered: