Predicate Pushdown of Partition Key Inconsistent with Specified Filter in SQL #12538

MichelleArk · 2019-03-27T15:17:52Z

Filtering on a partition key in a SQL query does not always produce a plan whose table scan is constrained to the minimum set of partitions the query requires.

Example using the orders fixture:

WITH cte AS (
    SELECT *, CAST(orderkey AS varchar) as orderkey_string 
    FROM orders
) 

SELECT * 
FROM cte 
WHERE (orderstatus = 'F' OR orderstatus = 'P') AND orderkey_string = '2'

Produces the following plan:

 - Output[orderkey, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment, orderkey_string] => [orderkey:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:integer, comment:varchar(79), expr_8:varchar]
        Cost: ?, Output: ? rows (?B)
        orderkey_string := expr_8
    - RemoteExchange[GATHER] => [orderkey:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:integer, comment:varchar(79), expr_8:varchar]
            Cost: ?, Output: ? rows (?B)
        - Filter[filterPredicate = (("orderstatus" = 'F') OR ("orderstatus" = 'P'))] => [orderkey:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:integer, comment:varchar(79), expr_8:varchar]
                Cost: ?, Output: ? rows (?B)
            - ScanFilterProject[table = local:orders:sf0.01, filterPredicate = (CAST("orderkey" AS varchar) = CAST('2' AS varchar))] => [orderkey:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:integer, comment:varchar(79), expr_8:varchar]
                    Cost: ?, Output: ? rows (?B)
                    expr_8 := CAST("orderkey" AS varchar)
                    clerk := tpch:clerk
                    orderkey := tpch:orderkey
                    orderstatus := tpch:orderstatus
                        :: [[F], [O], [P]]
                    totalprice := tpch:totalprice
                    custkey := tpch:custkey
                    comment := tpch:comment
                    orderdate := tpch:orderdate
                    shippriority := tpch:shippriority
                    orderpriority := tpch:orderpriority

Notice the line :: [[F], [O], [P]] under orderstatus in the plan: all possible values of orderstatus (F, O, P) are selected in the ScanFilterProject node. A Filter node above it then performs the actual filtering on orderstatus that the query specified (just F and P).

@nayeemzen and I have been digging into why this occurs, and it seems related to the way predicates are pushed down through a Project. The following 3 conditions seem to trigger the suboptimal partition selection in the ScanFilterProject node:

1. The query involves a subquery (e.g. a CTE or view).
2. The subquery contains a non-identity expression that is referenced in the main query (e.g. CAST(orderkey AS varchar) as orderkey_string).
3. There is more than one reference to the same partition key within a clause (e.g. (orderstatus = 'F' OR orderstatus = 'P')).

From tracing how the plan is optimized for the example query, we've seen that the PredicatePushDown optimizer classifies the (orderstatus = 'F' OR orderstatus = 'P') clause as a non-inlining candidate, which excludes it from being pushed down to the table scan. Instead, an additional Filter node is created for that clause.

This issue seems related to this change: #10860.
Specifically, constraining the number of references to a symbol within a given clause to be 1 is what flags the clause in the example query as a non-inlining candidate.
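To make the single-reference rule concrete, here is a rough Python sketch of how such a check would classify the two conjuncts from the example query. This is not the PredicatePushDown code: the function names (count_symbol_references, is_inlining_candidate) are hypothetical, and matching symbols by tokenizing the clause text is a simplifying assumption standing in for walking a real expression tree.

```python
import re
from collections import Counter

def count_symbol_references(conjunct, symbols):
    """Count how often each known symbol appears in a conjunct (hypothetical helper)."""
    # Drop string literals first so a value like 'F' is never mistaken for a symbol.
    without_literals = re.sub(r"'[^']*'", "", conjunct)
    tokens = re.findall(r"[A-Za-z_][A-Za-z_0-9]*", without_literals)
    return Counter(token for token in tokens if token in symbols)

def is_inlining_candidate(conjunct, symbols):
    """Assumed rule: every symbol in the conjunct must be referenced exactly once."""
    counts = count_symbol_references(conjunct, symbols)
    return all(count == 1 for count in counts.values())

# Symbols produced by the projection in the example CTE.
symbols = {"orderstatus", "orderkey_string"}

# The WHERE clause of the example query, split into its top-level conjuncts.
conjuncts = [
    "(orderstatus = 'F' OR orderstatus = 'P')",
    "orderkey_string = '2'",
]

classification = {c: is_inlining_candidate(c, symbols) for c in conjuncts}
for conjunct, candidate in classification.items():
    print(conjunct, "->", "inlining candidate" if candidate else "non-inlining candidate")
```

Under this assumed rule the orderstatus clause references its symbol twice and is rejected, while the orderkey_string clause references its symbol once and is accepted, which matches the split between the Filter node and the ScanFilterProject predicate in the plan above.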


wenleix · 2019-03-28T17:06:23Z

@MichelleArk: A similar issue has been discussed in #11265. Here is the conclusion:

- This behavior, in fact, conforms to the SQL Standard.
- However, our current behavior on subqueries is not consistent (some subqueries can still have predicate pushdown). We need consistent behavior, probably controlled by a session property.

Let me know if you have any further questions :)

stale · 2021-06-22T17:34:12Z

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

wenleix self-assigned this Mar 28, 2019

stale bot added the stale label Jun 22, 2021

weiatwork mentioned this issue Jul 1, 2021

Relax the restriction on predicate pushdown over projection trinodb/trino#8451

Closed

stale bot closed this as completed Jul 21, 2021
