Implement predicate pushdown for table functions #17928

homar · 2023-06-16T08:39:34Z

Description

Implement predicate pushdown for table functions. This is needed to improve the performance of queries that use table_changes on Delta Lake.

Additional context and related issues

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

homar · 2023-07-13T09:22:54Z

Failure is not related: #17436

findepi · 2023-07-18T08:35:00Z

> Failure is not related: #17436

thanks, restarted

ebyhr · 2023-07-31T02:56:45Z

Could you rebase on master to resolve conflicts? CountingAccessMetadata was removed in the recent commit.

findepi

some high level design comments

findepi · 2023-08-01T11:37:02Z

core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java

@@ -454,6 +455,7 @@ public PlanOptimizers(
                                        new PruneOrderByInAggregation(metadata),
                                        new RewriteSpatialPartitioningAggregation(plannerContext),
                                        new SimplifyCountOverConstant(plannerContext),
+                                        new PushFilterIntoTableFunction(plannerContext, typeAnalyzer),


why add it here? and not somewhere else?

For example, new PushPredicateIntoTableScan occurs 7 times in this file, and intuitively it's not clear why table functions should be treated differently from table scans. In fact, one could model table scans as table functions.

homar · 2023-08-01

Because it is the only set that contains ImplementTableFunctionSource, so I thought this is the only place where it is needed. Is this incorrect?

findepi · 2023-08-01

If we follow the analogy between table functions and table scans, in how many places are table scans added into the plan?

homar · 2023-08-02

I have no idea how to check this :|

findepi · 2023-08-01T11:39:58Z

core/trino-spi/src/main/java/io/trino/spi/function/table/ConnectorTableFunctionHandle.java


 /**
 * An area to store all information necessary to execute the table function, gathered at analysis time
 */
 @Experimental(eta = "2022-10-31")
 public interface ConnectorTableFunctionHandle
 {
+    default Map<String, ColumnHandle> getColumnHandles()


i guess it's not without a design thought that Connector*Handle classes are generally marker interfaces (have no methods)

also, the relation described by the function is known to the engine (TableFunctionAnalysis.returnedType), so the engine doesn't need to ask for it again

please remove this method

findepi · 2023-08-01T11:47:37Z

core/trino-spi/src/main/java/io/trino/spi/function/table/ConnectorTableFunctionHandle.java

+        return Map.of();
+    }
+
+    default boolean supportsPredicatePushdown()


why would this be needed?

ConnectorMetadata.applyFilter(ConnectorSession, ConnectorTableHandle, Constraint) is invoked without first asking whether "pushdown is supported", and this lets the implementation consider whether this particular pushdown is supported. Of course this results in a bit of redundant computation. Redundant computation can be addressed, but addressing it is definitely much more important for regular table scans than for table functions (based solely on usage frequency). I don't see a reason to do it differently (presumably smarter) for table functions.

please remove this method
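
The contract described above, where the engine calls the filter hook unconditionally and the implementation declines by returning an empty result, could be sketched roughly as follows. All types here (`SketchConstraint`, `SketchFilterResult`, `SketchMetadata`, `VersionAwareMetadata`) are hypothetical simplified stand-ins, not the Trino SPI; only the Optional-returning shape is meant to mirror `ConnectorMetadata.applyFilter`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a predicate: column name -> required value.
record SketchConstraint(Map<String, Long> equalities) {}

// Hypothetical stand-in for a successful pushdown: the narrowed handle plus
// whatever part of the predicate the engine must still evaluate itself.
record SketchFilterResult(String handle, SketchConstraint remaining) {}

interface SketchMetadata
{
    // There is no separate "is pushdown supported?" probe. Returning
    // Optional.empty() is how an implementation declines.
    default Optional<SketchFilterResult> applyFilter(String handle, SketchConstraint constraint)
    {
        return Optional.empty();
    }
}

// Illustrative implementation that can absorb only an equality on "version".
final class VersionAwareMetadata
        implements SketchMetadata
{
    @Override
    public Optional<SketchFilterResult> applyFilter(String handle, SketchConstraint constraint)
    {
        Long version = constraint.equalities().get("version");
        if (version == null) {
            return Optional.empty(); // this particular pushdown is not supported
        }
        Map<String, Long> remaining = new HashMap<>(constraint.equalities());
        remaining.remove("version");
        return Optional.of(new SketchFilterResult(
                handle + "[version=" + version + "]",
                new SketchConstraint(Map.copyOf(remaining))));
    }
}
```

The caller only checks whether the returned Optional is present, so a capability flag on the handle would add nothing beyond what the empty result already communicates.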

findepi · 2023-08-01T11:54:26Z

core/trino-main/src/main/java/io/trino/sql/planner/plan/TableFunctionProcessorNode.java

+        this.enforcedConstraint = null;
+    }
+
+    public TableFunctionProcessorNode(


In this project we have a rule that classes have a single constructor responsible for initialization.
I.e. all other constructors should delegate to one chosen constructor.

(i see that TableScanNode deviates from the rule and should be fixed)
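
As a minimal sketch of the rule above, the class below keeps all validation and field assignment in one constructor and has the shorter constructor delegate to it. `SketchNode` and its fields are hypothetical and heavily simplified; this is not the real TableFunctionProcessorNode.

```java
import static java.util.Objects.requireNonNull;

import java.util.Optional;

// Hypothetical, heavily simplified plan node used only to show the pattern.
final class SketchNode
{
    private final String name;
    private final int preSorted;
    private final Optional<String> enforcedConstraint;

    // Convenience constructor: delegates and performs no initialization itself.
    SketchNode(String name, int preSorted)
    {
        this(name, preSorted, Optional.empty());
    }

    // The single constructor responsible for validation and field assignment.
    SketchNode(String name, int preSorted, Optional<String> enforcedConstraint)
    {
        this.name = requireNonNull(name, "name is null");
        if (preSorted < 0) {
            throw new IllegalArgumentException("preSorted must not be negative");
        }
        this.preSorted = preSorted;
        this.enforcedConstraint = requireNonNull(enforcedConstraint, "enforcedConstraint is null");
    }

    String name()
    {
        return name;
    }

    int preSorted()
    {
        return preSorted;
    }

    Optional<String> enforcedConstraint()
    {
        return enforcedConstraint;
    }
}
```

Because the two-argument constructor delegates, the argument checks exist in exactly one place and apply no matter which constructor the caller picks.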

findepi · 2023-08-01T11:55:05Z

core/trino-main/src/main/java/io/trino/sql/planner/plan/TableFunctionProcessorNode.java

@@ -211,6 +268,14 @@ public List<Symbol> getOutputSymbols()
        return symbols.build();
    }

+    @Nullable


I see checkState(enforcedConstraint != null, inside the method.

homar · 2023-08-04

There were two constructors: one assumed this would never be null and the second just set it to null. I will change it so that there is one constructor and the field can be null.

findepi · 2023-08-01T11:56:51Z

core/trino-main/src/main/java/io/trino/sql/planner/plan/TableFunctionProcessorNode.java

+        Set<Symbol> partitionBy = specification
+                .map(DataOrganizationSpecification::getPartitionBy)
+                .map(ImmutableSet::copyOf)
+                .orElse(ImmutableSet.of());
+        checkArgument(partitionBy.containsAll(prePartitioned), "all pre-partitioned symbols must be contained in the partitioning list");
+        this.preSorted = preSorted;
+        checkArgument(
+                specification
+                        .flatMap(DataOrganizationSpecification::getOrderingScheme)
+                        .map(OrderingScheme::getOrderBy)
+                        .map(List::size)
+                        .orElse(0) >= preSorted,
+                "the number of pre-sorted symbols cannot be greater than the number of all ordering symbols");
+        checkArgument(preSorted == 0 || partitionBy.equals(prePartitioned), "to specify pre-sorted symbols, it is required that all partitioning symbols are pre-partitioned");


I don't think i would want to maintain two separate copies of this logic in two independent constructors.

findepi · 2023-08-01T12:00:50Z

core/trino-main/src/main/java/io/trino/sql/planner/plan/TableFunctionProcessorNode.java

@@ -70,6 +76,9 @@ public class TableFunctionProcessorNode

    private final TableFunctionHandle handle;

+    @Nullable // null on workers
+    private final TupleDomain<ColumnHandle> enforcedConstraint;


i am not convinced ColumnHandle should be the TupleDomain key here.

TableFunctionAnalysis.returnedType describes the relation in terms of Fields with optional names so they would be positionally discernable (TupleDomain<Integer> ...)

properOutputs field captures the symbols produced by the function, and we could use that ( TupleDomain<Symbol>)

i think this should be TupleDomain<Symbol>

homar · 2023-08-04

It will be TupleDomain<Integer>
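
To illustrate why a positional key can work even when returned fields are unnamed, here is a rough sketch. `SketchField` and `PositionalConstraint` are hypothetical stand-ins for the returned-type fields and for a `TupleDomain<Integer>`; the real TupleDomain carries full value domains, whereas this sketch only tracks a single required value per position.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a returned-type field: the name is optional,
// the position in the list is always available.
record SketchField(Optional<String> name, String type) {}

// Hypothetical stand-in for a positional constraint: output position -> required value.
record PositionalConstraint(Map<Integer, Object> requiredValueByPosition)
{
    // Every referenced position must exist in the function's returned type.
    void validateAgainst(List<SketchField> returnedType)
    {
        for (int position : requiredValueByPosition.keySet()) {
            if (position < 0 || position >= returnedType.size()) {
                throw new IllegalArgumentException("no output column at position " + position);
            }
        }
    }

    // A row satisfies the constraint when each constrained position holds the required value.
    boolean matches(List<Object> row)
    {
        return requiredValueByPosition.entrySet().stream()
                .allMatch(entry -> entry.getValue().equals(row.get(entry.getKey())));
    }
}
```

A constraint on position 1 stays well defined even if the field at that position has no name, which a name-keyed map could not express.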

martint

We don't want to introduce infrastructure to support pushing down arbitrary operations into table functions. Such an approach will force us into a matrix of pushdown capabilities and require implementations of all the applyXXX for TableHandle and for TableFunctionHandle, and corresponding optimizers.

We already have a mechanism for this. A table function that's at the leaf of the plan (i.e., takes no inputs) and wants to behave as a table and participate in pushdown can implement the applyTableFunction call and then rely on applyXXX for TableHandle.
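
The two-step mechanism described above could look roughly like this. The types below (`ChangesFunctionHandle`, `ChangesTableHandle`, `LeafFunctionMetadata`) and the version-based filter are hypothetical simplifications for illustration, not the Trino SPI signatures: first the leaf table function is replaced by a table handle, then ordinary filter pushdown operates on that table handle.

```java
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical handle produced when the table function is analyzed.
record ChangesFunctionHandle(String table, long sinceVersion) {}

// Hypothetical table handle that can also carry a pushed-down upper bound.
record ChangesTableHandle(String table, long sinceVersion, OptionalLong maxVersion) {}

final class LeafFunctionMetadata
{
    // Step 1: a leaf table function (no inputs) is replaced by a table handle.
    Optional<ChangesTableHandle> applyTableFunction(ChangesFunctionHandle function)
    {
        return Optional.of(new ChangesTableHandle(function.table(), function.sinceVersion(), OptionalLong.empty()));
    }

    // Step 2: the usual table-handle pushdown narrows that handle further.
    Optional<ChangesTableHandle> applyFilter(ChangesTableHandle handle, long maxVersion)
    {
        if (handle.maxVersion().isPresent() && handle.maxVersion().getAsLong() <= maxVersion) {
            return Optional.empty(); // the handle is already at least this selective
        }
        return Optional.of(new ChangesTableHandle(handle.table(), handle.sinceVersion(), OptionalLong.of(maxVersion)));
    }
}
```

With this shape, no pushdown hook specific to table function handles is needed; the function only has to be expressible as a table handle.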

findepi · 2023-08-10T08:40:00Z

> A table function that's at the leaf of the plan (i.e., takes no inputs) and wants to behave as a table and participate in pushdown can implement the applyTableFunction call and then rely on applyXXX for TableHandle.

We definitely didn't want to squeeze everything into the TableHandle abstraction (and thus have a matrix of pushdown-related ifs in page sources); that's why the initial CDF PTF implementation waited for PTF execution support. But we definitely can go that route.
That makes leaf TFs pretty limited, though. Even table functions as simple as sequence() could eventually want to leverage predicate pushdown (e.g., if I join sequence in my query with something and the engine can derive some predicates from that). Would that mean practically no leaf TF should be implemented using leaf TF execution support?
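
As a rough illustration of the sequence() scenario, the sketch below narrows a generated range once a bound on the output value is known, for example one the engine derived from a join. `SketchSequenceHandle` and `withRange` are hypothetical names used only for this example and do not correspond to the actual sequence table function implementation.

```java
import java.util.stream.LongStream;

// Hypothetical handle for a sequence-like table function producing start..stop inclusive.
record SketchSequenceHandle(long start, long stop)
{
    // "Pushdown": intersect the generated range with a derived [low, high] predicate.
    SketchSequenceHandle withRange(long low, long high)
    {
        return new SketchSequenceHandle(Math.max(start, low), Math.min(stop, high));
    }

    // Produces the values; an empty array when the narrowed range is empty.
    long[] generate()
    {
        return LongStream.rangeClosed(start, stop).toArray();
    }
}
```

Without the narrowing step, a sequence from 1 to 1,000,000 would be generated in full and filtered afterwards; with it, only the values inside the derived range are produced.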

homar · 2023-08-10T19:48:15Z

> We don't want to introduce infrastructure to support pushing down arbitrary operations into table functions. Such an approach will force us into a matrix of pushdown capabilities and require implementations of all the applyXXX for TableHandle and for TableFunctionHandle, and corresponding optimizers.
>
> We already have a mechanism for this. A table function that's at the leaf of the plan (i.e., takes no inputs) and wants to behave as a table and participate in pushdown can implement the applyTableFunction call and then rely on applyXXX for TableHandle.

@martint my approach was suggested by @kasiafi, who implemented most of the table functions infrastructure. It is also described in the follow-up section of Polymorphic table functions here, and no concerns were raised there.

I understand your concern. It is not great to have all those applyXXX functions, but I am afraid table functions won't be useful if they aren't performant.
If you have another idea for how to improve the performance of table functions, please share it.

github-actions · 2024-01-15T17:03:15Z

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

mosabua · 2024-01-15T18:36:44Z

@homar @findepi @martint is this still in progress and under discussion?

homar · 2024-01-16T11:27:41Z

> @homar @findepi @martint is this still in progress and under discussion?

I am afraid that @martint decided that this is not the way to go, so there is nothing I can do here. I am closing it.

cla-bot bot added the cla-signed label Jun 16, 2023

github-actions bot added the delta-lake and hudi labels Jun 16, 2023

homar force-pushed the homar/add_possiblity_to_implement_predicate_pushdown_in_ptf branch 5 times, most recently from b08f7f2 to dc0bbf0 Compare June 19, 2023 08:25

homar changed the title ~~initial~~ Implement predicate pushdown for table functions Jun 19, 2023

homar marked this pull request as ready for review June 26, 2023 09:56

homar requested a review from alexjo2144 June 26, 2023 12:35

homar force-pushed the homar/add_possiblity_to_implement_predicate_pushdown_in_ptf branch from e76dc84 to 50f910d Compare July 3, 2023 10:48

findepi requested review from martint, ebyhr, kasiafi and findinpath July 18, 2023 08:35

homar force-pushed the homar/add_possiblity_to_implement_predicate_pushdown_in_ptf branch 2 times, most recently from ccdf68c to 287854c Compare August 1, 2023 09:40

findepi reviewed Aug 1, 2023

findinpath mentioned this pull request Aug 1, 2023

Optional check for query partition filter for Delta #18345

Merged

marcinsbd mentioned this pull request Aug 2, 2023

Enforce delta.query-partition-filter-required property in Delta Lake table_changes table function #18498

Open

homar added 2 commits August 5, 2023 00:41

Make ConstraintApplicationResult use generic type instead of ColumnHa…

5ded09c

…ndle

Make Constraint use generic type instead of ColumnHandle

4eba5ba

homar force-pushed the homar/add_possiblity_to_implement_predicate_pushdown_in_ptf branch from 287854c to 3dd2d90 Compare August 5, 2023 09:13

github-actions bot added the tests:hive, iceberg, and hive labels Aug 5, 2023

github-actions bot added the bigquery and mongodb labels Aug 5, 2023

homar force-pushed the homar/add_possiblity_to_implement_predicate_pushdown_in_ptf branch from 3dd2d90 to aeb2a98 Compare August 5, 2023 19:29

homar requested a review from findepi August 5, 2023 21:00

Implement predicate pushdown for table functions

8af4978

homar force-pushed the homar/add_possiblity_to_implement_predicate_pushdown_in_ptf branch from aeb2a98 to 8af4978 Compare August 6, 2023 21:32

martint requested changes Aug 10, 2023

findepi mentioned this pull request Aug 10, 2023

Add table function for generating Iceberg CDC records #15677

Merged

github-actions bot added the stale label Jan 15, 2024

homar closed this Jan 16, 2024
