Pushing down reducing projections in a more generic manner #567

phd3 · 2019-03-29T23:06:08Z

Several optimizations focus on achieving this general goal for specific node patterns:

Let’s say there is a planNode A, with a list of outputSymbols L1. A.getSources() returns a single-element list B. Say B’s list of outputSymbols is L2. Optimizers identify that there is “reduction in datatypes” going from L2 to L1. For example, losing unreferenced columns, losing unused fields from nested structures etc.
Then the “reduction” is pushed down as much as possible in order to get rid of tossing around unnecessary data early.

Rather than implementing separate rules in an ad hoc manner for every such optimization, it may be worth considering the possibility of implementing something generic that looks for such reductions and pushes them down.

The reductions that I can think of are:

child’s output symbol not used by the parent (implemented in PruneUnreferencedOutputs)
All implementation of ProjectOffPushDownRule (Removing symbols from the child’s output columns)
Subscript pushdown efforts: Initial refactoring of pushdown subscripts rule prestodb/presto#12534
Efforts in [WIP] Pushdown dereference expression to Parquet reader #187 to push down dereferences
Possible unnest optimizations to extract only required fields and subfields from a deeply nested structure (maps, arrays) while unnesting, subscripting, dereferencing
substr, regexp_extract, slice (of an array), functions that return a boolean (e.g. like, contains), most functions that return a numeric type (e.g. length of a string), etc. (We need to be a bit careful here, since there can be a bigger cputime price to pay to execute cpu intensive udfs)

As described above, many of the current optimizations and possible future optimizations can be thought of as specific implementations of the general idea of “reduction” optimization.

It can be worth considering the idea of implementing a generic mechanism that can be extended to detect such “reductions” and then push them further down. That way the implementation of specific projection pushdowns can be more systematic. @martint Do you have any thoughts on this?

martint · 2019-04-09T06:12:31Z

I think the biggest challenge is how to pull this off without ending up with a monolithic optimization that needs to understand every possible node type. In the current design of the optimizer (Rules + PlanNodes), the decoupling of the hierarchy of nodes from how optimizations are structured makes it possible to (in the future) support an extensible IR and optimization rules supplied by connectors.

Also, as we continue evolving the optimizer to support a Memo that 1) can represent multiple simultaneous plans and 2) can reason about cost intrinsically, there is a benefit in optimization rules being more granular. The optimizer can selectively apply some but not others depending on whether the transformations are productive.

That's not to say we can't come up with "utility" functions/classes/libraries that make it easier to implement the optimizations you described above.

phd3 · 2019-04-15T19:17:53Z

I see. I understand the point of keeping things granular for selective application of transformation. In that case, I believe we could somehow keep track of reducing transformations (using utility functions) while creating the plan tree, and then implement a common Rule<? extends PlanNode> with different captures.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pushing down reducing projections in a more generic manner #567

Pushing down reducing projections in a more generic manner #567

phd3 commented Mar 29, 2019

martint commented Apr 9, 2019

phd3 commented Apr 15, 2019

Pushing down reducing projections in a more generic manner #567

Pushing down reducing projections in a more generic manner #567

Comments

phd3 commented Mar 29, 2019

martint commented Apr 9, 2019

phd3 commented Apr 15, 2019