Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushing down reducing projections in a more generic manner #567

Open
phd3 opened this issue Mar 29, 2019 · 2 comments
Open

Pushing down reducing projections in a more generic manner #567

phd3 opened this issue Mar 29, 2019 · 2 comments

Comments

@phd3
Copy link
Member

phd3 commented Mar 29, 2019

Several optimizations focus on achieving this general goal for specific node patterns:

  • Let’s say there is a planNode A, with a list of outputSymbols L1. A.getSources() returns a single-element list B. Say B’s list of outputSymbols is L2. Optimizers identify that there is “reduction in datatypes” going from L2 to L1. For example, losing unreferenced columns, losing unused fields from nested structures etc.
  • Then the “reduction” is pushed down as much as possible in order to get rid of tossing around unnecessary data early.

Rather than implementing separate rules in an ad hoc manner for every such optimization, it may be worth considering the possibility of implementing something generic that looks for such reductions and pushes them down.

The reductions that I can think of are:

  • child’s output symbol not used by the parent (implemented in PruneUnreferencedOutputs)
  • All implementation of ProjectOffPushDownRule (Removing symbols from the child’s output columns)
  • Subscript pushdown efforts: Initial refactoring of pushdown subscripts rule prestodb/presto#12534
  • Efforts in [WIP] Pushdown dereference expression to Parquet reader #187 to push down dereferences
  • Possible unnest optimizations to extract only required fields and subfields from a deeply nested structure (maps, arrays) while unnesting, subscripting, dereferencing
  • substr, regexp_extract, slice (of an array), functions that return a boolean (e.g. like, contains), most functions that return a numeric type (e.g. length of a string), etc. (We need to be a bit careful here, since there can be a bigger cputime price to pay to execute cpu intensive udfs)

As described above, many of the current optimizations and possible future optimizations can be thought of as specific implementations of the general idea of “reduction” optimization.

It can be worth considering the idea of implementing a generic mechanism that can be extended to detect such “reductions” and then push them further down. That way the implementation of specific projection pushdowns can be more systematic. @martint Do you have any thoughts on this?

@martint
Copy link
Member

martint commented Apr 9, 2019

I think the biggest challenge is how to pull this off without ending up with a monolithic optimization that needs to understand every possible node type. In the current design of the optimizer (Rules + PlanNodes), the decoupling of the hierarchy of nodes from how optimizations are structured makes it possible to (in the future) support an extensible IR and optimization rules supplied by connectors.

Also, as we continue evolving the optimizer to support a Memo that 1) can represent multiple simultaneous plans and 2) can reason about cost intrinsically, there is a benefit in optimization rules being more granular. The optimizer can selectively apply some but not others depending on whether the transformations are productive.

That's not to say we can't come up with "utility" functions/classes/libraries that make it easier to implement the optimizations you described above.

@phd3
Copy link
Member Author

phd3 commented Apr 15, 2019

I see. I understand the point of keeping things granular for selective application of transformation. In that case, I believe we could somehow keep track of reducing transformations (using utility functions) while creating the plan tree, and then implement a common Rule<? extends PlanNode> with different captures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants