Let’s say there is a PlanNode A with a list of outputSymbols L1, and that A.getSources() returns a single-element list containing a node B, whose list of outputSymbols is L2. Optimizers identify that there is a “reduction” in datatypes going from L2 to L1: for example, losing unreferenced columns, losing unused fields from nested structures, etc.
The “reduction” is then pushed down as far as possible, so that unnecessary data is dropped early instead of being tossed around the plan. Several existing optimizations achieve this general goal for specific node patterns.
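To make the shape of that check concrete, here is a minimal sketch; the helper name and the use of plain strings for symbols are hypothetical simplifications for illustration, not the engine’s actual types:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ReductionDetector
{
    // Symbols produced by the child (L2) that the parent does not reference;
    // a non-empty result signals a prunable "reduction".
    static Set<String> unusedChildOutputs(List<String> symbolsReferencedByParent, List<String> childOutputSymbols)
    {
        Set<String> unused = new HashSet<>(childOutputSymbols);
        unused.removeAll(symbolsReferencedByParent);
        return unused;
    }
}
```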
Rather than implementing a separate rule in an ad hoc manner for every such optimization, it may be worth building something generic that looks for such reductions and pushes them down.
The reductions that I can think of are:
A child’s output symbols that are not used by the parent (implemented in PruneUnreferencedOutputs)
All implementations of ProjectOffPushDownRule, which remove symbols from a child’s output columns
Possible unnest optimizations that extract only the required fields and subfields from deeply nested structures (maps, arrays) while unnesting, subscripting, or dereferencing
Width-reducing functions: substr, regexp_extract, slice (of an array), functions that return a boolean (e.g. like, contains), most functions that return a numeric type (e.g. the length of a string), etc. We need to be a bit careful here, since evaluating CPU-intensive UDFs early can cost more CPU time than the data movement it saves; a toy sketch follows this list.
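As a toy illustration of that last category (plain Java collections standing in for plan operators; all names hypothetical), pushing a width-reducing function such as substr below a data-movement step must preserve results while shrinking what is transferred:

```java
import java.util.List;
import java.util.stream.Collectors;

final class SubstrPushdownSketch
{
    public static void main(String[] args)
    {
        List<String> rows = List.of("alpha", "bravo", "charlie");

        // Before pushdown: the full strings flow through the (simulated)
        // exchange, and the reduction happens at the top of the plan.
        List<String> late = exchange(rows).stream()
                .map(s -> s.substring(0, 2))
                .collect(Collectors.toList());

        // After pushdown: the reduction runs below the exchange, so only the
        // two-character prefixes are transferred.
        List<String> early = exchange(rows.stream()
                .map(s -> s.substring(0, 2))
                .collect(Collectors.toList()));

        // The transformation is only valid if it preserves results.
        assert late.equals(early);
    }

    // Stand-in for a data-movement operator whose input width we want to shrink.
    private static List<String> exchange(List<String> rows)
    {
        return List.copyOf(rows);
    }
}
```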
As described above, many current optimizations, and possible future ones, can be thought of as specific implementations of the general idea of “reduction” optimization.
It may be worth implementing a generic mechanism that can be extended to detect such “reductions” and push them further down; that would make the implementation of specific projection pushdowns more systematic. @martint Do you have any thoughts on this?
I think the biggest challenge is how to pull this off without ending up with a monolithic optimization that needs to understand every possible node type. In the current design of the optimizer (Rules + PlanNodes), the decoupling of the hierarchy of nodes from how optimizations are structured makes it possible to (in the future) support an extensible IR and optimization rules supplied by connectors.
Also, as we continue evolving the optimizer to support a Memo that 1) can represent multiple simultaneous plans and 2) can reason about cost intrinsically, there is a benefit in optimization rules being more granular. The optimizer can selectively apply some but not others depending on whether the transformations are productive.
That's not to say we can't come up with "utility" functions/classes/libraries that make it easier to implement the optimizations you described above.
I see. I understand the point of keeping things granular for selective application of transformations. In that case, I believe we could somehow keep track of reducing transformations (using utility functions) while creating the plan tree, and then implement a common Rule<? extends PlanNode> with different captures.
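For concreteness, here is a rough sketch of the shape that could take, modeled loosely on the iterative optimizer’s Rule interface (the real interface also takes Captures and a Context and returns a Result; every type below is a hypothetical stand-in):

```java
import java.util.Optional;
import java.util.Set;

// Minimal stand-in for a plan node; not the engine's actual PlanNode.
interface PlanNodeSketch
{
    Set<String> outputSymbols();
    PlanNodeSketch source();
    PlanNodeSketch withPrunedOutputs(Set<String> toPrune);
    PlanNodeSketch replaceSource(PlanNodeSketch newSource);
}

// Hypothetical callback that recognizes one kind of "reduction" (unused
// symbols, unused nested fields, width-reducing functions, ...).
interface ReductionDetectorRule
{
    // Returns the child's outputs that the parent does not need, if any.
    Optional<Set<String>> detectReduction(PlanNodeSketch parent, PlanNodeSketch child);
}

final class GenericReductionPushdown
{
    private final ReductionDetectorRule detector;

    GenericReductionPushdown(ReductionDetectorRule detector)
    {
        this.detector = detector;
    }

    // If a reduction is detected between a node and its child, prune the child
    // and rewire the parent; an iterative optimizer would re-fire the rule on
    // the result, pushing the reduction further down one node at a time.
    Optional<PlanNodeSketch> apply(PlanNodeSketch node)
    {
        return detector.detectReduction(node, node.source())
                .map(unused -> node.replaceSource(node.source().withPrunedOutputs(unused)));
    }
}
```

Keeping one detector per kind of reduction would preserve the granularity mentioned above, since each detector remains an independently applicable rule.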