You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The high level idea is that if an expression can be partially evaluated during planning time then
The execution time will be increased
There may be additional optimizations possible (like removing entire LogicalPlan nodes, for example)
I recently saw the following selection expression created (by the [predicate push down|https://github.com/apache/arrow/pull/7880])
{code}
Selection: #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1) And #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1)
TableScan: test projection=None
{code}
This could be simplified significantly:
Duplicate clauses could be removed (e.g. #a Eq Int64(1) And #a Eq Int64(1) --> #a Eq Int64(1))
Algebraic simplification (e.g. if A<=B and A=5, is the same as A=5)
Comment from Jorge Leitão(jorgecarleitao) @ 2020-08-23T21:51:08.856+0000:
That is a cool idea. It seems to be a case of algebraic simplification. Maybe there is already a rust library that can do that work for us?
Comment from Remi Dettai(rdettai) @ 2020-12-08T14:11:11.412+0000:
This could also be used to apply filter pushdown into the catalog, if the combination (and) of:
- the file/partition statistics ({{col( x ) > min and col( x ) < max}})
- the filter expression
can be simplified to {{false}}, their is no need to read that file/partition. Isn't that wonderful ? :)
Note: there is an implementation of this folding for the C++ dataset API: [https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/filter.h]
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-9770
The high level idea is that if an expression can be partially evaluated during planning time then
The execution time will be increased
There may be additional optimizations possible (like removing entire LogicalPlan nodes, for example)
I recently saw the following selection expression created (by the [predicate push down|https://github.com/apache/arrow/pull/7880])
{code}
Selection: #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1) And #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1)
TableScan: test projection=None
{code}
This could be simplified significantly:
#a Eq Int64(1) And #a Eq Int64(1)
-->#a Eq Int64(1)
)A<=B and A=5
, is the same asA=5
)Inspiration can be taken from the postgres code that evaluates constant expressions https://doxygen.postgresql.org/clauses_8c.html#ac91c4055a7eb3aa6f1bc104479464b28
(in this case, for example if you have a predicate A=5 then you can basically substitute in A=5 for any expression higher up in the the plan).
Other classic optimizations include things such as
A OR TRUE
-->A
,A AND TRUE
--> TRUE, etc.The text was updated successfully, but these errors were encountered: