#1102 adds dynamic partition pruning functionality. While working on this, I noticed several possible enhancements to this optimization rule that fall outside the project's original scope. I think DPP could benefit from expanding to include these cases in the future.
Currently, we only check a join's `ON` conditions, but we should also try to check and make use of join filters.
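As a rough illustration of the distinction (this is a toy model, not the actual dask-sql/DataFusion expression API), a join predicate can be split into equi-join `ON` conditions and residual join filters, both of which could feed DPP:

```python
# Toy sketch: each conjunct is an (op, left, right) tuple. Conjuncts that
# equate a column from each side are equi-join "on" conditions; everything
# else is a join filter that DPP currently ignores.

def split_join_predicate(conjuncts):
    on_conditions, join_filters = [], []
    for op, left, right in conjuncts:
        if op == "=" and left.startswith("left.") and right.startswith("right."):
            on_conditions.append((op, left, right))
        else:
            join_filters.append((op, left, right))
    return on_conditions, join_filters

on, filt = split_join_predicate([
    ("=", "left.id", "right.id"),
    (">", "right.ts", "'2023-01-01'"),
])
print(on)    # [('=', 'left.id', 'right.id')]
print(filt)  # [('>', 'right.ts', "'2023-01-01'")]
```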
Right now, we only use DPP for equi-joins between two columns. However, it would also be possible to run DPP for joins between binary expressions, e.g. `WHERE col1 + 10 = col2 + 20`.
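To sketch why this is feasible (a hypothetical helper, limited to additive offsets): for `col1 + 10 = col2 + 20`, the admissible `col1` values on the fact side can be derived from the dimension side's `col2` values by solving for `col1`:

```python
# Hypothetical sketch: invert a predicate of the form
#   fact_col + fact_offset = dim_col + dim_offset
# so that fact_col = dim_col + dim_offset - fact_offset, and build the
# pruning set from the dimension side's values.

def pruned_fact_values(dim_values, dim_offset, fact_offset):
    return {v + dim_offset - fact_offset for v in dim_values}

# For col1 + 10 = col2 + 20 with col2 in {1, 2, 3}:
print(pruned_fact_values({1, 2, 3}, dim_offset=20, fact_offset=10))
# {11, 12, 13}
```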
In a similar vein, we should expand the `get_filtered_fields` function to handle more complex binary expressions than it currently does.
Allow the `fact_dimension_ratio` and possibly other parameters to be specified by the user.
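A minimal sketch of what such a knob could look like (the parameter name follows the issue; the default value and the exact semantics of the threshold are assumptions here):

```python
# Sketch: only apply DPP when the dimension table is small relative to the
# fact table. Both the default and the comparison direction are placeholders,
# not the real dask-sql values.

DEFAULT_FACT_DIMENSION_RATIO = 0.3  # assumed default

def should_apply_dpp(fact_rows, dim_rows, fact_dimension_ratio=None):
    if fact_dimension_ratio is None:
        fact_dimension_ratio = DEFAULT_FACT_DIMENSION_RATIO
    # Prune only when dim_rows / fact_rows falls below the threshold,
    # i.e. the dimension side is much smaller than the fact side.
    return fact_rows > 0 and (dim_rows / fact_rows) < fact_dimension_ratio

print(should_apply_dpp(1_000_000, 100))  # True
print(should_apply_dpp(100, 100))        # False
```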
Be careful if there is more than one scan of the same table.
Modify the `c.explain()` function to truncate large strings of INLIST vectors.
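A post-processing sketch of the idea (the plan text format here is illustrative, not dask-sql's exact explain output):

```python
import re

# Sketch: shorten long IN-list literals in an explain string so that plans
# with large pruning sets stay readable.

def truncate_inlists(plan, max_items=3):
    def shorten(match):
        items = [s.strip() for s in match.group(1).split(",")]
        if len(items) <= max_items:
            return match.group(0)
        kept = ", ".join(items[:max_items])
        return f"IN ({kept}, ... {len(items) - max_items} more)"
    return re.sub(r"IN \(([^)]*)\)", shorten, plan)

print(truncate_inlists("Filter: id IN (1, 2, 3, 4, 5, 6)"))
# Filter: id IN (1, 2, 3, ... 3 more)
```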
Currently, we can only use DPP with local Parquet files, and we assume a Parquet table is laid out as `table_name/*.parquet`. Ideally, we should have logic handling remote files (i.e., adding checks to not apply DPP to remote files), nested folders of Parquet files (like Hive partitioning), and other format types like CSV, etc.
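A conservative eligibility check could look like this (a sketch; the scheme list and the decision to skip all remote paths are assumptions, matching the "don't apply DPP to remote files" idea above):

```python
from urllib.parse import urlparse

# Sketch: before applying DPP, skip any table path whose URL scheme marks it
# as remote; only local paths (assumed to follow table_name/*.parquet) pass.

REMOTE_SCHEMES = {"s3", "gs", "abfs", "hdfs", "http", "https"}

def dpp_eligible(table_path):
    scheme = urlparse(table_path).scheme
    if scheme in REMOTE_SCHEMES:
        return False  # don't apply DPP to remote files (for now)
    return True

print(dpp_eligible("s3://bucket/table_name/"))  # False
print(dpp_eligible("data/table_name/"))         # True
```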
In the `satisfies_int64` function, if we match a `Utf8`, we should add logic to check if the string can be converted to a timestamp.
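The extra check could be sketched like this (the accepted timestamp formats are an assumption; the real rule lives in Rust, this is just the shape of the logic):

```python
from datetime import datetime

# Sketch: when a Utf8 value is matched, test whether the string parses as a
# timestamp under a fixed set of assumed formats.

def utf8_is_timestamp(value):
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

print(utf8_is_timestamp("2023-06-01 12:00:00"))  # True
print(utf8_is_timestamp("not a date"))           # False
```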
In addition, we should add some DPP tests, including:

- Rust functionality tests
- DPP functionality PyTests
- DPP config PyTests