[ENH] Dynamic partition pruning improvements #1121

Open · 4 of 11 tasks
sarahyurick opened this issue Apr 21, 2023 · 0 comments
Labels: enhancement (New feature or request), needs triage (Awaiting triage by a dask-sql maintainer)

sarahyurick (Collaborator) commented Apr 21, 2023
#1102 adds dynamic partition pruning functionality. While working on it, I noticed several features, outside the original intended scope of the project, that could enhance this optimization rule. I think DPP could benefit from expanding to cover these cases in the future.
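For context, here is a minimal, self-contained sketch of the idea behind DPP in plain pandas (not the dask-sql implementation); the partition layout and column names are made up for illustration:

```python
# Illustrative sketch of dynamic partition pruning: collect the join-key values
# that survive the dimension-table filter, then read only the fact-table
# partitions that can contain those values.
import pandas as pd

dim = pd.DataFrame({"dim_id": [1, 2, 3], "region": ["US", "EU", "US"]})

# Pretend each Parquet file of the fact table is one partition keyed by dim_id.
fact_partitions = {
    1: pd.DataFrame({"dim_id": [1] * 4, "amount": [10, 20, 30, 40]}),
    2: pd.DataFrame({"dim_id": [2] * 4, "amount": [1, 2, 3, 4]}),
    3: pd.DataFrame({"dim_id": [3] * 4, "amount": [7, 8, 9, 10]}),
}

# Step 1: evaluate the selective filter on the (small) dimension table.
keys = set(dim.loc[dim["region"] == "US", "dim_id"])  # {1, 3}

# Step 2: prune -- only scan fact partitions whose key is in the surviving set,
# instead of reading every partition and filtering after the join.
pruned = [df for key, df in fact_partitions.items() if key in keys]
result = pd.concat(pruned).merge(dim[dim["region"] == "US"], on="dim_id")
print(result)
```

The win comes from step 2: partitions that cannot match the filtered dimension keys are never read at all.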

  • Currently, we only check a join's ON conditions, but we should also check and make use of join filters
  • Right now, we only use DPP for joins between 2 columns. However, it would also be possible to run DPP for joins between binary expressions, e.g. WHERE col1 + 10 = col2 + 20
  • In a similar vein, we should expand the get_filtered_fields function to be able to handle more complex binary expressions than it currently does
  • Allow the fact_dimension_ratio and possibly other parameters to be specified by the user
  • Handle the case where there is more than one scan of the same table
  • Modify the c.explain() function to cut off large strings of INLIST vectors
  • Currently, we can only use DPP with local Parquet files, and we assume a Parquet table is formatted as table_name/*.parquet. Ideally, we should add logic to handle remote files (i.e., checks to not apply DPP to remote files), folders of subfolders containing Parquet files (as in Hive partitioning), and other formats such as CSV (see the sketch after this list)
  • In the satisfies_int64 function, if we match a Utf8, we should add logic to check if the string can be converted to a timestamp.
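As a starting point for the remote-file and Hive-partitioning item, here is a rough sketch of how the path handling could look. It is only an illustration under the assumption that the current rule handles layouts like table_name/*.parquet; the helper names (`is_remote_path`, `list_parquet_files`) are hypothetical, not existing dask-sql functions:

```python
# Sketch: decide whether a table path is safe for DPP to read at optimization
# time, and discover Parquet files nested in Hive-style partition folders.
import glob
import os

from fsspec.utils import infer_storage_options


def is_remote_path(path: str) -> bool:
    """Return True for paths such as s3://... or gs://... where DPP should be skipped."""
    protocol = infer_storage_options(path).get("protocol", "file")
    return protocol not in ("file", "local")


def list_parquet_files(table_path: str) -> list:
    """Collect Parquet files, including Hive-style partition subdirectories."""
    if is_remote_path(table_path):
        # Reading remote data eagerly during optimization is expensive; bail out
        # and let the plan run without DPP.
        return []
    # "**" also matches zero directories, so this picks up both table_name/*.parquet
    # and partition folders like table_name/col=value/part.parquet.
    return sorted(glob.glob(os.path.join(table_path, "**", "*.parquet"), recursive=True))


if __name__ == "__main__":
    print(list_parquet_files("table_name"))        # local layout: files are listed
    print(list_parquet_files("s3://bucket/table")) # remote: empty list, DPP skipped
```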

In addition, we should add some DPP tests, including:

  • Rust functionality tests
  • DPP functionality PyTests (see the sketch after this list)
  • DPP config PyTests
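For the functionality PyTests, something along these lines could serve as a starting point. The plan assertion is deliberately loose, since the exact way the optimizer renders an IN-list filter in c.explain() output is not pinned down here:

```python
# Sketch of a DPP functionality test: a selective filter on a small dimension
# table should restrict the fact-table scan, and the query result should match.
import pandas as pd
import pytest

from dask_sql import Context


@pytest.fixture
def context():
    c = Context()
    fact = pd.DataFrame({"id": range(1000), "dim_id": [i % 10 for i in range(1000)]})
    dim = pd.DataFrame({"dim_id": [3], "name": ["only-this-one"]})
    c.create_table("fact", fact)
    c.create_table("dim", dim)
    return c


def test_dpp_filters_fact_scan(context):
    query = """
        SELECT f.id
        FROM fact f
        JOIN dim d ON f.dim_id = d.dim_id
        WHERE d.name = 'only-this-one'
    """
    # With DPP enabled, the fact-table scan in the plan should carry a filter on
    # dim_id; the exact marker text depends on how the plan is rendered.
    plan = context.explain(query)
    assert "dim_id" in plan

    result = context.sql(query).compute()
    assert set(result["id"]) == {i for i in range(1000) if i % 10 == 3}
```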