-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for filter pushdown rule #924
Add support for filter pushdown rule #924
Conversation
…ers from TableScan struct
…filter-pushdown-rule
Codecov Report
@@ Coverage Diff @@
## main #924 +/- ##
==========================================
+ Coverage 75.81% 76.04% +0.22%
==========================================
Files 73 73
Lines 4065 4082 +17
Branches 737 739 +2
==========================================
+ Hits 3082 3104 +22
+ Misses 823 814 -9
- Partials 160 164 +4
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
…k-sql into support-filter-pushdown-rule
…filter-pushdown-rule
[ | ||
[("a", "==", 1), ("b", "<", 10)], | ||
[("a", "==", 1), ("b", ">", 5)], | ||
[("b", ">", 5), ("b", "<", 10)], | ||
[("a", "==", 1)], | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest version of datafusion applies some conversions that convert filters to a cnf like format. So the dnf here
(b > 5 AND b < 10) OR a = 1
gets remapped to
(b>5 or a=1) and (b<10 or a=1)
by datafusion which becomes this is dnf
(b>5 and a=1) or (b<10 and a=1) or (b>5 and b<10) or (a=1)
.
I don't know if there's an easy way to simplify these predicate expressions but some cases might lead to larger number of redundant filters because of this.
@@ -11,7 +11,6 @@ | |||
14, | |||
16, | |||
18, | |||
21, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into both these queries, for both it seems like they previously failed with metadata inference errors trying to do a comparison between a datetime64 and string object.
I'll open up another issue to investigate but in the meanwhile pushing those filters down to the io seems to fix the queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digging into this before it seemed like the underlying issue was that we were at some point casting a column to Date32
and trying to compare to a Date32
scalar, but their dtypes didn't match up - not sure if that's helpful to make a simple reproducer for this
rerun tests |
This PR enables using the
filter_pushdown_rule
from datafusion and allows passing filters down to the tablescan operation.