Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add row-wise filtering step to
read_parquet
#13334Add row-wise filtering step to
read_parquet
#13334Changes from 5 commits
6488c5e
6932869
4b6c52c
dcddab0
a569982
3e8bb8f
f833445
b76bc93
81fc31f
5c33b86
2f9ea8f
e4ff979
d8f0d15
adc4358
14af140
9d747c7
2193bac
fc8b5dd
fef329e
1e8e811
ac21f1b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I propose a rewrite like this, which I think separates the handler logic from the conjunction/disjunction a little more clearly? WDYT?
(Probably needs some minor modifications for py 3.9 compat (with the type-annotations)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reduction over the disjunctions could be merged in (so no looping) but I think it's a little less readable (would be something like):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I revised the code to look more like your example (there were some minor bugs, and I could't quite get the type annotations right yet - so left them out). Wouldn't mind going with the loop-free code, but didn't get a chance to try it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not fully sure this is correct handling (my suggestion might also have been wrong).
AIUI, these are possible inputs as
filters
:(A, op, B)
=>(A op B)
[(A, op, B)]
=>(A op B)
[(A, op, B), (C, op, D)]
=>(A op B) v (C op D)
[[(A, op, B), (C, op, D)], (E, op, F)]
=>((A op B) ^ (C op D)) v (E op F)
[[(A, op, B), (C, op, D)], [(E, op, F), (G, op H)]]
=>((A op B) ^ (C op D)) v ((E op F) ^ (G op H))
So the input type is
tuple | list[tuple | list[tuple]]
But this code only handles
tuple | list[list[tuple]]
.TBF, my code only handled
list[tuple | list[tuple]]
.To rephrase, who should do the sanitisation of the
filters
argument to this function? It would be much easier if, by the time we got here, we always just hadlist[list[tuple]]
. That sanitisation could either be pushed up toread_parquet
or else here but a little bit earlier, so we would say something like: