ARROW-17483: [Python] Support Expression filters in non-legacy ParquetDataset/read_table #14011
Looks good!
Since this PR adds support for expressions, I guess expressions with nested fields also work? Could we add that to the tests, something like:
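```python
import numpy as np
import pandas as pd

integer_keys = [0, 1, 2, 3, 4]
df = pd.DataFrame({
    'index': np.arange(len(integer_keys)),
    'integers': np.array(integer_keys, dtype='i4'),
    # struct-like column: each row is a dict with an int field 'a' and a str field 'b'
    'nested': np.array([{'a': j % 3, 'b': str(j % 3)} for j in range(5)])
}, columns=['index', 'integers', 'nested'])
```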
and add `pc.field("nested", "b") == "1"` (`b` holds strings) to the fixture?
I haven't tested it, but I'm curious whether this works.
Thanks @AlenkaF!
This PR tries to redo the work from #9799. It will unblock:
- https://issues.apache.org/jira/browse/ARROW-13798
- https://issues.apache.org/jira/browse/ARROW-14596

cc @jorisvandenbossche @pitrou

Closes #12863 from AlenkaF/ARROW-11259

Lead-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
@jorisvandenbossche, as you're the one who pointed me to this issue, would you like to take a gander?
Thanks! Some minor comments.
Thanks!
Benchmark runs are scheduled for baseline = ef8cb09 and contender = 8ceb3b8. 8ceb3b8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have a high level of regressions.
…tDataset/read_table (apache#14011) Authored-by: Miles Granger <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>