ARROW-17463: [R] Avoid unnecessary projections #13954
The definition of `needs_projection()` seems sound to me, and the coverage of the cases you noticed that added an extra projection seems excellent. Our existing coverage for datasets is extensive, and I'm not worried this will break anything that isn't covered by CI somehow!
CI failures are unrelated (one fixed by #13952, the other is the rtools35 build). Merging.
Benchmark runs are scheduled for baseline = 1b9c57e and contender = 80bba29. 80bba29 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have high level of regressions.
Before:
```
> mtcars |> arrow_table() |> count(cyl) |> explain()
ExecPlan with 6 nodes:
5:SinkNode{}
4:ProjectNode{projection=[cyl, n]}
3:ProjectNode{projection=[cyl, n]}
2:GroupByNode{keys=["cyl"], aggregates=[
    hash_sum(n, {skip_nulls=true, min_count=1}),
]}
1:ProjectNode{projection=["n": 1, cyl]}
0:TableSourceNode{}
```
After:
```
ExecPlan with 5 nodes:
4:SinkNode{}
3:ProjectNode{projection=[cyl, n]}
2:GroupByNode{keys=["cyl"], aggregates=[
    hash_sum(n, {skip_nulls=true, min_count=1}),
]}
1:ProjectNode{projection=["n": 1, cyl]}
0:TableSourceNode{}
```
Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Neal Richardson <[email protected]>
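The plans above suggest what makes a projection redundant: it passes through exactly the columns its input already emits, in the same order, without computing or renaming anything. The following is a minimal base R sketch of that idea only. It is not the arrow package's `needs_projection()`; the helper name `projection_is_needed()` and the representation of a projection as a named character vector are assumptions made for illustration.

```r
# Hypothetical helper, NOT the arrow package's needs_projection().
# `projection` is a named character vector: names are the output column
# names, values are the expressions producing them (a bare column name
# means "pass this input column through unchanged").
# `input_names` are the column names emitted by the upstream node.
projection_is_needed <- function(projection, input_names) {
  out_names <- names(projection)
  exprs <- unname(projection)
  # Needed if any column is computed or renamed...
  computes_or_renames <- !identical(exprs, out_names)
  # ...or if columns are dropped, added, or reordered relative to the input.
  changes_columns <- !identical(out_names, input_names)
  computes_or_renames || changes_columns
}

# Like the extra ProjectNode in "Before": the GroupByNode already emits cyl, n.
projection_is_needed(c(cyl = "cyl", n = "n"), c("cyl", "n"))
# FALSE

# Like node 1: it adds a computed column n = 1 and drops other columns.
projection_is_needed(c(n = "1", cyl = "cyl"), names(mtcars))
# TRUE
```

Under these assumptions, only the second projection would need its own node, which matches the drop from 6 nodes to 5 in the plans above.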