Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split batch by key for window operations #2458

Merged
merged 19 commits into from
Jun 3, 2021

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented May 20, 2021

This fixes #1856

It makes it so that window operations do not need a RequireSingleBatch for anything except when there is no partitioning. This allows us to do window operations where there are a lot of keys, but more data than fits on the GPU. I have not updated the tests yet. I'll post that shortly, but I have done manual testing on very large data sets and it has essentially no performance impact on queries that do fit in GPU memory.

@revans2 revans2 added the feature request New feature or request label May 20, 2021
@revans2 revans2 added this to the May 10 - May 21 milestone May 20, 2021
@revans2 revans2 self-assigned this May 20, 2021
@revans2 revans2 marked this pull request as draft May 21, 2021 12:16
@revans2
Copy link
Collaborator Author

revans2 commented May 21, 2021

Switched to draft. I found a bug in cudf while adding the tests and I want to fix it before adding the tests. I also have a new idea on how to get the framework to insert the chunking only when needed and I want to work on that too.

@revans2
Copy link
Collaborator Author

revans2 commented May 21, 2021

I added tests and fixed some issues, including #2473

But if found a bug in cudf as a part of the tests so this is also blocked by rapidsai/cudf#8314

@revans2
Copy link
Collaborator Author

revans2 commented May 21, 2021

I am going to move the test fixes to another PR, and then move this to the next release, and I don't think the benefit is worth the added risk right now.

@revans2 revans2 removed this from the May 10 - May 21 milestone May 21, 2021
Signed-off-by: Robert (Bobby) Evans <[email protected]>
@revans2
Copy link
Collaborator Author

revans2 commented May 25, 2021

build

@revans2
Copy link
Collaborator Author

revans2 commented May 25, 2021

This is actually waiting for a new branch because it is a bit too risky for 21.06

@revans2 revans2 changed the base branch from branch-21.06 to branch-21.08 June 1, 2021 17:46
@revans2 revans2 marked this pull request as ready for review June 1, 2021 17:47
@revans2
Copy link
Collaborator Author

revans2 commented Jun 1, 2021

build

@revans2
Copy link
Collaborator Author

revans2 commented Jun 2, 2021

build

@revans2
Copy link
Collaborator Author

revans2 commented Jun 3, 2021

build

@revans2
Copy link
Collaborator Author

revans2 commented Jun 3, 2021

@jlowe I have addressed your comments please take another look

jlowe
jlowe previously approved these changes Jun 3, 2021
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a minor nit that's fine to leave, lgtm.

@revans2
Copy link
Collaborator Author

revans2 commented Jun 3, 2021

build

@revans2 revans2 merged commit e6543ef into NVIDIA:branch-21.08 Jun 3, 2021
@revans2 revans2 deleted the split_batch_by_key branch June 3, 2021 21:15
tgravescs pushed a commit that referenced this pull request Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Create a batch chunking iterator and integrate it with GpuWindowExec
2 participants