
Support re-partitioning large data multiple times and each time with a different seed. #7911

Closed
Tracked by #7832
firestarman opened this issue Mar 21, 2023 · 0 comments · Fixed by #7996
Assignees
Labels
reliability Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

@firestarman
Collaborator

firestarman commented Mar 21, 2023

This is a follow-up to #7794 to support the case where the build side batch is really, really big.

Currently we re-partition the data 16 ways by default. If the build side is very large, say 64 GiB, then each build side batch would be about 4 GiB in size. This is larger than our target batch size, so it would likely result in us using too much GPU memory to do the computation.
The idea is that if we do hit a situation where one or more of the build side batches are still too large, we would try to re-partition them again, but with a different seed. By that point we would also have seen all of the data, so we could guess a better number of partitions for the second attempt. (This is fairly minor because the number of partitions is configurable, but I would prefer that we not crash even in bad situations.)
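For illustration, here is a minimal, self-contained Scala sketch of the retry idea. This is not the actual spark-rapids implementation; the object, method, and parameter names (`RepartitionSketch`, `partition`, `targetSize`, `maxDepth`) are all hypothetical, and it partitions plain arrays of keys rather than GPU batches:

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch, not the spark-rapids API: hash keys into partitions,
// and if any partition is still over the target size, re-partition it with
// a different seed and a partition count guessed from the observed size.
object RepartitionSketch {

  // Mix the seed into the hash so a different seed gives a different layout.
  private def bucket(key: Long, seed: Int, numPartitions: Int): Int = {
    val h = MurmurHash3.productHash((key, seed))
    ((h % numPartitions) + numPartitions) % numPartitions
  }

  /** Partition `keys` into `numPartitions` buckets, then recursively
   *  re-partition any bucket that is still larger than `targetSize`,
   *  bumping the seed on each retry. `maxDepth` bounds the retries so a
   *  bucket of identical keys (which always hash together) cannot recurse
   *  forever. */
  def partition(
      keys: Array[Long],
      targetSize: Int,
      numPartitions: Int,
      seed: Int = 0,
      depth: Int = 0,
      maxDepth: Int = 3): Seq[Array[Long]] = {
    val buckets = keys.groupBy(k => bucket(k, seed, numPartitions)).values.toSeq
    buckets.flatMap { b =>
      if (b.length > targetSize && depth < maxDepth) {
        // Having seen this bucket's actual size, guess a better fan-out
        // and retry with a different seed.
        val betterN = math.max(2, math.ceil(b.length.toDouble / targetSize).toInt)
        partition(b, targetSize, betterN, seed + 1, depth + 1, maxDepth)
      } else {
        Seq(b)
      }
    }
  }
}
```

Note that a batch where every row has the same key can never be split by hashing, no matter which seed is used, which is why the sketch caps the recursion depth rather than retrying unconditionally.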

@firestarman self-assigned this Mar 21, 2023
@sameerz added the reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) label Apr 6, 2023