[FEA] Expose size estimation for join #8237
What I'd like to see here is 3 new APIs for the `hash_join` object, one per join type, each returning the exact number of matches for a given probe table.
Then for each of the existing join APIs, add an optional output-size argument so that a precomputed size can be passed in instead of being recomputed.
Like Bobby said, I want to use this opportunity to just get rid of the size estimation logic altogether. We should just always compute an exact value, whether the user is requesting it explicitly via the size APIs or not.
CC @magnatelee, I recall long ago you were asking for the same thing.
Addresses #8237

This PR adds 3 join size APIs (`hash_join::inner_join_size`, `hash_join::left_join_size`, and `hash_join::full_join_size`) to the `hash_join` class, one for each type of join, each returning the exact number of matches with the specified probe table. It completely removes the deprecated size estimation logic from the current implementation.

This PR also updates the existing join APIs by adding an optional `output_size` argument. If `output_size.has_value()`, that value is used directly for further computation. Otherwise, the target join internally invokes its corresponding size function.

`TODO`: the current `full_join_size` uses a 2-step algorithm similar to the one used in `hash_join::full_join`. It duplicates certain computations with `full_join` and thus should be refactored during the `cuco` integration.

Authors:
- Yunsong Wang (https://github.com/PointKernel)

Approvers:
- Jake Hemstad (https://github.com/jrhemstad)
- Robert Maynard (https://github.com/robertmaynard)
- Robert (Bobby) Evans (https://github.com/revans2)

URL: #8453
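For reference, a minimal sketch of how the merged APIs fit together; the API names come from the PR description above, but the exact argument order and defaults here are assumptions, not copied from the released headers:

```cpp
#include <cudf/join.hpp>
#include <cudf/table/table_view.hpp>
#include <cudf/types.hpp>

#include <rmm/device_uvector.hpp>

#include <optional>

// Build the hash table once, ask for the exact output size, then feed that
// size back into the join so it is not computed twice.
auto sized_inner_join(cudf::table_view const& build, cudf::table_view const& probe)
{
  cudf::hash_join joiner(build, cudf::null_equality::EQUAL);

  // Exact number of rows the inner join will produce for this probe table.
  std::size_t const output_size = joiner.inner_join_size(probe);

  // Passing the precomputed size skips the internal size computation.
  return joiner.inner_join(probe,
                           cudf::null_equality::EQUAL,
                           std::optional<std::size_t>{output_size});
}
```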
@PointKernel based on the discussion, it looks like this was fully addressed by #8453. Can this be closed?
Thanks for the reminder. Let me close this.
Is your feature request related to a problem? Please describe.
We have started to work on joins that are too large to fit in GPU memory. The first step was to get back the gather maps from the join instead of materializing the results directly, and then output the results in batches of an appropriate size. For the most part this is working really well: we are able to run all of the NDS queries except one at scale factor 3000. For query 72, however, it is not great. In a typical run we see a join that really explodes. When joining a 600MB table with a 400MB table, the join tries to allocate over 260GB just for the gather map (the row count on average increases about 2,000x).
We run into issues in part because this allocation request is so large that we end up spilling a lot of data to host memory, or even disk. We could avoid that if we knew ahead of time that the join would explode, because then we could split the input batches up.
Part of the problem is that this is an equi-join combined with an inequality join, so the 2,000x increase in size is before filtering; after the filter it is only a 200x explosion. The AST work in #8214 will therefore be a huge win for us, but even then the gather map for this join would be 26GB, which is larger than fits on a typical 16GB GPU.
Describe the solution you'd like
We would like to expose an API in `cudf::hash_join` that can return an accurate size estimate, similar to `cudf::detail::estimate_join_output_size`. Talking to @jrhemstad, he said that we might want to have the output size be exact instead of the estimate it is now. Then also have the join APIs in `cudf::hash_join` optionally take in the output of the size API so they don't have to re-compute it in the common case.

This way we can get the output size of the join ahead of time and, depending on how large it is, either go ahead with the join at no performance penalty, or, if the gather map is larger than our memory budget, split the incoming batches into smaller pieces, using the output size of the join to estimate how big each batch should be. A sketch of that decision flow follows.
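As a sketch only: the memory budget, batch-count math, and helper names below are hypothetical caller-side choices, not part of the proposed API (includes as in the earlier sketch):

```cpp
// Decide between a single join and batched joins using the exact output
// size. `budget_bytes` is a caller-chosen limit; the batching itself is
// left as a stub since it is application-specific.
void join_within_budget(cudf::hash_join const& joiner,
                        cudf::table_view const& probe,
                        std::size_t budget_bytes)
{
  std::size_t const out_rows = joiner.inner_join_size(probe);

  // Two gather maps of 4-byte row indices per output row.
  std::size_t const map_bytes = out_rows * 2 * sizeof(cudf::size_type);

  if (map_bytes <= budget_bytes) {
    // Cheap case: join once, passing the precomputed size through.
    auto maps = joiner.inner_join(probe,
                                  cudf::null_equality::EQUAL,
                                  std::optional<std::size_t>{out_rows});
    // ... gather the joined columns from maps ...
  } else {
    // Explosive case: split the probe table so each piece stays in budget.
    std::size_t const n_batches = (map_bytes + budget_bytes - 1) / budget_bytes;
    // ... slice probe into n_batches pieces and join each slice ...
  }
}
```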
Describe alternatives you've considered
Catch the out-of-memory error when the gather map cannot be allocated and do the size estimation ourselves: do a count for each key, followed by min, max, and average over those counts, to inform how finely to split each incoming batch.
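A rough sketch of that fallback, simplified to a single `int` key column and plain Thrust (a real implementation would hash multi-column rows; all names here are illustrative):

```cpp
#include <thrust/device_vector.h>
#include <thrust/extrema.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

struct key_stats { int min; int max; double mean; };

// Count rows per key on the build side, then summarize the counts. The
// max/mean tell us how much a probe batch can expand, which bounds the
// batch size to retry with after an out-of-memory failure.
key_stats per_key_counts(thrust::device_vector<int> keys)  // by value: we sort a copy
{
  thrust::sort(keys.begin(), keys.end());

  thrust::device_vector<int> unique_keys(keys.size());
  thrust::device_vector<int> counts(keys.size());
  auto const ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                          thrust::constant_iterator<int>(1),
                                          unique_keys.begin(), counts.begin());
  auto const n = ends.second - counts.begin();

  int const mx = *thrust::max_element(counts.begin(), counts.begin() + n);
  int const mn = *thrust::min_element(counts.begin(), counts.begin() + n);
  long long const total = thrust::reduce(counts.begin(), counts.begin() + n, 0LL);
  return {mn, mx, static_cast<double>(total) / static_cast<double>(n)};
}
```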