[FEA] Split stream side join batches into smaller sizes if the join explodes #2408
Labels: task
We have started to work on joins that are too large to fit in GPU memory. The first step was to get the gather maps back from the join instead of materializing the results directly, and then output the results in appropriately sized batches. For the most part this is working really well: we can run all of the NDS queries except one at scale factor 3000. For query 72, however, it is not great. In a typical run we see a join that really explodes: when joining a 600MB table with a 400MB table, the join tries to allocate over 260GB just for the gather map (on average the row count increases about 2,000x).
We run into issues in part because this allocation request is so large that we end up spilling a lot of data to host memory, or even disk. We could avoid that if we knew up front that the join would explode and split the input batches accordingly.
Part of the problem is that this is an equi-join combined with an inequality condition, so the 2,000x increase in size is before filtering; after the filter it is only a 200x explosion. The AST work for joins in cudf will therefore be a huge win for us, but even then the gather map for this join would be 26GB, which is more than fits on a typical 16GB GPU.
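As a rough sanity check on those numbers (my assumption: a gather map holds one 4-byte cudf size_type index per output row, so its size scales linearly with the output row count), the reported sizes are consistent with each other:

```scala
object GatherMapBackOfEnvelope {
  // Assumption: one 4-byte index per output row in a gather map.
  val BytesPerIndex = 4L

  def gatherMapGB(outputRows: Long): Double = outputRows * BytesPerIndex / 1e9

  def main(args: Array[String]): Unit = {
    // A ~260GB map implies roughly 65 billion output rows before the
    // inequality filter; a 10x reduction (200x instead of 2,000x) still
    // leaves ~6.5 billion rows and a ~26GB map.
    println(gatherMapGB(65L * 1000 * 1000 * 1000)) // ~260.0
    println(gatherMapGB(6500L * 1000 * 1000))      // ~26.0
  }
}
```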
The idea is to take rapidsai/cudf#8237 and check the predicted gather map size against a budget. If the size exceeds that budget, we use the result to cut the input batches into smaller pieces that should fit.
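A minimal sketch of what that could look like, using placeholder types and names (nothing here is existing plugin or cudf API; `estimatedOutputRows` stands in for whatever rapidsai/cudf#8237 ends up returning):

```scala
// Placeholder for a stream-side columnar batch; only the row count matters here.
final case class Batch(numRows: Long)

object JoinBatchSplitter {
  // Assumption: one 4-byte cudf size_type index per output row.
  val BytesPerGatherMapRow = 4L

  // If the predicted gather map exceeds the budget, split the stream-side
  // batch into enough pieces that each piece should fit, assuming the
  // explosion is roughly uniform across input rows.
  def planSplits(streamBatch: Batch,
                 estimatedOutputRows: Long,
                 budgetBytes: Long): Seq[Batch] = {
    val mapBytes = estimatedOutputRows * BytesPerGatherMapRow
    if (mapBytes <= budgetBytes || streamBatch.numRows <= 1) {
      Seq(streamBatch)
    } else {
      val numSplits =
        math.ceil(mapBytes.toDouble / budgetBytes).toLong.min(streamBatch.numRows)
      val rowsPerSplit = math.ceil(streamBatch.numRows.toDouble / numSplits).toLong
      (0L until streamBatch.numRows by rowsPerSplit).map { start =>
        Batch(math.min(rowsPerSplit, streamBatch.numRows - start))
      }
    }
  }
}
```

The uniformity assumption will not always hold in practice, so a real implementation would likely need to re-check each slice and split again if one still blows the budget.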
Note that for left-semi and left-anti joins the gather map can never be larger than the number of rows in the left table, and those joins always build on the right side. That means we already know the limit, and it should be reasonable, so this feature does not apply to them.
For full outer joins we need the complete set of keys for both sides, so we would need to do something more complicated: probably starting with a sort merge join (#2354) and then, if a customer still hits an exploding join, slicing the input into smaller chunks that honor key boundaries.
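For illustration only, a sketch of the key-boundary-honoring slicing over already-sorted keys (a hypothetical helper, not tied to any existing API):

```scala
import scala.collection.mutable.ArrayBuffer

object KeyBoundaryChunker {
  // Pick chunk boundaries over sorted join keys so that no key group is
  // ever split across two chunks, even if a chunk overshoots the target.
  def chunkRanges[K](sortedKeys: IndexedSeq[K], targetRows: Int): Seq[Range] = {
    val splits = ArrayBuffer(0)
    var i = targetRows
    while (i < sortedKeys.length) {
      // Slide forward until the key changes so the cut lands on a boundary.
      while (i < sortedKeys.length && sortedKeys(i) == sortedKeys(i - 1)) i += 1
      if (i < sortedKeys.length) splits += i
      i += targetRows
    }
    splits += sortedKeys.length
    splits.sliding(2).map(p => p(0) until p(1)).filter(_.nonEmpty).toSeq
  }
}
```

For example, `chunkRanges(Vector(1, 1, 1, 2, 2, 3), targetRows = 2)` yields the row ranges [0, 3), [3, 5) and [5, 6): the first chunk overshoots the target rather than split the run of 1s.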