[FEA] Support a sort merge join as a fallback on the GPU #2354
Labels: duplicate, feature request, reliability
This work should happen after the basic out-of-core join (#20), which just allows chunking the output of an exploding join, is implemented.
Hash joins are always going to be faster than sort merge joins because there is no need to sort the data. But sort merge joins are really the only option in some cases when we want to do a join where the data is large on both sides. Also, a FullOuter join cannot work unless all of the data for a given key is part of the current join; a sort merge join can handle that, whereas a hash join cannot unless all of the data for both sides is in memory.
The idea is that we start off trying to do a regular hash join. We read in the build side table until it hits a size limit (probably the target batch size). If the build table is too large then we switch over to doing a sort merge join.
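A minimal sketch of that fallback decision is below. It only illustrates the control flow: `ColumnarBatch` is the standard Spark type, but the size function and the two join implementations are passed in as parameters because none of these names are real spark-rapids entry points.

```scala
import org.apache.spark.sql.vectorized.ColumnarBatch
import scala.collection.mutable.ArrayBuffer

// Control-flow sketch only. `sizeOf`, `hashJoin`, and `sortMergeJoin` are
// hypothetical stand-ins for the real GPU implementations.
def joinWithFallback(
    buildIter: Iterator[ColumnarBatch],
    streamIter: Iterator[ColumnarBatch],
    targetBatchSize: Long,
    sizeOf: ColumnarBatch => Long,
    hashJoin: (Seq[ColumnarBatch], Iterator[ColumnarBatch]) => Iterator[ColumnarBatch],
    sortMergeJoin: (Iterator[ColumnarBatch], Iterator[ColumnarBatch]) => Iterator[ColumnarBatch]
): Iterator[ColumnarBatch] = {
  val buffered = ArrayBuffer[ColumnarBatch]()
  var bufferedBytes = 0L
  // Read the build side until it is exhausted or it passes the size limit
  // (probably the target batch size).
  while (bufferedBytes <= targetBatchSize && buildIter.hasNext) {
    val batch = buildIter.next()
    buffered += batch
    bufferedBytes += sizeOf(batch)
  }
  if (bufferedBytes <= targetBatchSize) {
    // The build table fit under the limit: do a regular hash join.
    hashJoin(buffered.toSeq, streamIter)
  } else {
    // The build table is too large: switch to a sort merge join, feeding the
    // batches we already read/cached back in ahead of the rest.
    sortMergeJoin(buffered.iterator ++ buildIter, streamIter)
  }
}
```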
In the case of a FullOuter join we also need to batch the stream side (because there really is no stream side). Similarly, we would read in the stream side table until we have all of the data or we hit a size limit. If we go over the size limit, then we switch over to a sort merge join.
To switch to a sort merge join we will create a sorting iterator for each side (build side and stream side), and insert into it any data that we have already read/cached. After that, on each pass:

1. Get a batch from each side and look at the join keys of the last row in each batch.
2. Do a lower bound on both batches with both sets of keys to see how much of each batch we can keep for the join. Really it should be the lower bound of the lower of the two keys, but instead of trying to compare keys on the CPU we can just let the GPU do the lower bounds and compare the offsets.
3. Split each batch at the lowest offset for either set of keys.
4. Save the data that is not ready (everything above that cutoff) so it can be combined with the next batch we read, and do the join on the data that is ready (everything below the cutoff).
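A rough sketch of one such split step is below. `lastRowKeys`, `lowerBound`, and `splitAt` are hypothetical stand-ins for GPU operations (passed in as parameters rather than invented API calls); the point is only to show where the cutoffs come from and what gets carried over to the next batch.

```scala
import org.apache.spark.sql.vectorized.ColumnarBatch

// Rows in `ready` can be joined now; `leftover` is saved and concatenated
// with the next batch read from that side.
case class SideSplit(ready: ColumnarBatch, leftover: ColumnarBatch)

// One sort-merge step over a sorted batch from each side. All helpers are
// hypothetical: `lastRowKeys` pulls the join keys of the final row of a batch,
// `lowerBound` returns the offset of the first row >= the given keys, and
// `splitAt` slices a batch at a row offset into (below, at-and-above).
def splitForJoin(
    buildBatch: ColumnarBatch,
    streamBatch: ColumnarBatch,
    lastRowKeys: ColumnarBatch => ColumnarBatch,
    lowerBound: (ColumnarBatch, ColumnarBatch) => Int,
    splitAt: (ColumnarBatch, Int) => (ColumnarBatch, ColumnarBatch)
): (SideSplit, SideSplit) = {
  val buildLast = lastRowKeys(buildBatch)
  val streamLast = lastRowKeys(streamBatch)
  // Instead of comparing the two last keys on the CPU, do a lower bound with
  // both key sets on both batches and take the smaller offset on each side.
  // The smaller offset is the cutoff for the lower of the two last keys.
  val buildCut = math.min(lowerBound(buildBatch, buildLast),
                          lowerBound(buildBatch, streamLast))
  val streamCut = math.min(lowerBound(streamBatch, buildLast),
                           lowerBound(streamBatch, streamLast))
  val (buildReady, buildLeftover) = splitAt(buildBatch, buildCut)
  val (streamReady, streamLeftover) = splitAt(streamBatch, streamCut)
  (SideSplit(buildReady, buildLeftover), SideSplit(streamReady, streamLeftover))
}
```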
In some cases we might end up with an empty table on one side or the other. That is OK; we just do the join and go on. If we end up with an empty table on both sides, that means we have a single key with enough rows to take up an entire batch. Just skip the join, continue to read the next batch, and hopefully we then have enough memory/rows to make it work.
If we start to run into a lot of cases where keys do not fit in a single batch, we might be able to add some other optimizations. For all of the joins except FullOuter, if we have the full set of rows for a key on the build side, we can do the join without waiting for all of the rows for that key to show up on the stream side. But then we also have to make sure that we do not throw out that build side data until all of the corresponding keys have been seen on the stream side.
For InnerLike joins we can also do the equivalent of a cross join for a given key: do a join of each batch on the left hand side paired with each batch on the right hand side.
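A purely illustrative sketch of that per-key idea, where `joinPair` is a hypothetical stand-in for a single GPU inner join of two batches on the join keys:

```scala
import org.apache.spark.sql.vectorized.ColumnarBatch

// For one oversized key, join every build-side batch with every stream-side
// batch, like a cross join over the batches of that key.
def joinOversizedKey(
    buildBatches: Seq[ColumnarBatch],
    streamBatches: Seq[ColumnarBatch],
    joinPair: (ColumnarBatch, ColumnarBatch) => ColumnarBatch
): Seq[ColumnarBatch] = {
  for {
    b <- buildBatches
    s <- streamBatches
  } yield joinPair(b, s)
}
```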
But both of these optimizations are complicated and hopefully unnecessary.