Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Support a sort merge join as a fallback on the GPU #2354

Closed
revans2 opened this issue May 6, 2021 · 2 comments
Closed

[FEA] Support a sort merge join as a fallback on the GPU #2354

revans2 opened this issue May 6, 2021 · 2 comments
Labels
duplicate This issue or pull request already exists feature request New feature or request reliability Features to improve reliability or bugs that severly impact the reliability of the plugin

Comments

@revans2
Copy link
Collaborator

revans2 commented May 6, 2021

This work should happen after the basic out of core join is implemented that just allows for chunking of the output of an exploding join #20.

Hash joins are always going to be faster than sort merge joins. Because there is no need to sort the data. But sort merge joins are really the only option in some cases when we want to do a join for data that is large on both sides of the join. Also a FullOuter join cannot work unless all of the data for a given key is a part of the current join, and a sort merge join can handle that, where as a hash join cannot unless all of the data for both sides is in memory.

The idea is that we start off trying to do a regular hash join. We read in the build side table until it hits a size limit (probably the target batch size). If the build table is too large then we switch over to doing a sort merge join.

In the case of a FullOuter join we also need to batch the stream side (because there really is no stream side). It would be similar that we would read in the stream side table until we got all of the data or we hit a size limit. If we go over the size limit, then we switch over to a sort merge join.

To switch to a sort merge join we will create a sorting iterator for each side (build side and stream side). We will also need to insert into it any data that we have already read/cached. After that we get a batch from each side, and look at the last row join keys in each batch. We then do a lower bound on both batches on both sets of keys to see how much we can keep to do the join (really it should be the lowest bound for the lowest keys, but instead of trying to compare keys on the CPU we can just let the GPU do it and compare the offsets). We then split the batches based off of the lowest point for either set of keys, save the data that is not ready (everything above that cutoff) so it can be combined with the next batch we read and do the join on the data that is ready (everything below the cutoff).

In some cases we might end up with an empty table on one side or the other. That is OK. We just do the join and go on. If we end up with an empty table on both sides that means we have a key that is large enough it is taking up an entire batch. Just skip the join and continue to read the next batch, and hopefully we have enough memory/rows to make it work.

If we start to run into a lot of cases where we have keys that do not fit in a single batch, we might be able to add in some other optimizations. For all of the joins except FullOuter if we have a full set of keys on the build side we can do the join without waiting for all of the keys on the stream side to be there. But then we have to also make sure that we do not throw out that data on the build side until all of the corresponding keys have been seen on the stream side.

For InnerLike joins we can also do the equivalent of a cross join for a given key. do a join for each batch on the left hand side pared with each batch on the right hand side.

But both of these optimizations are complicated and hopefully unnecessary.

@revans2
Copy link
Collaborator Author

revans2 commented Apr 8, 2022

We could also re-hash the data instead of sorting it. The question really is up to how much work/effort is it to re-hash the input data vs sorting the input data.

@mattahrens
Copy link
Collaborator

Replaced by #7471

@sameerz sameerz added the duplicate This issue or pull request already exists label Jan 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists feature request New feature or request reliability Features to improve reliability or bugs that severly impact the reliability of the plugin
Projects
None yet
Development

No branches or pull requests

3 participants