
[FEA] Expose size estimation for join #8237

Closed · revans2 opened this issue May 13, 2021 · 4 comments

revans2 (Contributor) commented May 13, 2021

Is your feature request related to a problem? Please describe.
We have started to work on joins that are too large to fit in GPU memory. The first step was to get back the gather maps from the join instead of materializing the results directly, so we can output the results in appropriately sized batches. For the most part this is working really well (we are able to run all of the NDS queries except one at scale factor 3000). For query 72, however, it is not great. In a typical run we see a join that explodes: when joining a 600MB table with a 400MB table, the join tries to allocate over 260GB just for the gather map (on average, the row count increases about 2,000x).

We run into issues in part because this allocation request is so large that we end up spilling a lot of data to host memory, or even disk. We could avoid this if we knew ahead of time that the join would explode, because then we could split the input batches up.

Part of the problem is that this is an equi-join combined with an inequality condition, so the 2,000x increase in size is before filtering; after the filter it is only a 200x explosion. The AST work at #8214 will therefore be a huge win for us, but even then the gather map for this join would be 26GB, which is more than fits on a typical 16GB GPU.

Describe the solution you'd like
We would like an API exposed in cudf::hash_join that can return an accurate size estimate, similar to cudf::detail::estimate_join_output_size. After talking to @jrhemstad, he suggested that we might want the output size to be exact instead of the estimate it is now.

Then, also have the join APIs in cudf::hash_join optionally take the output of the size API so they don't have to re-compute it in the common case.

This way we can get the output size of the join ahead of time. Depending on how large it is, we can either go ahead with the join at no performance penalty, or, if the gather map is larger than our memory budget, split the incoming batches into smaller pieces, using the output size of the join to estimate how big each batch should be.
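For illustration, the batch-splitting arithmetic this implies could look like the rough sketch below. This is not a cudf API: the helper name, the byte budget, and the assumption that output size scales linearly with probe rows are all hypothetical.

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>

  // Given the exact join output row count for a probe table of probe_rows rows,
  // suggest how many probe rows to feed per batch under budget_bytes.
  std::size_t suggest_batch_rows(std::size_t join_output_rows,
                                 std::size_t probe_rows,
                                 std::size_t budget_bytes)
  {
    // A join produces two gather maps of 4-byte row indices per output row.
    std::size_t const total_bytes = join_output_rows * 2 * sizeof(std::int32_t);
    if (total_bytes <= budget_bytes) { return probe_rows; }  // no split needed
    // Hypothetical simplification: output scales linearly with probe row count.
    double const fraction = static_cast<double>(budget_bytes) / static_cast<double>(total_bytes);
    return std::max<std::size_t>(1, static_cast<std::size_t>(probe_rows * fraction));
  }

With the numbers above, a 26GB gather map against a hypothetical 4GB budget would split the probe table into roughly seven batches.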

Describe alternatives you've considered
Catch the out-of-memory error when the join cannot allocate the gather map, and do the size estimation ourselves: compute a count for each key, followed by the min, max, and average over those counts, to inform how finely to split each incoming batch.
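As a rough sketch of that fallback, assuming plain Thrust (independent of cudf internals), an int key type, and a non-empty key column:

  #include <cstddef>
  #include <thrust/device_vector.h>
  #include <thrust/extrema.h>
  #include <thrust/iterator/constant_iterator.h>
  #include <thrust/reduce.h>
  #include <thrust/sort.h>

  struct key_stats { std::size_t min; std::size_t max; double avg; };

  // Per-key occurrence counts, then min/max/average over those counts.
  // Takes the keys by value because the sort below is destructive.
  key_stats per_key_count_stats(thrust::device_vector<int> keys)
  {
    thrust::sort(keys.begin(), keys.end());
    thrust::device_vector<int> unique_keys(keys.size());
    thrust::device_vector<std::size_t> counts(keys.size());
    // Segmented reduction: one occurrence count per distinct key.
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                      thrust::constant_iterator<std::size_t>(1),
                                      unique_keys.begin(), counts.begin());
    auto const n = ends.second - counts.begin();
    auto mm  = thrust::minmax_element(counts.begin(), counts.begin() + n);
    auto sum = thrust::reduce(counts.begin(), counts.begin() + n, std::size_t{0});
    return {static_cast<std::size_t>(*mm.first),
            static_cast<std::size_t>(*mm.second),
            static_cast<double>(sum) / static_cast<double>(n)};
  }

The max count bounds how much a single key can amplify the output, and the average gives a reasonable first guess at how finely to split.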

@revans2 added the feature request, Needs Triage, and Spark labels on May 13, 2021
jrhemstad (Contributor) commented:

What I'd like to see here is 3 new APIs for the hash_join class, one for each type of join, each returning the exact number of matches with the specified probe table, e.g.:

std::size_t inner_join_size(cudf::table_view const& probe,
                            null_equality compare_nulls  = null_equality::EQUAL,
                            rmm::cuda_stream_view stream = rmm::cuda_stream_default) const;

Then for each of the existing join APIs, add a std::optional<std::size_t> parameter to allow the user to specify the output size:

  std::pair<std::unique_ptr<rmm::device_uvector<size_type>>,
            std::unique_ptr<rmm::device_uvector<size_type>>>
  inner_join(cudf::table_view const& probe,
             null_equality compare_nulls             = null_equality::EQUAL,
             std::optional<std::size_t> output_size  = {},
             rmm::cuda_stream_view stream            = rmm::cuda_stream_default,
             rmm::mr::device_memory_resource* mr     = rmm::mr::get_current_device_resource()) const;

If output_size.has_value(), we'll use that value for the output size; otherwise, we'll compute it. It is undefined behavior if output_size is smaller than the actual output size.
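For example, a minimal caller of the proposed two-call pattern might look like the sketch below. Signatures follow the proposal above and the budget check is illustrative; the final API may differ.

  #include <cudf/join.hpp>
  #include <cudf/table/table_view.hpp>
  #include <cstddef>
  #include <optional>

  void join_within_budget(cudf::table_view const& build,
                          cudf::table_view const& probe,
                          std::size_t budget_bytes)
  {
    cudf::hash_join joiner(build, cudf::null_equality::EQUAL);

    // Pass 1: compute the exact output row count once.
    std::size_t const out_rows = joiner.inner_join_size(probe);

    // Two gather maps of size_type indices are produced per output row.
    if (out_rows * 2 * sizeof(cudf::size_type) <= budget_bytes) {
      // Pass 2: hand the size back so the join skips recomputing it.
      auto gather_maps = joiner.inner_join(
        probe, cudf::null_equality::EQUAL, std::optional<std::size_t>{out_rows});
      // ... gather rows from build/probe using the two maps ...
    } else {
      // Split the probe table into smaller batches and join each piece.
    }
  }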

Like Bobby said, I want to use this opportunity to just get rid of the size estimation logic altogether. We should always compute an exact value, whether the user is requesting it explicitly via inner_join_size or we're computing it internally.

jrhemstad (Contributor) commented:

CC @magnatelee I recall long ago you were asking for the same thing.

@jlowe added the libcudf label and removed the Needs Triage label on May 13, 2021
rapids-bot pushed a commit that referenced this issue on Jun 14, 2021:
Addresses #8237

This PR adds 3 join size APIs (`hash_join::inner_join_size`, `hash_join::left_join_size` and `hash_join::full_join_size`) to the `hash_join` class, one for each type of join, each returning the exact number of matches for the specified probe table. It also completely removes the deprecated size-estimation logic from the current implementation.

Also, this PR updates the existing join APIs by adding an optional `output_size` as an argument. If `output_size.has_value()`, we take that value directly for further computation. Otherwise, the target join will internally invoke its corresponding size function.

`TODO`: the current `full_join_size` uses a 2-step algorithm similar to what's used in `hash_join::full_join`. It therefore duplicates certain computations with `full_join` and should be refactored during the `cuco` integration.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Jake Hemstad (https://github.com/jrhemstad)
  - Robert Maynard (https://github.com/robertmaynard)
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #8453
beckernick (Member) commented:

@PointKernel based on the discussion, it looks like this was fully addressed by #8453. Can this be closed?

PointKernel (Member) commented:

Thanks for the reminder. Let me close this.
