
Update cuDF merge benchmark #867

Merged: 14 commits into rapidsai:branch-0.27 on Aug 10, 2022

Conversation

@pentschev (Member) commented Aug 4, 2022

This adds several bugfixes and improvements:

  • Correctly assign CUDA_VISIBLE_DEVICES;
  • Create CUDA contexts before initializing UCX;
  • Ensure merge uses the "key" column and asserts the expected result size;
  • Allow running the benchmark for multiple iterations;
  • Allow running multiple warmup iterations;
  • Print results for each iteration;
  • Allow enabling garbage collection after each iteration (a rough sketch of the resulting iteration loop follows the list).
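
A rough sketch of the iteration structure these flags enable is below; the flag names (`warmup_iter`, `iter`, `collect_garbage`) and the `run_iteration` callable are hypothetical, not the benchmark's actual API.

```python
# Hypothetical sketch of the warmup/iteration loop described above; the actual
# argument names and helpers in cudf-merge.py may differ.
import gc
import time


def run_benchmark(args, run_iteration):
    # Warmup iterations are executed but their timings are discarded.
    for _ in range(args.warmup_iter):
        run_iteration()

    durations = []
    for i in range(args.iter):
        t0 = time.monotonic()
        run_iteration()
        durations.append(time.monotonic() - t0)
        # Print results for each iteration as they complete.
        print(f"Iteration {i}: {durations[-1]:.4f} s")

        if args.collect_garbage:
            # Optionally force a full GC cycle between iterations.
            gc.collect()

    return durations
```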

@pentschev pentschev marked this pull request as ready for review August 5, 2022 08:57
@pentschev pentschev requested a review from a team as a code owner August 5, 2022 08:57
@pentschev (Member Author) commented:

Since this benchmark is broken due to changes in CUDA context handling, it would be good if this still made 0.27 (22.08), despite the ongoing burndown. As this is only a benchmark, it should not impact the release in any way; it is not executed anywhere as part of that process.

@wence- (Contributor) commented Aug 5, 2022

> Since this benchmark is broken due to changes in CUDA context handling

Can you explain what you mean by this?

@wence- (Contributor) left a review comment:

Minor nitpicks, otherwise looks good.

Review threads (resolved): benchmarks/cudf-merge.py, ucp/utils.py
@pentschev (Member Author) commented:

>> Since this benchmark is broken due to changes in CUDA context handling

> Can you explain what you mean by this?

cuDF creates a CUDA context at import time, which wasn't the case in the past and is the reason we have rapidsai/dask-cuda#379 in Dask-CUDA: if we don't set CUDA_VISIBLE_DEVICES, a context will always be created on device 0. Added to that, we now rely on UCX to establish the IB<->GPU mapping automatically, but that requires the CUDA context to be created before `import ucp`.
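
A minimal sketch of that ordering constraint, assuming Numba is used to create the context; the benchmark's actual setup code may differ, and the device index is illustrative.

```python
import os

# Pin this process to one GPU before any CUDA library initializes; otherwise a
# context may end up on device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

# Create the CUDA context first...
from numba import cuda

cuda.current_context()

# ...so UCX can pick the matching IB device when ucp initializes.
import ucp
```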

Change clock to `time.monotonic()`, to prevent issues with the clock going backwards on some systems.
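
For illustration, a minimal timing pattern using a monotonic clock; the workload below is a stand-in, not the benchmark's.

```python
import time

start = time.monotonic()
sum(range(1_000_000))  # stand-in for one benchmark iteration
elapsed = time.monotonic() - start  # never negative, unlike time.time() deltas
print(f"elapsed: {elapsed:.6f} s")
```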
@wence- (Contributor) commented Aug 5, 2022

FWIW, the only niggle I have left here is that the --chunks-per-dev argument is a bit misnamed. The total number of workers started up is chunks-per-dev * num_devices (so when running with all GPUs and chunks-per-dev > 1, we get errors because we try to build 2 (or more) RMM pools per device and things break).

@pentschev (Member Author) commented:

> FWIW, the only niggle I have left here is that the --chunks-per-dev argument is a bit misnamed. The total number of workers started up is chunks-per-dev * num_devices (so when running with all GPUs and chunks-per-dev > 1, we get errors because we try to build 2 (or more) RMM pools per device and things break).

The RMM pool being a problem is expected with the default value; that's when the user has to adjust --rmm-init-pool-size. So --chunks-per-dev still sounds reasonable; it would be misnamed if it were --chunks-per-process or --chunks-per-worker. Do you have a proposal on how to make this better?

@wence- (Contributor) commented Aug 5, 2022

>> FWIW, the only niggle I have left here is that the --chunks-per-dev argument is a bit misnamed. The total number of workers started up is chunks-per-dev * num_devices (so when running with all GPUs and chunks-per-dev > 1, we get errors because we try to build 2 (or more) RMM pools per device and things break).

> The RMM pool being a problem is expected with the default value; that's when the user has to adjust --rmm-init-pool-size. So --chunks-per-dev still sounds reasonable; it would be misnamed if it were --chunks-per-process or --chunks-per-worker. Do you have a proposal on how to make this better?

I had naively thought that we would get num_devices workers and chunks-per-dev would say "how many chunks does each worker have". But actually, each worker only ever has a single chunk, and we always have num_devices * chunks_per_device workers (i.e. we oversubscribe the GPUs).

Since this isn't a substantive change from the current behaviour, let's not pollute this PR with too many additional pieces (sorry, this is my fault for noticing!) and leave as-is for now.
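
As an aside, a toy sketch of the worker-count arithmetic discussed above; the function and variable names here are made up for illustration and are not part of the benchmark.

```python
def total_workers(num_devices: int, chunks_per_dev: int) -> int:
    # Each worker handles exactly one chunk, so any chunks_per_dev > 1 means
    # several worker processes (and RMM pools) share the same GPU.
    return num_devices * chunks_per_dev


# With 8 GPUs and --chunks-per-dev 2, 16 workers are spawned and each GPU must
# fit two RMM pools, hence --rmm-init-pool-size may need to be reduced.
print(total_workers(num_devices=8, chunks_per_dev=2))  # prints 16
```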

pentschev and others added 2 commits August 9, 2022 16:03
@pentschev (Member Author) commented:

I think all changes are in now. Could you check one last time and approve the PR, @wence-?

@wence- (Contributor) commented Aug 10, 2022

@gpucibot merge

@wence- (Contributor) commented Aug 10, 2022

I am happy to merge this, though does it need further approval to go to 0.27 (rather than 0.28)?

@pentschev (Member Author) commented:

> I am happy to merge this, though does it need further approval to go to 0.27 (rather than 0.28)?

UCX-Py doesn't follow exactly the same release conditions as the rest of RAPIDS, and given there are no other required reviewers, we should be good to merge. Also, gpucibot has no power here. :)

@wence- (Contributor) commented Aug 10, 2022

OK, let's wait for tests.

@pentschev (Member Author) commented:

For the very small benchmark in CI, the assertion failed:

$ python cudf-merge.py --chunks-per-dev 4 --chunk-size 10000 --rmm-init-pool-size 2097152
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/datasets/pentschev/miniconda3/envs/rn-220803/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/datasets/pentschev/miniconda3/envs/rn-220803/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/datasets/pentschev/miniconda3/envs/rn-220803/lib/python3.8/site-packages/ucp/utils.py", line 131, in _worker_process
    ret = loop.run_until_complete(run())
  File "/datasets/pentschev/miniconda3/envs/rn-220803/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/datasets/pentschev/miniconda3/envs/rn-220803/lib/python3.8/site-packages/ucp/utils.py", line 126, in run
    return await func(rank, eps, args)
  File "/datasets/pentschev/src/ucx-py/benchmarks/cudf-merge.py", line 216, in worker
    assert abs(len(ret) - expected_len) <= expected_len_err

The maximum allowed error was 30, but the actual deviation was 31. Since this is not critical, I increased the tolerance to 2% now.
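
For reference, a hypothetical sketch of the relaxed check; the concrete lengths and how the tolerance is derived in cudf-merge.py are assumptions for illustration.

```python
expected_len = 3000                      # illustrative expected merged length
expected_len_err = expected_len * 0.02   # 2% tolerance on the merged length

ret_len = 3031                           # illustrative actual merged length
assert abs(ret_len - expected_len) <= expected_len_err
```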

@pentschev (Member Author) commented:

And it seems I was wrong. We have to ask for approval.

@msadang (Contributor) left a review comment:

LGTM

@msadang msadang merged commit e9e81f8 into rapidsai:branch-0.27 Aug 10, 2022
@pentschev pentschev deleted the cudf-merge-update branch August 24, 2022 19:56