Update NVBench fixture to use new hooks, fix pinned memory segfault. #15492

alliepiper · 2024-04-09T22:10:37Z

Description

NVBench recently exposed new hooks for modifying its main implementation. Updated cudf to use these.

Also noticed that the host pinned-pool memory resource option caused the test to segfault, since the function-scope static holding the pool outlived the CUDA context. Refactored the fixture a bit to ensure that the pool is destroyed before the context.

Note that this currently overrides the rapids-cmake version for NVBench. Rapids-cmake should be updated and the override removed before this is merged (ping @robertmaynard).

cc: @jrhemstad @davidwendt

The function-scope-static pinned mr was getting destroyed after the CUDA context. Moving to a function-scope variable and ensuring that the pinned mr is not registered with cudf when the fixture is destroyed ensures that the pool is freed while the context is valid.
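The destruction-order problem described above can be illustrated with a minimal sketch that uses no CUDA or RMM at all. Everything here is a hypothetical stand-in (`fake_context`, `fake_pinned_pool`, `fixture`, `event_log`), not the actual cudf fixture code; it only shows why an object that outlives the context is freed too late, and how owning it in a scoped fixture fixes the ordering.

```cpp
#include <memory>
#include <string>
#include <vector>

// Records lifetime events so the destruction order can be inspected.
std::vector<std::string>& event_log()
{
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for the CUDA context (assumption, not a real API).
struct fake_context {
  static bool& alive()
  {
    static bool a = false;
    return a;
  }
  fake_context()
  {
    alive() = true;
    event_log().push_back("context created");
  }
  ~fake_context()
  {
    alive() = false;
    event_log().push_back("context destroyed");
  }
};

// Hypothetical stand-in for the pinned pool: freeing it is only safe
// while the context is still alive.
struct fake_pinned_pool {
  ~fake_pinned_pool()
  {
    event_log().push_back(fake_context::alive() ? "pool freed (context valid)"
                                                : "pool freed (context gone)");
  }
};

// Hypothetical fixture that owns the pool, so the pool dies with the fixture.
struct fixture {
  std::unique_ptr<fake_pinned_pool> pool = std::make_unique<fake_pinned_pool>();
};

// Mimics the old behavior: the pool outlives the context,
// as the function-scope static did.
std::vector<std::string> run_outliving()
{
  event_log().clear();
  {
    std::unique_ptr<fake_pinned_pool> pool;  // destroyed after ctx
    {
      fake_context ctx;
      pool = std::make_unique<fake_pinned_pool>();
    }  // ctx destroyed here, pool still alive
  }    // pool destroyed here, too late
  return event_log();
}

// Mimics the fix: the fixture (and its pool) is destroyed inside the
// context's lifetime.
std::vector<std::string> run_scoped()
{
  event_log().clear();
  {
    fake_context ctx;
    {
      fixture f;
    }  // pool freed here, context still valid
  }    // ctx destroyed here
  return event_log();
}
```

In `run_outliving()` the pool is freed after the context is gone, which corresponds to the segfault case; in `run_scoped()` the pool is freed first.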

bdice · 2024-04-09T22:14:07Z

cpp/cmake/thirdparty/patches/nvbench_override.json

-        }
-      ]
+      "git_url": "https://github.com/NVIDIA/nvbench.git",
+      "git_tag": "5ee8811a1ac5a90f73a4dc52ab8572c25724a0e8"


This tag should be updated in https://github.com/rapidsai/rapids-cmake/blob/8b1a1e0e2302ec5a6cfeed762c4f281268e7adca/rapids-cmake/cpm/versions.json#L84 rather than here, if these changes can be applied across RAPIDS.

@bdice what's the process for making PRs that depend on rapids-cmake changes?

I don't suppose there's a way to make a cudf PR target the rapids-cmake PR? Or do we need to wait for the rapids-cmake changes to get merged first?

You can open a PR to rapids-cmake and use that for testing cudf (or other RAPIDS repos that would be affected by the change). Once the rapids-cmake PR is known to be safe, it can be merged (with further changes in each repo merged concurrently, or admin-merged by ops if needed).

See example here: https://github.com/rapidsai/cudf/pull/14704/files#diff-4cf10ebc4636f2671b1909aa0f64141af8b02c1d102764dbbfb34cd493246fb1R29-R30

tl;dr Add something like this in cudf's rapids_config.cmake that points to your fork/branch of rapids-cmake:

set(rapids-cmake-repo bdice/rapids-cmake)
set(rapids-cmake-branch cccl-2.3.0)

This is copied from the rapids-cmake README: https://github.com/rapidsai/rapids-cmake?tab=readme-ov-file#overriding-rapidscmake

rapidsai/rapids-cmake#584 is now active to merge these changes upstream.
But once merged it will break cudf benchmarks since our current patch will fail to apply.

So to ensure no breaks, we will need to merge this PR first, then merge the rapids-cmake PR, and then make a follow-up PR that removes nvbench_override.json (and the related CMake call).

We should use the same commit in this PR and the rapids-cmake PR:

Suggested change

-      "git_tag": "5ee8811a1ac5a90f73a4dc52ab8572c25724a0e8"
+      "git_tag": "555d628e9b250868c9da003e4407087ff1982e8e"

Updated the SHA. I'll leave this change active following Rob's suggestion above.

@robertmaynard I think we are ready to merge this so we can start the merge dance you described above.

Sounds great. 👍

cpp/benchmarks/fixture/nvbench_main.cpp

cpp/benchmarks/fixture/nvbench_fixture.hpp

davidwendt · 2024-04-17T18:28:30Z

/ok to test

alliepiper · 2024-04-17T21:37:24Z

Thanks for updating this @davidwendt -- I made a couple more changes to address some of the other comments.

robertmaynard · 2024-04-23T21:11:39Z

/merge

Updates nvbench for rapidsai/cudf#15492

Authors:
- Allison Piper (https://github.com/alliepiper)

Approvers: None

URL: #584

The override is no longer necessary as rapids-cmake now uses the same version that was set by the override. Refs rapidsai/rapids-cmake#584, #15492

Authors:
- Allison Piper (https://github.com/alliepiper)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Bradley Dice (https://github.com/bdice)

URL: #15633

alliepiper added 4 commits April 9, 2024 21:29

Update to use new NVBench main() hooks.

d9780be

Remove old patch for nvbench fixtures.

261d4f5

Temporarily override nvbench version for testing.

66fdd94

alliepiper requested review from a team as code owners April 9, 2024 22:10

alliepiper requested review from vyasr and pmattione-nvidia April 9, 2024 22:10

github-actions bot added the libcudf and CMake labels Apr 9, 2024

bdice reviewed Apr 9, 2024

davidwendt reviewed Apr 10, 2024