Update NVBench fixture to use new hooks, fix pinned memory segfault. #15492

Merged: 7 commits into rapidsai:branch-24.06 from nvbench_main_update, Apr 23, 2024

alliepiper
Contributor

Description

NVBench recently exposed new hooks for modifying its main implementation. Updated cudf to use these.

Also noticed that the host pinned-pool memory resource option caused the test to segfault, since the function-scope static holding the pool outlived the CUDA context. Refactored the fixture a bit to ensure that the pool is destroyed before the context.

Note that this currently overrides the rapids-cmake version for NVBench. Rapids-cmake should be updated and the override removed before this is merged (ping @robertmaynard).

cc: @jrhemstad @davidwendt

The function-scope-static pinned mr was getting destroyed after the CUDA context.

Moving to a function-scope variable and ensuring that the pinned mr is not registered with cudf when the fixture is destroyed ensures that the pool is freed while the context is valid.
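
The lifetime problem described above can be sketched in plain C++ without any CUDA dependency. This is only an illustration under stated assumptions: `fake_context`, `fake_pinned_pool`, and both `run_*` functions are hypothetical stand-ins, not cudf, RMM, or NVBench types, and the "context" is just a flag. The point is that a pool whose lifetime is scoped inside the fixture is freed while the context is still valid, whereas a pool that outlives the context (as the function-scope static did) is freed too late.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the CUDA context; only tracks validity.
struct fake_context {
  bool alive = true;
};

// Hypothetical stand-in for the pinned pool memory resource.
// Its destructor records whether the context was still valid.
struct fake_pinned_pool {
  fake_context& ctx;
  std::vector<std::string>& log;

  fake_pinned_pool(fake_context& c, std::vector<std::string>& l) : ctx(c), log(l) {}

  ~fake_pinned_pool()
  {
    log.push_back(ctx.alive ? "pool freed: context valid" : "pool freed: context gone");
  }
};

// Fixed ordering: the pool is scoped so it is destroyed before context teardown.
std::vector<std::string> run_with_scoped_pool()
{
  std::vector<std::string> log;
  fake_context ctx;
  {
    fake_pinned_pool pool{ctx, log};
    log.push_back("benchmarks run");
  }  // pool destroyed here, while ctx.alive is still true
  ctx.alive = false;
  log.push_back("context destroyed");
  return log;
}

// Models the buggy ordering: the pool outlives the context.
std::vector<std::string> run_with_outliving_pool()
{
  std::vector<std::string> log;
  fake_context ctx;
  {
    fake_pinned_pool pool{ctx, log};
    log.push_back("benchmarks run");
    ctx.alive = false;
    log.push_back("context destroyed");
  }  // pool destroyed here, after the context was invalidated
  return log;
}

// Small self-check showing the difference between the two orderings.
void check_destruction_orders()
{
  assert(run_with_scoped_pool()[1] == "pool freed: context valid");
  assert(run_with_outliving_pool().back() == "pool freed: context gone");
}
```

In the real fixture, the second ordering corresponds to freeing pinned memory after the CUDA context has been torn down, which is where the segfault appeared.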
@alliepiper alliepiper requested review from a team as code owners April 9, 2024 22:10
@github-actions bot added the libcudf and CMake labels Apr 9, 2024
}
]
"git_url": "https://github.com/NVIDIA/nvbench.git",
"git_tag": "5ee8811a1ac5a90f73a4dc52ab8572c25724a0e8"
Contributor

@bdice, Apr 9, 2024

This tag should be updated in https://github.com/rapidsai/rapids-cmake/blob/8b1a1e0e2302ec5a6cfeed762c4f281268e7adca/rapids-cmake/cpm/versions.json#L84 rather than here, if these changes can be applied across RAPIDS.

Contributor

@bdice what's the process for making PRs that depend on rapids-cmake changes?

I don't suppose there's a way to make a cudf PR target the rapids-cmake PR? Or do we need to wait for the rapids-cmake changes to get merged first?

Contributor

@bdice, Apr 9, 2024

You can open a PR to rapids-cmake and use that for testing cudf (or other RAPIDS repos that would be affected by the change). Once the rapids-cmake PR is known to be safe, it can be merged (with further changes in each repo merged concurrently, or admin-merged by ops if needed).

See example here: https://github.com/rapidsai/cudf/pull/14704/files#diff-4cf10ebc4636f2671b1909aa0f64141af8b02c1d102764dbbfb34cd493246fb1R29-R30

tl;dr Add something like this in cudf's rapids_config.cmake that points to your fork/branch of rapids-cmake:

set(rapids-cmake-repo bdice/rapids-cmake)
set(rapids-cmake-branch cccl-2.3.0)

This is copied from the rapids-cmake README: https://github.com/rapidsai/rapids-cmake?tab=readme-ov-file#overriding-rapidscmake

Contributor

rapidsai/rapids-cmake#584 is now open to merge these changes upstream.
However, once it is merged it will break cudf benchmarks, since our current patch will fail to apply.

To avoid any breakage, we will need to merge this PR first, then merge the rapids-cmake PR, and then make a follow-up that removes nvbench_override.json (and the related CMake call).

Contributor

We should use the same commit in this PR and the rapids-cmake PR:

Suggested change
- "git_tag": "5ee8811a1ac5a90f73a4dc52ab8572c25724a0e8"
+ "git_tag": "555d628e9b250868c9da003e4407087ff1982e8e"

Contributor Author

Updated the SHA. I'll leave this change active following Rob's suggestion above.

Contributor

@robertmaynard I think we are ready to merge this so we can start the merge dance you described above.

Contributor

Sounds great. 👍

@davidwendt added the improvement and non-breaking labels Apr 17, 2024
@davidwendt
Contributor

/ok to test

@alliepiper
Contributor Author

Thanks for updating this @davidwendt -- I made a couple more changes to address some of the other comments.

@robertmaynard
Contributor

/merge

@rapids-bot rapids-bot bot merged commit e6d9b9f into rapidsai:branch-24.06 Apr 23, 2024
70 checks passed
rapids-bot bot pushed a commit to rapidsai/rapids-cmake that referenced this pull request Apr 24, 2024
Updates nvbench for rapidsai/cudf#15492

Authors:
  - Allison Piper (https://github.com/alliepiper)

Approvers: None

URL: #584
@alliepiper alliepiper deleted the nvbench_main_update branch May 1, 2024 18:06
@alliepiper alliepiper mentioned this pull request May 1, 2024
rapids-bot bot pushed a commit that referenced this pull request May 2, 2024
The override is no longer necessary as rapids-cmake now uses the same version that was set by the override.

Refs rapidsai/rapids-cmake#584, #15492

Authors:
  - Allison Piper (https://github.com/alliepiper)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Bradley Dice (https://github.com/bdice)

URL: #15633
Labels: CMake, improvement, libcudf, non-breaking