Update NVBench fixture to use new hooks, fix pinned memory segfault. #15492
The function-scope static pinned mr was being destroyed after the CUDA context. Moving it to a function-scope variable, and making sure the pinned mr is no longer registered with cudf when the fixture is destroyed, ensures that the pool is freed while the context is still valid.
}
]
"git_url": "https://github.com/NVIDIA/nvbench.git",
"git_tag": "5ee8811a1ac5a90f73a4dc52ab8572c25724a0e8"
This tag should be updated in https://github.com/rapidsai/rapids-cmake/blob/8b1a1e0e2302ec5a6cfeed762c4f281268e7adca/rapids-cmake/cpm/versions.json#L84 rather than here, if these changes can be applied across RAPIDS.
@bdice what's the process for making PRs that depend on rapids-cmake changes?
I don't suppose there's a way to make a cudf PR target the rapids-cmake PR? Or do we need to wait for the rapids-cmake changes to be merged first?
You can open a PR to rapids-cmake and use that for testing cudf (or other RAPIDS repos that would be affected by the change). Once the rapids-cmake PR is known to be safe, it can be merged (with further changes in each repo merged concurrently, or admin-merged by ops if needed).
See example here: https://github.com/rapidsai/cudf/pull/14704/files#diff-4cf10ebc4636f2671b1909aa0f64141af8b02c1d102764dbbfb34cd493246fb1R29-R30
tl;dr Add something like this in cudf's rapids_config.cmake that points to your fork/branch of rapids-cmake:
set(rapids-cmake-repo bdice/rapids-cmake)
set(rapids-cmake-branch cccl-2.3.0)
This is copied from the rapids-cmake README: https://github.com/rapidsai/rapids-cmake?tab=readme-ov-file#overriding-rapidscmake
rapidsai/rapids-cmake#584 is now active to merge these changes upstream.
But once that PR is merged, it will break the cudf benchmarks, since our current patch will fail to apply.
So to avoid any breakage, we will need to merge this PR first, then merge the rapids-cmake PR, and then make a follow-up that removes the nvbench_override.json (and the related CMake call).
We should use the same commit in this PR and the rapids-cmake PR:
"git_tag": "5ee8811a1ac5a90f73a4dc52ab8572c25724a0e8"
"git_tag": "555d628e9b250868c9da003e4407087ff1982e8e"
Updated the SHA. I'll leave this change active following Rob's suggestion above.
@robertmaynard I think we are ready to merge this so we can start the merge dance you described above.
Sounds great. 👍
/ok to test
Thanks for updating this @davidwendt -- I made a couple more changes to address some of the other comments.
/merge
Updates nvbench for rapidsai/cudf#15492
Authors:
- Allison Piper (https://github.com/alliepiper)
Approvers: None
URL: #584
The override is no longer necessary as rapids-cmake now uses the same version that was set by the override. Refs rapidsai/rapids-cmake#584, #15492
Authors:
- Allison Piper (https://github.com/alliepiper)
- Bradley Dice (https://github.com/bdice)
Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Bradley Dice (https://github.com/bdice)
URL: #15633
Description
NVBench recently exposed new hooks for modifying its `main` implementation. Updated cudf to use these.
Also noticed that the host pinned-pool memory resource option caused the test to segfault, since the function-scope static holding the pool outlived the CUDA context. Refactored the fixture a bit to ensure that the pool is destroyed before the context.
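The lifetime ordering described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in: `fake_context`, `fake_pinned_pool`, `registered_pool`, and `benchmark_fixture` are not CUDA, RMM, cudf, or NVBench types, and the real fixture in this PR may be structured differently. The sketch only shows that a pool owned by a scoped fixture is unregistered and freed before the enclosing context goes away, whereas a function-scope static would be destroyed only during program exit.

```cpp
#include <string>
#include <vector>

using event_log = std::vector<std::string>;

// Hypothetical stand-in for the CUDA context lifetime.
struct fake_context {
  event_log& log;
  explicit fake_context(event_log& l) : log(l) { log.push_back("context created"); }
  ~fake_context() { log.push_back("context destroyed"); }
};

// Hypothetical stand-in for a pinned-memory pool that needs a live context to free itself.
struct fake_pinned_pool {
  event_log& log;
  explicit fake_pinned_pool(event_log& l) : log(l) { log.push_back("pool created"); }
  ~fake_pinned_pool() { log.push_back("pool freed"); }
};

// Hypothetical stand-in for a library-wide "current pinned resource" pointer.
fake_pinned_pool*& registered_pool()
{
  static fake_pinned_pool* current = nullptr;
  return current;
}

// The fixture owns the pool as a member (not a function-scope static), registers it on
// construction, and unregisters it in the destructor body, which runs before the member
// is destroyed.
struct benchmark_fixture {
  fake_pinned_pool pool;
  explicit benchmark_fixture(event_log& log) : pool(log)
  {
    registered_pool() = &pool;
    log.push_back("pool registered");
  }
  ~benchmark_fixture()
  {
    registered_pool() = nullptr;
    pool.log.push_back("pool unregistered");
  }
};

// Runs the sketch and returns the recorded order of events.
event_log run_sketch()
{
  event_log log;
  {
    fake_context ctx(log);
    {
      benchmark_fixture fixture(log);
      // Benchmarks would run here while both the context and the pool are alive.
    }  // fixture destroyed: pool unregistered, then freed, while the context is still valid
  }    // context destroyed last
  return log;
}
```

Calling `run_sketch()` records the pool being unregistered and freed before the context is destroyed, which is the ordering the refactored fixture is meant to guarantee.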
Note that this currently overrides the rapids-cmake version for NVBench. Rapids-cmake should be updated and the override removed before this is merged (ping @robertmaynard).
cc: @jrhemstad @davidwendt