Fix ptx usage to account for PTX ISA availability #1359
Conversation
Looks good. Just one nit.
Overall comment/question on testability:
- Does our use of __CUDA_MINIMUM_ARCH__ disable the features when compiling for multiple architectures (say SM80+SM90)?
- Should we test this in CI with all major architectures? Or would our use of __CUDA_MINIMUM_ARCH__ defeat the purpose?
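For context, a minimal sketch of why a __CUDA_MINIMUM_ARCH__ guard disables a feature for every target of a fat binary (the guarded function is hypothetical; only the macro behavior is as CCCL documents it): __CUDA_MINIMUM_ARCH__ reflects the lowest architecture in the build, so with -gencode arch=compute_80 -gencode arch=compute_90 it is 800, the check below fails, and the SM90 code is compiled out even in the sm_90 device pass.

// Hypothetical guarded function; the guard mirrors the pattern this PR replaces.
#if defined(__CUDA_MINIMUM_ARCH__) && (900 <= __CUDA_MINIMUM_ARCH__)
__device__ void use_sm90_only_feature(); // compiled out in an SM80+SM90 build
#endif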
Wait... we wouldn't be able to catch this anyway. I don't see 11.8 in the CI matrix?
We would not; we need to expand our test matrix for that.
We want this to be globally available
I have decided to punt on the CI enhancements, as those are really icky to get right. Also, we most likely won't backport those, so reducing this to a pure product PR seems fine.
@@ -50,7 +50,7 @@ _LIBCUDACXX_BEGIN_NAMESPACE_CUDA_DEVICE_EXPERIMENTAL
 // capability 9.0 and above. The check for (!defined(__CUDA_MINIMUM_ARCH__)) is
 // necessary to prevent cudafe from ripping out the device functions before
 // device compilation begins.
-#if (!defined(__CUDA_MINIMUM_ARCH__)) || (defined(__CUDA_MINIMUM_ARCH__) && 900 <= __CUDA_MINIMUM_ARCH__)
+#ifdef __cccl_lib_experimental_ctk12_cp_async_exposure
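A hedged sketch of how such a feature macro could be derived (the real definition lives in __cccl/ptx_isa.h and may differ): it combines the toolkit check, since PTX ISA 8.0 requires CTK 12.0, with the architecture check the old guard performed, so the macro is only defined when both the compiler and the target can support cp.async.bulk.

// Sketch only, not the library's actual definition.
#if (defined(__cccl_ptx_isa) && __cccl_ptx_isa >= 800) \
    && (!defined(__CUDA_MINIMUM_ARCH__) || 900 <= __CUDA_MINIMUM_ARCH__)
#  define __cccl_lib_experimental_ctk12_cp_async_exposure
#endif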
The functions below are not, strictly speaking, part of the experimental exposure, but the check for the feature is currently the same as the check for availability of cp.async.bulk would be. Not a blocker imho, just want to note this.
Looks good. It makes sense to disable the mbarrier.expect_tx and cp.async.bulk tests on nvcc 11, as they aren't supported there.
I think the architecture-conditional code is now properly guarded by both the PTX ISA version and the NV_IF_TARGET + linker error hack.
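For readers unfamiliar with the "linker error hack" mentioned here, a sketch of the pattern (identifiers are illustrative, not libcudacxx's actual ones): the unsupported-target branch calls a __device__ function that is declared but never defined, so compilation always succeeds, while any code path that would actually execute the instruction on an unsupported architecture fails at link time with the function name as the diagnostic.

#include <nv/target>

// Declared but intentionally never defined.
__device__ void __feature_requires_sm90_or_higher();

__device__ inline void expect_tx_sketch()
{
  NV_IF_ELSE_TARGET(NV_PROVIDES_SM_90,
    (/* SM90-only inline PTX would go here */;),
    (__feature_requires_sm90_or_higher();)
  );
}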
I noticed one of the tests was failing. Suggested a fix.
Code looks good now.
Can we merge this PR?
Backport failed for branch/2.3.x. Please cherry-pick the changes locally:

git fetch origin branch/2.3.x
git worktree add -d .worktree/backport-1359-to-branch/2.3.x origin/branch/2.3.x
cd .worktree/backport-1359-to-branch/2.3.x
git checkout -b backport-1359-to-branch/2.3.x
ancref=$(git merge-base c8dde0ec2e42573069b1add37dfb83c5fc7a1673 555ac64435e6f0d175a09c34c4bdad9fa0ead91d)
git cherry-pick -x $ancref..555ac64435e6f0d175a09c34c4bdad9fa0ead91d
Currently we only guard those instructions based on the available architecture. However, it is also valid to compile with an old toolkit for a new machine. Consequently we need to strengthen our checks against the available PTX ISA.
…#1421)
* Fix ptx usage to account for PTX ISA availability (#1359)
  Currently we only guard those instructions based on the available architecture. However, it is also valid to compile with an old toolkit for a new machine. Consequently we need to strengthen our checks against the available PTX ISA.
* Do not use VLAs in cp_async_bulk_tensor_* tests
  VLAs are a compiler extension and are correctly errored out by some compilers. As we always know the exact size of the array anyway, just switch to a cuda::std::array. Fixes nvbug4476664
* Use proper shared memory size
  Authored-by: Allard Hendriksen <[email protected]>
* Fix incorrect linker issue
* Ensure runfail tests do not fail without execution
* Ensure that __cccl_ptx_isa properly guards feature flags
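As an illustration of the VLA change described above (the function and extent are assumptions, not the actual test code): since the array extent is a compile-time constant at every use, the VLA can be replaced one-for-one with a fixed-size cuda::std::array.

#include <cuda/std/array>

__device__ void fill_tile_sketch()
{
  // Before (sketch): int buf[n]; with n a runtime variable -> a VLA,
  // a non-standard extension that some compilers reject.
  // After: the extent is always 4, so use a fixed-size array.
  cuda::std::array<int, 4> buf{};
  for (int i = 0; i < 4; ++i)
  {
    buf[i] = i;
  }
}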
We encountered CI failures when trying to update rmm. It turns out that it is indeed valid to use an older CTK to build on a new Hopper machine. That means we cannot solely guard ptx on the available architectures, but also need to account for the availability of PTX ISA.

To ensure that this information is globally available, we move the ptx_isa detection into __cccl_config and use the appropriate feature test macros in the code.
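A minimal usage sketch of the resulting feature test (the macro name and version scale are as this PR describes them; the guarded snippet is hypothetical): once any CCCL header has pulled in __cccl_config, __cccl_ptx_isa reflects the newest PTX ISA the toolkit can emit, so code can guard on the toolkit in addition to the target architecture.

#include <cuda/std/barrier> // pulls in __cccl_config, and with it ptx_isa.h

#if __cccl_ptx_isa >= 800 // PTX ISA 8.0 ships with CTK 12.0
// Safe to reference cp.async.bulk-style instructions here, still subject
// to the architecture checks discussed above.
#endif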