Is `-real` a CMake command? What does this do?
https://cmake.org/cmake/help/git-stage/prop_tgt/CUDA_ARCHITECTURES.html
`-real` and `-virtual` are special keywords that can be used with `CMAKE_CUDA_ARCHITECTURES` to provide abstractions around the different CUDA compilers' code generation APIs. For nvcc:
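A minimal sketch of that mapping, assuming CMake's documented handling of the suffixes (CMake 3.18 or newer). The project, target, and source names are hypothetical, and the nvcc flags in the comments are approximate rather than copied from a real build log:

```cmake
cmake_minimum_required(VERSION 3.18)

# Initializes the CUDA_ARCHITECTURES property of targets created later.
#   70-real    -> SASS only for sm_70
#   80-virtual -> PTX only for compute_80 (not used here)
#   80         -> both SASS (sm_80) and PTX (compute_80)
set(CMAKE_CUDA_ARCHITECTURES 70-real 80)

project(arch_demo LANGUAGES CUDA)            # hypothetical project name

add_library(my_kernels STATIC kernels.cu)    # hypothetical target and source

# With nvcc this should expand to roughly:
#   --generate-code=arch=compute_70,code=[sm_70]
#   --generate-code=arch=compute_80,code=[compute_80,sm_80]
```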
Can I see the output of this command when `CMAKE_CUDA_ARCHITECTURES` is unset? I see above now.

We want SASS for all architectures we support, right? If we only include SASS (`-real`) for 80, then users with anything but Ampere GPUs will experience looooong load/import times, because the PTX has to be JIT-compiled for the architecture they actually have. We do need to include PTX, but only for those who have GPUs we don't officially support (i.e. for forward compatibility).
I see, I had it backwards. `-real` is appended to all but the last entry; I thought it was only being appended to the last entry. All good.
You are correct. The code above is sneaky: it removes the newest entry from the list and applies `-real` only to the values that remain. So input `70,80` becomes `70-real, 80`, and input `80` becomes `80`.
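A minimal sketch of that transformation, not the PR's actual code. The variable names `archs` and `newest` are hypothetical, and it assumes the input list is ordered so that the last entry is the newest architecture:

```cmake
# Hypothetical input; in practice this would be the user-supplied value.
set(archs 70 80)

# Split off the last entry (the newest architecture). Requires CMake >= 3.15.
list(POP_BACK archs newest)

# Mark every remaining entry as SASS-only.
list(TRANSFORM archs APPEND "-real")

# Re-attach the newest entry without a suffix so it yields both SASS and PTX.
list(APPEND archs ${newest})

# For the input above, the list is now 70-real;80.
message(STATUS "CMAKE_CUDA_ARCHITECTURES = ${archs}")
```

Saved to a file, this can be tried with `cmake -P <file>`. Leaving the newest entry unsuffixed is what keeps PTX in the binary for GPUs newer than the ones officially supported.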