[REVIEW] Build only compute for the newest arch in CMAKE_CUDA_ARCHITECTURES #706

Merged
cmake/Modules/SetGPUArchs.cmake (7 additions, 0 deletions)
@@ -66,6 +66,13 @@ if(CMAKE_CUDA_ARCHITECTURES STREQUAL "")
evaluate_gpu_archs(CMAKE_CUDA_ARCHITECTURES)
endif(CMAKE_CUDA_ARCHITECTURES STREQUAL "")

# A CMake architecture list entry of "80" means build both compute and sm. What we want is for
# only the newest arch to be built that way, while the rest are built only for sm.
list(SORT CMAKE_CUDA_ARCHITECTURES ORDER ASCENDING)
list(POP_BACK CMAKE_CUDA_ARCHITECTURES latest_arch)
list(TRANSFORM CMAKE_CUDA_ARCHITECTURES APPEND "-real")
Member

Is -real a CMake command? What does it do?

Contributor Author

-real and -virtual are special suffixes that can be used on CMAKE_CUDA_ARCHITECTURES entries; they abstract over the code-generation options of the different CUDA compilers.

For nvcc:

Input         Compiler invocation
80            --generate-code=arch=compute_80,code=[sm_80,compute_80]
80-virtual    --generate-code=arch=compute_80,code=compute_80
80-real       --generate-code=arch=compute_80,code=sm_80
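A minimal script-mode sketch of the mapping in the table above follows. The function name cuda_arch_to_nvcc_flag is hypothetical (CMake performs this translation internally and exposes no such command), and the sketch assumes each entry is a plain architecture number with an optional -real or -virtual suffix. It only reproduces the strings from the table; it does not invoke nvcc.

```cmake
# Sketch only; run with: cmake -P arch_flags.cmake
# cuda_arch_to_nvcc_flag is a HYPOTHETICAL helper that mirrors the
# "input -> compiler invocation" table for nvcc; it is not part of CMake.
cmake_minimum_required(VERSION 3.12)

function(cuda_arch_to_nvcc_flag entry out_var)
  if(entry MATCHES "^([0-9]+)-real$")
    # -real: SASS (sm_XX) only
    set(arch "${CMAKE_MATCH_1}")
    set(code "sm_${arch}")
  elseif(entry MATCHES "^([0-9]+)-virtual$")
    # -virtual: PTX (compute_XX) only
    set(arch "${CMAKE_MATCH_1}")
    set(code "compute_${arch}")
  elseif(entry MATCHES "^([0-9]+)$")
    # Bare number: both SASS and PTX
    set(arch "${CMAKE_MATCH_1}")
    set(code "[sm_${arch},compute_${arch}]")
  else()
    message(FATAL_ERROR "Unrecognized architecture entry: ${entry}")
  endif()
  set(${out_var} "--generate-code=arch=compute_${arch},code=${code}" PARENT_SCOPE)
endfunction()

# Print the flag for each form of the same architecture.
foreach(entry IN ITEMS 80 80-virtual 80-real)
  cuda_arch_to_nvcc_flag(${entry} flag)
  message(STATUS "${entry} -> ${flag}")
endforeach()
```

Running it prints one line per entry, matching the three rows of the table.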

harrism (Member), Feb 17, 2021

Can I see the output of this command when CMAKE_CUDA_ARCHITECTURES is unset? Never mind, I see it above now.

We want SASS for all architectures we support, right? If we only include SASS ("-real") for 80, then users with anything but Ampere GPUs will experience looooong load/import times while the driver JIT-compiles the PTX for their GPU's architecture. We do need to include PTX, but only for users whose GPUs we don't officially support (i.e. for forward compatibility).

Member

I see, I had it backwards: -real is appended to every entry except the last one, whereas I thought it was being appended only to the last entry. All good.

Contributor Author

> We want SASS for all architectures we support, right? If we only include SASS ("-real") for 80, then users with anything but Ampere GPUs will experience looooong load/import times

You are correct. The code above is sneaky: it removes the newest arch from the list, appends -real only to the remaining entries, and then adds the newest arch back unchanged. So the input 70;80 becomes 70-real;80, and the input 80 stays 80.

list(APPEND CMAKE_CUDA_ARCHITECTURES ${latest_arch})

set(CMAKE_CUDA_ARCHITECTURES
${CMAKE_CUDA_ARCHITECTURES}
PARENT_SCOPE)
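The behavior described in the review (70;80 becomes 70-real;80, and 80 stays 80) can be reproduced with the following cmake -P sketch. keep_ptx_for_newest_only is a hypothetical helper that wraps the diff's steps in a function; it is not part of the PR or of CMake. It deliberately differs from the diff in two respects, both assumptions of this sketch: it sorts with COMPARE NATURAL (CMake 3.18 or later), because the default string comparison used by list(SORT) orders "100" before "90" and would pick the wrong entry as the newest once three-digit architectures are in the list; and it skips the TRANSFORM step when only one architecture was given.

```cmake
# Sketch only; run with: cmake -P newest_arch.cmake
# keep_ptx_for_newest_only is a HYPOTHETICAL helper, not from the PR or CMake.
cmake_minimum_required(VERSION 3.18)  # COMPARE NATURAL needs CMake 3.18+

function(keep_ptx_for_newest_only list_var)
  set(archs ${${list_var}})
  if(NOT archs)
    return()  # Nothing to do for an empty list
  endif()
  # NATURAL compares digit runs as numbers, so "90" sorts before "100".
  list(SORT archs COMPARE NATURAL ORDER ASCENDING)
  # Set the newest architecture aside; it keeps both SASS and PTX.
  list(POP_BACK archs latest_arch)
  # Every older architecture gets SASS only.
  if(archs)
    list(TRANSFORM archs APPEND "-real")
  endif()
  list(APPEND archs ${latest_arch})
  set(${list_var} ${archs} PARENT_SCOPE)
endfunction()

set(example_archs 80 70)
keep_ptx_for_newest_only(example_archs)
message(STATUS "example_archs = ${example_archs}")  # 70-real;80
```

With the diff's plain list(SORT ... ORDER ASCENDING), an input of 90;100 would sort as 100;90 and come out as 100-real;90, so the natural comparison only matters if three-digit architectures can appear in a build.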