Get SM count with cudaDeviceGetAttribute in KernelHardwareInfo #927
Merged
hwu36 approved these changes on Apr 28, 2023
aakhundov added a commit to aakhundov/AITemplate-1 that referenced this pull request on May 5, 2023
Summary: A few issues in the CUTLASS codebase blocking the integration of the CUTLASS 3.x SM90 kernels in AITemplate have been fixed upstream (see, e.g., the merged PRs [facebookincubator#920](NVIDIA/cutlass#920) and [facebookincubator#927](NVIDIA/cutlass#927)). The CUTLASS version is synced with the upstream to proceed with the SM90 integration. Differential Revision: D45603657 fbshipit-source-id: 1e0d47ceab6b26ac2d923157ecc88cd259e513d4
facebook-github-bot pushed a commit to facebookincubator/AITemplate that referenced this pull request on May 7, 2023
Summary: Pull Request resolved: #662 A few issues in the CUTLASS codebase blocking the integration of the CUTLASS 3.x SM90 kernels in AITemplate have been fixed upstream (see, e.g., the merged PRs [#920](NVIDIA/cutlass#920) and [#927](NVIDIA/cutlass#927)). The CUTLASS version is synced with the upstream to proceed with the SM90 integration. Reviewed By: chenyang78 Differential Revision: D45603657 fbshipit-source-id: 6b64f6ee0b9f87c2f379144d0fa568487aef8076
Currently, in the `KernelHardwareInfo` class, `cudaGetDeviceProperties` is called just to get the number of SMs. `cudaGetDeviceProperties` is quite heavy and can take anywhere from about 1 ms to tens of milliseconds to complete. Because it is called under the hood of every `GemmUniversalAdapter::initialize` when using the SM90 TMA warp-specialized cooperative and pingpong kernel schedules, use cases where the arguments must be re-initialized on every kernel invocation become quite inefficient.

In this PR, the `cudaGetDeviceProperties` call is replaced with the much more lightweight `cudaDeviceGetAttribute`, which queries only `cudaDevAttrMultiProcessorCount`. This takes on the order of 10 ns and is virtually invisible in the trace.