Get SM count with cudaDeviceGetAttribute in KernelHardwareInfo #927

Merged: 1 commit, Apr 28, 2023

aakhundov (Contributor) commented on Apr 26, 2023

Currently, the KernelHardwareInfo class calls cudaGetDeviceProperties just to get the number of SMs. cudaGetDeviceProperties is quite heavy and can take anywhere from about 1 ms to tens of milliseconds to complete. Because it is called under the hood of every GemmUniversalAdapter::initialize when using the SM90 TMA warp-specialized cooperative and pingpong kernel schedules, use cases where the arguments must be re-initialized on every kernel invocation become quite inefficient.

This PR replaces the cudaGetDeviceProperties call with the much more lightweight cudaDeviceGetAttribute, querying only cudaDevAttrMultiProcessorCount. The new call takes on the order of 10 ns and is virtually invisible in the trace.

hwu36 merged commit fe2f491 into NVIDIA:main on Apr 28, 2023
aakhundov added a commit to aakhundov/AITemplate-1 that referenced this pull request May 5, 2023
Summary: A few issues in the CUTLASS codebase blocking the integration of the CUTLASS 3.x SM90 kernels in AITemplate have been fixed upstream (see, e.g., the merged PRs NVIDIA/cutlass#920 and NVIDIA/cutlass#927). The CUTLASS version is synced with the upstream to proceed with the SM90 integration.

Differential Revision: D45603657

fbshipit-source-id: 1e0d47ceab6b26ac2d923157ecc88cd259e513d4
facebook-github-bot pushed a commit to facebookincubator/AITemplate that referenced this pull request May 7, 2023
Summary:
Pull Request resolved: #662

A few issues in the CUTLASS codebase blocking the integration of the CUTLASS 3.x SM90 kernels in AITemplate have been fixed upstream (see, e.g., the merged PRs NVIDIA/cutlass#920 and NVIDIA/cutlass#927). The CUTLASS version is synced with the upstream to proceed with the SM90 integration.

Reviewed By: chenyang78

Differential Revision: D45603657

fbshipit-source-id: 6b64f6ee0b9f87c2f379144d0fa568487aef8076