Auto-generation of CUTLASS Extension Kernel Templates #2932

Closed
wants to merge 1 commit into from

manishucsd
Contributor

@manishucsd manishucsd commented Aug 2, 2024

Summary:
This diff lets cutlass_extension use configuration-based auto-instance generation. It aims to achieve the following:

(a) Many kernels need to be instantiated with varying template arguments, and it is hard to instantiate them all by hand.
(b) Use and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API so that we can sweep all the template parameters that CUTLASS allows.
(d) Bullets (b) and (c) bring our internal usage closer to NVIDIA/CUTLASS, so we can upstream our kernels to the NVIDIA/CUTLASS repo quickly.
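
For illustration only, here is a minimal sketch of what configuration-based instance generation can look like. It is not the code in this diff and not the NVIDIA generator scripts: `CONFIG`, `KernelInstance`, `generate`, and the emitted `ExampleGemm` template are all hypothetical names, and the parameter values are placeholders. The idea is to take the Cartesian product of the configured template arguments and emit one C++ alias per combination instead of writing each instance by hand.

```python
import itertools
from dataclasses import dataclass

# Hypothetical sweep configuration. The keys and values are illustrative
# assumptions, not taken from this diff or from the CUTLASS scripts.
CONFIG = {
    "element": ["cutlass::half_t", "cutlass::bfloat16_t"],
    "tile_shape": [(128, 128, 64), (128, 256, 64)],
    "cluster_shape": [(1, 1, 1), (2, 1, 1)],
}


@dataclass(frozen=True)
class KernelInstance:
    """One point in the sweep: a concrete choice for every template argument."""

    element: str
    tile_shape: tuple
    cluster_shape: tuple

    def name(self) -> str:
        # Build a unique, readable identifier from the chosen arguments.
        short = self.element.split("::")[-1]
        tile = "x".join(str(d) for d in self.tile_shape)
        cluster = "x".join(str(d) for d in self.cluster_shape)
        return f"gemm_{short}_{tile}_{cluster}"

    def emit(self) -> str:
        # Emit a C++ alias as text. `ExampleGemm` is a made-up template that
        # stands in for whatever device-side type a real generator targets.
        tile = ", ".join(f"cute::_{d}" for d in self.tile_shape)
        cluster = ", ".join(f"cute::_{d}" for d in self.cluster_shape)
        return (
            f"using {self.name()} = ExampleGemm<{self.element}, "
            f"cute::Shape<{tile}>, cute::Shape<{cluster}>>;"
        )


def generate(config: dict) -> list:
    """Return one KernelInstance per combination of configured values."""
    keys = list(config)
    return [
        KernelInstance(**dict(zip(keys, combo)))
        for combo in itertools.product(*(config[k] for k in keys))
    ]


if __name__ == "__main__":
    for instance in generate(CONFIG):
        print(instance.emit())
```

With the placeholder configuration above, the sweep yields 2 x 2 x 2 = 8 instances. Adding a value to any list in `CONFIG` grows the generated set without touching the emitter.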

Differential Revision: D60171966

netlify bot commented Aug 2, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 38e72ff
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66ccb4d75419ee000818598e
😎 Deploy Preview https://deploy-preview-2932--pytorch-fbgemm-docs.netlify.app

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D60171966

manishucsd added a commit to manishucsd/FBGEMM that referenced this pull request Aug 22, 2024
Summary:
X-link: facebookresearch/FBGEMM#33

Pull Request resolved: pytorch#2932

This diff lets cutlass_extension use configuration-based auto-instance generation. It aims to achieve the following:

(a) Many kernels need to be instantiated with varying template arguments, and it is hard to instantiate them all by hand.
(b) Use and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API so that we can perturb all the template parameters that CUTLASS allows.
(d) Bullets (b) and (c) bring our internal usage closer to NVIDIA/CUTLASS, so we can upstream our kernels to the NVIDIA/CUTLASS repo quickly.

Differential Revision: D60171966

manishucsd added a commit to manishucsd/FBGEMM that referenced this pull request Aug 23, 2024

Summary:
X-link: facebookresearch/FBGEMM#33

Pull Request resolved: pytorch#2932

This diff lets cutlass_extension use configuration-based auto-instance generation. It aims to achieve the following:

(a) Many kernels need to be instantiated with varying template arguments, and it is hard to instantiate them all by hand.
(b) Use and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API so that we can perturb all the template parameters that CUTLASS allows.
(d) Bullets (b) and (c) bring our internal usage closer to NVIDIA/CUTLASS, so we can upstream our kernels to the NVIDIA/CUTLASS repo quickly.

Reviewed By: ipiszy

Differential Revision: D60171966

@facebook-github-bot
Contributor

This pull request has been merged in de845bf.
