Auto-generation of CUTLASS Extension Kernel Templates #2932
Conversation
This pull request was exported from Phabricator. Differential Revision: D60171966
This pull request has been merged in de845bf.
Summary:
This diff allows cutlass_extension to use configuration-based auto-instance generation. The diff aims to achieve the following:
(a) Many kernels need to be instantiated with varying template arguments, and it is impractical to instance them all by hand (a minimal sketch of configuration-driven generation appears below).
(b) Use and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API so that we can sweep all the template parameters that CUTLASS exposes.
(d) Bullets (b) and (c) bring our internal usage close to NVIDIA/CUTLASS, letting us upstream our kernels to the NVIDIA/CUTLASS repo quickly.
Differential Revision: D60171966
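To make the idea concrete, here is a minimal sketch of configuration-driven instance generation in the spirit of the OSS NVIDIA/CUTLASS generator scripts. Everything in it — the `KernelConfig` fields, the instance-naming scheme, and the emitted C++ fragment — is an illustrative assumption, not the actual FBGEMM or CUTLASS generator interface.

```python
# Hypothetical sketch: sweep a small configuration space and emit one C++
# instance file per combination, loosely following the approach of CUTLASS's
# Python generator scripts. All names and the emitted template text are
# illustrative assumptions, not the real FBGEMM/CUTLASS generator code.
import itertools
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class KernelConfig:
    element_a: str        # e.g. "cutlass::bfloat16_t"
    element_b: str
    tile_shape: tuple     # (M, N, K) threadblock tile
    cluster_shape: tuple  # (x, y, z) thread-block cluster


# Simplified stand-in for a device-side instance definition; the real emitted
# code would spell out the full collective mainloop/epilogue types.
INSTANCE_TEMPLATE = """\
// Auto-generated; do not edit by hand.
using {name} = cutlass::gemm::device::GemmUniversalAdapter<
    cutlass::gemm::kernel::GemmUniversal<
        cute::Shape<int, int, int, int>,
        CollectiveMainloop_{name},
        CollectiveEpilogue_{name}>>;
"""


def instance_name(cfg: KernelConfig) -> str:
    # Encode the swept template parameters into a unique instance name.
    m, n, k = cfg.tile_shape
    cm, cn, ck = cfg.cluster_shape
    dtype = cfg.element_a.split("::")[-1]
    return f"gemm_{dtype}_{m}x{n}x{k}_{cm}x{cn}x{ck}"


def generate(out_dir: Path, configs) -> None:
    # Write one .cu file per configuration in the sweep.
    out_dir.mkdir(parents=True, exist_ok=True)
    for cfg in configs:
        name = instance_name(cfg)
        (out_dir / f"{name}.cu").write_text(INSTANCE_TEMPLATE.format(name=name))


if __name__ == "__main__":
    # Sweep a tiny slice of the template-parameter space as a demonstration.
    dtypes = ["cutlass::bfloat16_t", "cutlass::float_e4m3_t"]
    tiles = [(128, 128, 64), (64, 128, 64)]
    clusters = [(1, 1, 1), (2, 1, 1)]
    sweep = [
        KernelConfig(dt, dt, tile, cluster)
        for dt, tile, cluster in itertools.product(dtypes, tiles, clusters)
    ]
    generate(Path("generated_instances"), sweep)
    print(f"Emitted {len(sweep)} instance files")
```

In practice the sweep would presumably be driven by a configuration file rather than hard-coded lists, and the emitted instances would target CUTLASS's device-side API directly, which is what makes the full template-parameter sweep in (c) and the upstreaming in (d) tractable.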