Auto-generation of CUTLASS Extension Kernel Templates #2932
Conversation
This pull request was exported from Phabricator. Differential Revision: D60171966
This pull request has been merged in de845bf.
Summary:
This diff allows cutlass_extension to use configuration-based auto-instance generation. The diff aims to achieve the following:
(a) Many kernels need to be instantiated with varying template arguments, and it is impractical to instance them all by hand (a minimal sketch of configuration-driven generation appears below).
(b) Use and extend the OSS NVIDIA scripts for FBGEMM (Meta AI) use cases.
(c) Conform to CUTLASS's device-side API so that we can sweep all the template parameters that CUTLASS exposes.
(d) Bullets (b) and (c) bring our internal usage close to NVIDIA/CUTLASS, letting us upstream our kernels to the NVIDIA/CUTLASS repo quickly.
Differential Revision: D60171966
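To make the idea concrete, here is a minimal sketch of configuration-driven instance generation in the spirit of the OSS NVIDIA/CUTLASS generator scripts. Everything in it — the `KernelConfig` fields, the instance-naming scheme, and the emitted C++ fragment — is an illustrative assumption, not the actual FBGEMM or CUTLASS generator interface.

```python
# Hypothetical sketch: sweep a small configuration space and emit one C++
# instance file per combination, loosely following the approach of CUTLASS's
# Python generator scripts. All names and the emitted template text are
# illustrative assumptions, not the real FBGEMM/CUTLASS generator code.
import itertools
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class KernelConfig:
    element_a: str        # e.g. "cutlass::bfloat16_t"
    element_b: str
    tile_shape: tuple     # (M, N, K) threadblock tile
    cluster_shape: tuple  # (x, y, z) thread-block cluster


# Simplified stand-in for a device-side instance definition; the real emitted
# code would spell out the full collective mainloop/epilogue types.
INSTANCE_TEMPLATE = """\
// Auto-generated; do not edit by hand.
using {name} = cutlass::gemm::device::GemmUniversalAdapter<
    cutlass::gemm::kernel::GemmUniversal<
        cute::Shape<int, int, int, int>,
        CollectiveMainloop_{name},
        CollectiveEpilogue_{name}>>;
"""


def instance_name(cfg: KernelConfig) -> str:
    # Encode the swept template parameters into a unique instance name.
    m, n, k = cfg.tile_shape
    cm, cn, ck = cfg.cluster_shape
    dtype = cfg.element_a.split("::")[-1]
    return f"gemm_{dtype}_{m}x{n}x{k}_{cm}x{cn}x{ck}"


def generate(out_dir: Path, configs) -> None:
    # Write one .cu file per configuration in the sweep.
    out_dir.mkdir(parents=True, exist_ok=True)
    for cfg in configs:
        name = instance_name(cfg)
        (out_dir / f"{name}.cu").write_text(INSTANCE_TEMPLATE.format(name=name))


if __name__ == "__main__":
    # Sweep a tiny slice of the template-parameter space as a demonstration.
    dtypes = ["cutlass::bfloat16_t", "cutlass::float_e4m3_t"]
    tiles = [(128, 128, 64), (64, 128, 64)]
    clusters = [(1, 1, 1), (2, 1, 1)]
    sweep = [
        KernelConfig(dt, dt, tile, cluster)
        for dt, tile, cluster in itertools.product(dtypes, tiles, clusters)
    ]
    generate(Path("generated_instances"), sweep)
    print(f"Emitted {len(sweep)} instance files")
```

In practice the sweep would presumably be driven by a configuration file rather than hard-coded lists, and the emitted instances would target CUTLASS's device-side API directly, which is what makes the full template-parameter sweep in (c) and the upstreaming in (d) tractable.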