Improve perf of kernel: _block_bucketize_sparse_features_cuda_kernel1 #2331
This pull request was exported from Phabricator. Differential Revision: D53585964
…pytorch#2331) Summary: As titled, this diff changes the kernel to parallelize the work of _block_bucketize_sparse_features_cuda_kernel1 within a row. The context is that IG DV365 models have very long rows, which make the kernel slow. I still need to think about how to improve _block_bucketize_sparse_features_cuda_kernel**2**. Publishing the changes for kernel 1 first because kernel 1 is now significantly faster. {F1455708370} Reviewed By: sryap Differential Revision: D53585964
This pull request has been merged in bbf83f1.
Summary:
As titled, this diff changes the kernel to parallelize the work of _block_bucketize_sparse_features_cuda_kernel1 within a row.
The context is that IG DV365 models have very long rows, which make the kernel slow.
I still need to think about how to improve _block_bucketize_sparse_features_cuda_kernel2. Publishing the changes for kernel 1 first because kernel 1 is now significantly faster.
{F1455708370}
Differential Revision: D53585964
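
To illustrate the idea of parallelizing within a row, here is a rough CPU model in Python. It is not the code from this diff. The function names, the bucket rule `index // blk_size`, the `new_lengths[bucket * num_rows + row]` layout, and the `threads_per_row` parameter are all assumptions made for the sketch. It also assumes every index is smaller than `blk_size * num_buckets`. The first function models one thread walking a whole row. The second models several lanes striding over the same row, which on a GPU would presumably need an atomic add for the shared counters.

```python
from typing import List


def bucket_lengths_thread_per_row(
    offsets: List[int], indices: List[int], blk_size: int, num_buckets: int
) -> List[int]:
    """Baseline model: one simulated thread counts an entire row by itself."""
    num_rows = len(offsets) - 1
    # Assumed layout: new_lengths[bucket * num_rows + row] holds the number of
    # indices from `row` that land in `bucket`.
    new_lengths = [0] * (num_buckets * num_rows)
    for row in range(num_rows):  # one simulated thread per row
        for i in range(offsets[row], offsets[row + 1]):
            bucket = indices[i] // blk_size  # assumed bucket rule
            new_lengths[bucket * num_rows + row] += 1
    return new_lengths


def bucket_lengths_threads_within_row(
    offsets: List[int],
    indices: List[int],
    blk_size: int,
    num_buckets: int,
    threads_per_row: int = 4,  # hypothetical parameter for this sketch
) -> List[int]:
    """Within-row model: several simulated lanes share the work of one row."""
    num_rows = len(offsets) - 1
    new_lengths = [0] * (num_buckets * num_rows)
    for row in range(num_rows):
        start, end = offsets[row], offsets[row + 1]
        # On a GPU these lanes would run concurrently; here they run in turn.
        for lane in range(threads_per_row):
            # Each lane visits elements start+lane, start+lane+threads_per_row, ...
            for i in range(start + lane, end, threads_per_row):
                bucket = indices[i] // blk_size
                # Lanes of one row share these counters, so a real kernel
                # would need an atomic add (or a reduction) at this point.
                new_lengths[bucket * num_rows + row] += 1
    return new_lengths


# Two rows: a short one (2 indices) and a longer one (8 indices).
offsets = [0, 2, 10]
indices = [0, 5, 1, 2, 3, 4, 5, 6, 7, 0]
serial = bucket_lengths_thread_per_row(offsets, indices, blk_size=4, num_buckets=2)
strided = bucket_lengths_threads_within_row(offsets, indices, blk_size=4, num_buckets=2)
print(serial, strided)
```

Both functions give the same counts. In the second one, the longest sequential chain per row shrinks from the row length to roughly the row length divided by `threads_per_row`, which is why very long rows stand to benefit most.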