Improve perf of kernel: _block_bucketize_sparse_features_cuda_kernel1 #2331

Closed

AlbertDachiChen wants to merge 1 commit into pytorch:main from AlbertDachiChen:export-D53585964

Contributor

AlbertDachiChen commented Feb 13, 2024

Summary:
As titled, this diff changes the kernel to parallelize the work of _block_bucketize_sparse_features_cuda_kernel1 within a row.

The context is that IG DV365 models have some very long rows, which makes the kernel slow.

I need to think more about how to improve _block_bucketize_sparse_features_cuda_kernel2. Publishing the changes for kernel 1 first because it is now significantly faster.

{F1455708370}

Differential Revision: D53585964
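To illustrate what "parallelize within a row" means for a length-counting kernel, here is a minimal Python simulation. It is a sketch under stated assumptions, not the FBGEMM implementation: the function names, the CSR-style `offsets` with a leading 0, the single shared `blk_size`, and the bucket mapping are all hypothetical choices made for illustration.

```python
# Illustrative simulation only. All names here are hypothetical and are not
# part of FBGEMM's API; the real kernel is written in CUDA.

def bucket_of(idx, blk_size, num_buckets):
    """Assumed mapping: block-wise when in range, modulo otherwise."""
    if idx < blk_size * num_buckets:
        return idx // blk_size
    return idx % num_buckets


def lengths_one_thread_per_row(offsets, indices, blk_size, num_buckets):
    """Baseline: each simulated thread walks one entire row serially."""
    num_rows = len(offsets) - 1
    new_lengths = [0] * (num_buckets * num_rows)
    for row in range(num_rows):  # one simulated thread per row
        for i in range(offsets[row], offsets[row + 1]):
            p = bucket_of(indices[i], blk_size, num_buckets)
            new_lengths[p * num_rows + row] += 1
    return new_lengths


def lengths_threads_within_row(offsets, indices, blk_size, num_buckets,
                               threads_per_row=4):
    """Several simulated threads share a row, each taking a strided slice."""
    num_rows = len(offsets) - 1
    new_lengths = [0] * (num_buckets * num_rows)
    for row in range(num_rows):
        start, end = offsets[row], offsets[row + 1]
        # On a GPU these lanes would run concurrently; here they run in turn.
        for lane in range(threads_per_row):
            for i in range(start + lane, end, threads_per_row):
                p = bucket_of(indices[i], blk_size, num_buckets)
                # Concurrent lanes may hit the same counter, so on a GPU this
                # increment would have to be an atomic add.
                new_lengths[p * num_rows + row] += 1
    return new_lengths


offsets = [0, 3, 3, 10]                    # three rows; the middle one is empty
indices = [0, 5, 9, 1, 2, 3, 4, 11, 12, 7]
baseline = lengths_one_thread_per_row(offsets, indices, 4, 2)
parallel = lengths_threads_within_row(offsets, indices, 4, 2, threads_per_row=3)
print(baseline == parallel)
```

Both variants produce the same counts. The difference is that a very long row no longer serializes on a single thread, at the cost of contended increments on the shared counters.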

netlify bot commented Feb 13, 2024

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`bb92b3b`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/65cfcef38abe6b00085d3940
😎 Deploy Preview	https://deploy-preview-2331--pytorch-fbgemm-docs.netlify.app
To edit notification comments on pull requests, go to your Netlify site configuration.

facebook-github-bot added the cla signed label

AlbertDachiChen force-pushed the export-D53585964 branch 2 times, most recently from 577c47f to 4e4fd52, on February 13, 2024 22:11

Contributor

facebook-github-bot commented Feb 13, 2024

This pull request was exported from Phabricator. Differential Revision: D53585964

facebook-github-bot added the fb-exported label

AlbertDachiChen force-pushed the export-D53585964 branch from 4e4fd52 to 8df7b3e on February 13, 2024 22:11

AlbertDachiChen force-pushed the export-D53585964 branch from 8df7b3e to efdc754 on February 15, 2024 15:46

AlbertDachiChen force-pushed the export-D53585964 branch from efdc754 to fa2eb58 on February 15, 2024 15:47

AlbertDachiChen added a commit to AlbertDachiChen/FBGEMM that referenced this pull request


Improve perf of kernel: _block_bucketize_sparse_features_cuda_kernel1 (pytorch#2331)

5c08185

Summary:

As titled, this diff changes the kernel to parallelize the work of _block_bucketize_sparse_features_cuda_kernel1 within a row.

The context is that IG DV365 models have some very long rows, which makes the kernel slow.

I need to think more about how to improve _block_bucketize_sparse_features_cuda_kernel2. Publishing the changes for kernel 1 first because it is now significantly faster.

{F1455708370}

Reviewed By: sryap

Differential Revision: D53585964

AlbertDachiChen force-pushed the export-D53585964 branch from fa2eb58 to 5c08185 on February 16, 2024 21:08

AlbertDachiChen force-pushed the export-D53585964 branch from 5c08185 to bb92b3b on February 16, 2024 21:09

facebook-github-bot closed this in bbf83f1

facebook-github-bot added the Merged label

Contributor

facebook-github-bot commented Feb 20, 2024

This pull request has been merged in bbf83f1.

Labels

cla signed fb-exported Merged