Improve bounds_check_indices for VBE #3388
This pull request was exported from Phabricator. Differential Revision: D66084041
Summary: X-link: facebookresearch/FBGEMM#477

Instead of over-launching thread blocks, use `b_t_map` to launch only the necessary thread blocks, which increases occupancy in the VBE case.

Note that `b_t_map` is required for the TBE lookup in the VBE case and is generated during the TBE forward pass. In this diff, `generate_vbe_metadata` is called twice (once before the bounds check and once before the forward lookup). These two calls can be fused into one; we will clean this up in subsequent diffs.

Differential Revision: D66084041
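To illustrate the idea in the summary above, here is a minimal Python sketch of why a flattened (b, t) map avoids over-launching. It is not the FBGEMM kernel or its data layout: the helper names `build_b_t_map` and `launch_sizes`, the tuple representation of each entry, and the sample batch sizes are all assumptions made for illustration, and each "work item" only stands in for whatever unit the real kernel assigns to a thread block.

```python
def build_b_t_map(batch_sizes):
    """Map each flattened work item to its (b, t) pair.

    batch_sizes[t] is the batch size of table t (hypothetical input).
    The real b_t_map may encode its entries differently; tuples are
    used here only for readability.
    """
    return [(b, t) for t, size in enumerate(batch_sizes) for b in range(size)]


def launch_sizes(batch_sizes):
    """Return (padded, exact) work-item counts.

    padded: every table is treated as if it had the largest batch size,
            so some launched work items have nothing to do.
    exact:  one work item per real (b, t) pair, i.e. len(b_t_map).
    """
    padded = len(batch_sizes) * max(batch_sizes, default=0)
    exact = sum(batch_sizes)
    return padded, exact


# Hypothetical VBE setup: three tables with different batch sizes.
batch_sizes = [2, 5, 1]
b_t_map = build_b_t_map(batch_sizes)
padded, exact = launch_sizes(batch_sizes)

# Each work item looks up its own (b, t) instead of deriving it from a
# padded grid index and exiting early when b is out of range.
for item, (b, t) in enumerate(b_t_map):
    assert 0 <= b < batch_sizes[t]
```

With these sample sizes, the padded launch covers 15 work items while the map-driven launch covers 8, so nearly half of the padded launch would be idle. The gap grows as the batch sizes across tables become more uneven.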
Reviewed By: Fiery