Improve bounds_check_indices for VBE #3388

Open: wants to merge 1 commit into base: main

@sryap (Contributor) commented Nov 18, 2024

Summary:
Instead of over-launching thread blocks, use `b_t_map` to launch only
the thread blocks that are needed, which increases occupancy in the VBE case.
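To make the launch-size difference concrete, here is a rough CPU-side model of the grid arithmetic. It is not FBGEMM's CUDA code. It assumes one warp per (feature, batch) bag, a hypothetical `WARPS_PER_BLOCK`, and a hypothetical bit-packed encoding for `b_t_map` entries; the real kernel's block shape and encoding may differ.

```python
"""Sketch: over-launching vs. b_t_map-driven launch when batch size varies per feature.

Assumptions (hypothetical, not FBGEMM's actual kernel code):
- B_per_feature[t] is the batch size of feature t.
- One warp handles one (t, b) bag; WARPS_PER_BLOCK warps form a thread block.
- Each b_t_map entry packs (t, b) as (t << INFO_B_NUM_BITS) | b.
"""
from math import ceil
from typing import List, Tuple

WARPS_PER_BLOCK = 4   # hypothetical block shape
INFO_B_NUM_BITS = 26  # hypothetical bit width reserved for b


def over_launch_blocks(B_per_feature: List[int]) -> int:
    """Launch T * max_B warps; warps whose b >= B_t exit without doing work."""
    T = len(B_per_feature)
    max_B = max(B_per_feature)
    return ceil(T * max_B / WARPS_PER_BLOCK)


def build_b_t_map(B_per_feature: List[int]) -> List[int]:
    """Flatten every valid (t, b) pair into one packed integer, feature-major."""
    return [
        (t << INFO_B_NUM_BITS) | b
        for t, B_t in enumerate(B_per_feature)
        for b in range(B_t)
    ]


def decode(b_t: int) -> Tuple[int, int]:
    """Recover (t, b) from a packed b_t_map entry."""
    return b_t >> INFO_B_NUM_BITS, b_t & ((1 << INFO_B_NUM_BITS) - 1)


def b_t_map_blocks(b_t_map: List[int]) -> int:
    """Launch only as many warps as there are valid (t, b) pairs."""
    return ceil(len(b_t_map) / WARPS_PER_BLOCK)


if __name__ == "__main__":
    B_per_feature = [2, 64, 3, 1]  # skewed batch sizes across 4 features
    b_t_map = build_b_t_map(B_per_feature)
    print(over_launch_blocks(B_per_feature))  # 64, i.e. ceil(4 * 64 / 4)
    print(b_t_map_blocks(b_t_map))            # 18, i.e. ceil(70 / 4)
    print(decode(b_t_map[2]))                 # (1, 0)
```

With skewed batch sizes, the padded launch is dominated by `max_B`, while the `b_t_map` launch scales with the total number of real bags.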

Note that `b_t_map` is required by the TBE lookup in the VBE case. It
is generated during the TBE forward pass. In this diff, we call
`generate_vbe_metadata` twice (once before the bounds check and once
before the forward lookup). These two calls can be fused into one; we
will clean this up in subsequent diffs.
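A minimal sketch of the fused flow, assuming the metadata is built once and handed to both consumers. `VBEMetadata`, `make_vbe_metadata`, `bounds_check`, and `pooled_lookup` are hypothetical stand-ins rather than FBGEMM APIs. `b_t_map` is kept as unpacked `(t, b)` pairs for readability, and zeroing out-of-range indices is an assumed policy for illustration.

```python
"""Sketch: build VBE metadata once, then share it between the bounds check
and the forward lookup.

Hypothetical stand-ins, not FBGEMM APIs. Bags are assumed to be laid out
feature-major, so bag i covers indices[offsets[i]:offsets[i + 1]].
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VBEMetadata:
    # b_t_map[i] = (t, b): the feature and batch entry that bag i belongs to.
    b_t_map: Tuple[Tuple[int, int], ...]


def make_vbe_metadata(B_per_feature: List[int]) -> VBEMetadata:
    return VBEMetadata(
        tuple((t, b) for t, B_t in enumerate(B_per_feature) for b in range(B_t))
    )


def bounds_check(indices: List[int], offsets: List[int],
                 rows_per_table: List[int], meta: VBEMetadata) -> int:
    """Zero out-of-range indices in place (assumed policy); return how many."""
    num_bad = 0
    for i, (t, _b) in enumerate(meta.b_t_map):  # one work item per real bag
        for j in range(offsets[i], offsets[i + 1]):
            if not 0 <= indices[j] < rows_per_table[t]:
                indices[j] = 0
                num_bad += 1
    return num_bad


def pooled_lookup(tables: List[List[float]], indices: List[int],
                  offsets: List[int], meta: VBEMetadata) -> List[float]:
    """Sum-pool each bag using the table of the feature it belongs to."""
    return [
        sum(tables[t][indices[j]] for j in range(offsets[i], offsets[i + 1]))
        for i, (t, _b) in enumerate(meta.b_t_map)
    ]


if __name__ == "__main__":
    B_per_feature = [2, 1]              # feature 0 has 2 bags, feature 1 has 1
    tables = [[1.0, 2.0, 3.0], [10.0, 20.0]]
    indices = [0, 2, 1, 5, 1]           # 5 is out of range for table 0
    offsets = [0, 2, 4, 5]              # bags: [0, 2], [1, 5], [1]
    meta = make_vbe_metadata(B_per_feature)               # generated once...
    print(bounds_check(indices, offsets, [3, 2], meta))   # ...used here: 1
    print(pooled_lookup(tables, indices, offsets, meta))  # ...reused: [4.0, 3.0, 20.0]
```

Passing one metadata object to both steps is what removes the second generation call; the bounds check also iterates only over real bags, mirroring the `b_t_map`-driven launch.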

Differential Revision: D66084041

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D66084041

netlify bot commented Nov 18, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 9b93585
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/67464236a166170008345b93
😎 Deploy Preview: https://deploy-preview-3388--pytorch-fbgemm-docs.netlify.app

sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 18, 2024
Summary:

X-link: facebookresearch/FBGEMM#477

Instead of over-launching thread blocks, use `b_t_map` to launch only
the thread blocks that are needed, which increases occupancy in the VBE case.

Note that `b_t_map` is required by the TBE lookup in the VBE case. It
is generated during the TBE forward pass. In this diff, we call
`generate_vbe_metadata` twice (once before the bounds check and once
before the forward lookup). These two calls can be fused into one; we
will clean this up in subsequent diffs.

Differential Revision: D66084041

sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 22, 2024

sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 25, 2024
Summary:

X-link: facebookresearch/FBGEMM#477

Instead of over-launching thread blocks, use `b_t_map` to launch only
the thread blocks that are needed, which increases occupancy in the VBE case.

Note that `b_t_map` is required by the TBE lookup in the VBE case. It
is generated during the TBE forward pass. In this diff, we call
`generate_vbe_metadata` twice (once before the bounds check and once
before the forward lookup). These two calls can be fused into one; we
will clean this up in subsequent diffs.

Reviewed By: Fiery

Differential Revision: D66084041
sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 25, 2024

sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 25, 2024

sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 25, 2024

sryap added a commit to sryap/FBGEMM that referenced this pull request Nov 26, 2024
