Enable sequence TBE CPU via AVX #2195

Closed

sryap
Contributor

@sryap sryap commented Dec 7, 2023

Summary:
Instead of using the ref implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation by forcing
pooling factors of 1 (i.e., passing at::arange(index_size) as
offsets). The performance gained from using the AVX implementation
offsets the overhead incurred in creating the new offsets.

Differential Revision: D51918878


netlify bot commented Dec 7, 2023

Deploy Preview for pytorch-fbgemm-docs canceled.

Name Link
🔨 Latest commit c11f5c7
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/65724152380e5300076beebe

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D51918878

sryap pushed a commit to sryap/FBGEMM that referenced this pull request Dec 7, 2023
Summary:

Instead of using the ref implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation by forcing
pooling factors of 1 (i.e., passing `at::arange(index_size)` as
offsets).  The performance gained from using the AVX implementation
offsets the overhead incurred in creating the new offsets.

Differential Revision: D51918878
sryap pushed a commit to sryap/FBGEMM that referenced this pull request Dec 7, 2023
Summary:

Instead of using the ref implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation of pooled TBE
by forcing pooling factors of 1 (i.e., passing
`at::arange(index_size + 1)` as offsets).  The performance gained from
using the AVX implementation offsets the overhead incurred in creating
the new offsets.

Reviewed By: jspark1105

Differential Revision: D51918878
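
The offsets trick described in the summary can be illustrated with a minimal, self-contained sketch. It is not FBGEMM code: `EmbeddingTable`, `pooled_lookup`, and `sequence_lookup_via_pooled` are hypothetical names, the loops are plain scalar C++ rather than AVX, and sum pooling with an offsets array of `num_bags + 1` entries is assumed. Under those assumptions, passing offsets `[0, 1, ..., N]` (the analogue of `at::arange(index_size + 1)`) makes every bag contain exactly one index, so the pooled output is the per-index sequence output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical row-major embedding table: (weights.size() / dim) rows of `dim` floats.
struct EmbeddingTable {
  std::size_t dim;
  std::vector<float> weights;
};

// Pooled (sum) lookup. Bag b sums the rows selected by
// indices[offsets[b]] .. indices[offsets[b + 1] - 1], so `offsets`
// is assumed to hold num_bags + 1 entries, ending at indices.size().
std::vector<float> pooled_lookup(const EmbeddingTable& table,
                                 const std::vector<std::int64_t>& indices,
                                 const std::vector<std::int64_t>& offsets) {
  assert(!offsets.empty());
  assert(offsets.front() == 0);
  assert(offsets.back() == static_cast<std::int64_t>(indices.size()));
  const std::size_t num_bags = offsets.size() - 1;
  std::vector<float> out(num_bags * table.dim, 0.0f);
  for (std::size_t b = 0; b < num_bags; ++b) {
    for (std::int64_t i = offsets[b]; i < offsets[b + 1]; ++i) {
      const std::size_t row = static_cast<std::size_t>(indices[i]);
      for (std::size_t d = 0; d < table.dim; ++d) {
        out[b * table.dim + d] += table.weights[row * table.dim + d];
      }
    }
  }
  return out;
}

// Sequence lookup routed through the pooled path: offsets = [0, 1, ..., N]
// gives every bag a pooling factor of 1, so output row i is the table row
// for indices[i].
std::vector<float> sequence_lookup_via_pooled(
    const EmbeddingTable& table,
    const std::vector<std::int64_t>& indices) {
  std::vector<std::int64_t> offsets(indices.size() + 1);
  std::iota(offsets.begin(), offsets.end(), std::int64_t{0});
  return pooled_lookup(table, indices, offsets);
}
```

With `N` indices, the sketch needs `N + 1` offsets because bag `i` reads `offsets[i + 1]` as its end position, which is consistent with the change from `index_size` to `index_size + 1` in the revised summary. Building that array is the extra overhead the summary mentions.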

sryap pushed a commit to sryap/FBGEMM that referenced this pull request Dec 8, 2023
Summary:

Instead of using the ref implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation of pooled TBE
by forcing pooling factors of 1 (i.e., passing
`at::arange(index_size + 1)` as offsets).  The performance gained from
using the AVX implementation offsets the overhead incurred in creating
the new offsets.

Reviewed By: jspark1105, YazhiGao

Differential Revision: D51918878
@facebook-github-bot
Contributor

This pull request has been merged in f8bd441.
