Enable sequence TBE CPU via AVX #2195

Closed

sryap wants to merge 1 commit into pytorch:main from sryap:export-D51918878

Contributor

sryap commented Dec 7, 2023

Summary:
Instead of using the reference implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation by forcing
pooling factors of 1 (i.e., passing at::arange(index_size) as
offsets). The performance gained from using the AVX implementation
offsets the overhead incurred in creating the new offsets.

Differential Revision: D51918878

netlify bot commented Dec 7, 2023

✅ Deploy Preview for pytorch-fbgemm-docs canceled.

Name	Link
🔨 Latest commit	`c11f5c7`
🔍 Latest deploy log	https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/65724152380e5300076beebe

facebook-github-bot added the cla signed label

Contributor

facebook-github-bot commented Dec 7, 2023

This pull request was exported from Phabricator. Differential Revision: D51918878

facebook-github-bot added the fb-exported label

sryap force-pushed the export-D51918878 branch from c1cfda8 to 13764e7 on December 7, 2023 01:41

sryap pushed a commit to sryap/FBGEMM that referenced this pull request


          Enable sequence TBE CPU via AVX (pytorch#2195)

13764e7

Summary:

Instead of using the reference implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation by forcing
pooling factors of 1 (i.e., passing `at::arange(index_size)` as
offsets).  The performance gained from using the AVX implementation
offsets the overhead incurred in creating the new offsets.

Differential Revision: D51918878

sryap pushed a commit to sryap/FBGEMM that referenced this pull request


          Enable sequence TBE CPU via AVX (pytorch#2195)

2f0d007

sryap pushed a commit to sryap/FBGEMM that referenced this pull request


          Enable sequence TBE CPU via AVX (pytorch#2195)

c2f64ec

sryap force-pushed the export-D51918878 branch from 13764e7 to 7538611 on December 7, 2023 18:27

sryap pushed a commit to sryap/FBGEMM that referenced this pull request


          Enable sequence TBE CPU via AVX (pytorch#2195)

Summary:

Instead of using the reference implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation of pooled TBE
by forcing pooling factors of 1 (i.e., passing
`at::arange(index_size + 1)` as offsets).  The performance gained from
using the AVX implementation offsets the overhead incurred in creating
the new offsets.

Reviewed By: jspark1105

Differential Revision: D51918878


          Enable sequence TBE CPU via AVX (pytorch#2195)

c11f5c7

sryap force-pushed the export-D51918878 branch from 7538611 to 9f03fc8 on December 7, 2023 22:03

sryap pushed a commit to sryap/FBGEMM that referenced this pull request


          Enable sequence TBE CPU via AVX (pytorch#2195)

9f03fc8

sryap force-pushed the export-D51918878 branch from 9f03fc8 to c11f5c7 on December 7, 2023 22:04

sryap pushed a commit to sryap/FBGEMM that referenced this pull request


          Enable sequence TBE CPU via AVX (pytorch#2195)

5a79051

Summary:

Instead of using the reference implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation of pooled TBE
by forcing pooling factors of 1 (i.e., passing
`at::arange(index_size + 1)` as offsets).  The performance gained from
using the AVX implementation offsets the overhead incurred in creating
the new offsets.

Reviewed By: jspark1105, YazhiGao

Differential Revision: D51918878
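
The idea in the summary above can be illustrated with a small standalone sketch. This is not FBGEMM code: `pooled_lookup` and `sequence_lookup` are hypothetical names, the kernel is a plain scalar loop rather than an AVX one, and sum pooling is assumed. It only shows why offsets of `0, 1, ..., index_size` (the same values `at::arange(index_size + 1)` would produce) make a pooled lookup return one row per index, which is the sequence embedding result.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical pooled (sum) embedding lookup, for illustration only.
// `table` is row-major with `dim` floats per row. `offsets` has
// num_bags + 1 entries; bag b covers indices[offsets[b] .. offsets[b+1]).
std::vector<float> pooled_lookup(const std::vector<float>& table,
                                 std::size_t dim,
                                 const std::vector<long>& indices,
                                 const std::vector<long>& offsets) {
  assert(!offsets.empty());
  const std::size_t num_bags = offsets.size() - 1;
  std::vector<float> out(num_bags * dim, 0.0f);
  for (std::size_t b = 0; b < num_bags; ++b) {
    for (long i = offsets[b]; i < offsets[b + 1]; ++i) {
      const float* row = &table[static_cast<std::size_t>(indices[i]) * dim];
      for (std::size_t d = 0; d < dim; ++d) {
        out[b * dim + d] += row[d];
      }
    }
  }
  return out;
}

// Sequence lookup expressed through the pooled kernel: every bag holds
// exactly one index (pooling factor 1), so the sum is just that row.
std::vector<float> sequence_lookup(const std::vector<float>& table,
                                   std::size_t dim,
                                   const std::vector<long>& indices) {
  // Equivalent in spirit to arange(index_size + 1): 0, 1, ..., index_size.
  std::vector<long> offsets(indices.size() + 1);
  std::iota(offsets.begin(), offsets.end(), 0L);
  return pooled_lookup(table, dim, indices, offsets);
}
```

With this layout the offsets need `index_size + 1` entries because the last entry marks the end of the final bag, which matches the change from `index_size` to `index_size + 1` in the later revisions of the commit message.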

facebook-github-bot closed this in f8bd441

facebook-github-bot added the Merged label

Contributor

facebook-github-bot commented Dec 8, 2023

This pull request has been merged in f8bd441.


Labels

cla signed fb-exported Merged