Enable sequence TBE CPU via AVX #2195
This pull request was exported from Phabricator. Differential Revision: D51918878
c1cfda8 to 13764e7
Summary: Instead of using the reference implementation for sequence embedding on CPU, this diff directs TBE to invoke the AVX implementation by forcing pooling factors of 1 (i.e., passing `at::arange(index_size)` as offsets). The performance gained from using the AVX implementation compensates for the overhead incurred in creating the new offsets. Differential Revision: D51918878
13764e7 to 7538611
Summary: Instead of using the reference implementation for sequence embedding on CPU, this diff directs TBE to invoke the AVX implementation of pooled TBE by forcing pooling factors of 1 (i.e., passing `at::arange(index_size + 1)` as offsets). The performance gained from using the AVX implementation compensates for the overhead incurred in creating the new offsets. Reviewed By: jspark1105 Differential Revision: D51918878
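To illustrate why offsets of `0, 1, ..., index_size` make a pooled lookup behave like a sequence lookup, here is a minimal pure-Python sketch. It is not the FBGEMM code: `pooled_sum` and `sequence_lookup` are hypothetical reference helpers, the embedding table is a plain list of rows, and `range(len(indices) + 1)` stands in for `at::arange(index_size + 1)`. The sketch assumes the usual convention that bag `b` covers `indices[offsets[b]:offsets[b + 1]]`.

```python
def pooled_sum(weights, indices, offsets):
    """Hypothetical reference sum-pooling: bag b sums the rows selected by
    indices[offsets[b]:offsets[b + 1]], so len(offsets) - 1 bags are produced."""
    dim = len(weights[0])
    out = []
    for b in range(len(offsets) - 1):
        acc = [0.0] * dim
        for i in indices[offsets[b]:offsets[b + 1]]:
            for d in range(dim):
                acc[d] += weights[i][d]
        out.append(acc)
    return out


def sequence_lookup(weights, indices):
    """Hypothetical sequence (non-pooled) lookup: one output row per index."""
    return [list(weights[i]) for i in indices]


weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy 3-row embedding table
indices = [2, 0, 2, 1]

# Stand-in for at::arange(index_size + 1): every bag holds exactly one index.
offsets = list(range(len(indices) + 1))
pooled = pooled_sum(weights, indices, offsets)
assert pooled == sequence_lookup(weights, indices)

# With only index_size offsets the final bag has no end marker under this
# convention, so the last index would not get its own output row.
short = pooled_sum(weights, indices, list(range(len(indices))))
assert len(short) == len(indices) - 1
```

Under this convention the extra trailing offset is what closes the last bag, which is consistent with the summary changing from `index_size` to `index_size + 1`.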
7538611 to 9f03fc8
9f03fc8 to c11f5c7
Summary: Instead of using the reference implementation for sequence embedding on CPU, this diff directs TBE to invoke the AVX implementation of pooled TBE by forcing pooling factors of 1 (i.e., passing `at::arange(index_size + 1)` as offsets). The performance gained from using the AVX implementation compensates for the overhead incurred in creating the new offsets. Reviewed By: jspark1105, YazhiGao Differential Revision: D51918878
This pull request has been merged in f8bd441.
Summary:
Instead of using the reference implementation for sequence embedding on CPU,
this diff directs TBE to invoke the AVX implementation by forcing
pooling factors of 1 (i.e., passing `at::arange(index_size)` as offsets).
The performance gained from using the AVX implementation
compensates for the overhead incurred in creating the new offsets.
Differential Revision: D51918878