forked from pytorch/FBGEMM
Ifu 2022 08 10 #15
Merged
Conversation
Summary: Pull Request resolved: pytorch#1226. As title.

Reviewed By: brad-mengchi
Differential Revision: D38248038
fbshipit-source-id: 21e40b5f8d38126b6004a20ddb94637c62177ef0
…nference kernel (pytorch#1224)

Summary: Pull Request resolved: pytorch#1224. While this is technically a reference implementation, I'm sure it'll show up somewhere, and this at least gives a ~2-3x perf improvement for the parameters I looked at.

Reviewed By: jianyuh, yuguo68
Differential Revision: D38186856
fbshipit-source-id: 4c734a06fc7c40d54a3aac5a7a6e422cb8e7e194
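The kernel above is an n-bit (sub-byte) embedding lookup. As a rough, hypothetical illustration of what such a reference implementation does (the function name and layout here are ours, not FBGEMM's), a 4-bit quantized row can be dequantized like this:

```python
def dequantize_4bit_row(packed: bytes, scale: float, bias: float) -> list[float]:
    """Unpack two 4-bit values per byte, then apply scale/bias.

    Hypothetical sketch of a scalar n-bit dequantization loop; the
    real FBGEMM kernels vectorize this and store scale/bias inline
    with each row.
    """
    out = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):  # low nibble first
            out.append(nibble * scale + bias)
    return out


# One packed byte 0x21 holds the values 1 (low nibble) and 2 (high nibble).
row = dequantize_4bit_row(b"\x21", scale=0.5, bias=1.0)
```

Even this naive loop can be sped up substantially by batching the unpack and fusing the scale/bias application, which is roughly where a 2-3x win in a reference path can come from.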
Summary: Pull Request resolved: pytorch#1225. Original commit changeset: ce8b7f9658c9. Original Phabricator Diff: D37988742 (pytorch@308dd51).

As commented in D37988742 (pytorch@308dd51), for the alignment=16 + stride=16 case it might still be efficient to call contiguous() instead of copying. Long term, we still need to add aligned-access support to the Vec4T accessor.

Reviewed By: xing-liu, sryap
Differential Revision: D38218502
fbshipit-source-id: 703804846beb04eadbd1af528e990fbce89e30e4
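The trade-off described above hinges on whether a row's base address and stride satisfy the vectorized load alignment. A minimal sketch of that decision, with hypothetical names (FBGEMM's actual check lives in its CUDA accessor code):

```python
def can_use_vec4_load(base_offset_bytes: int, stride_bytes: int,
                      alignment: int = 16) -> bool:
    """True if every row start is aligned for vectorized (Vec4T-style) loads.

    Hypothetical sketch: if the first row and the stride are both
    16-byte aligned, every subsequent row start is aligned too, so
    the kernel can use wide loads without a contiguous() copy.
    """
    return base_offset_bytes % alignment == 0 and stride_bytes % alignment == 0


# e.g. a stride of 16 float16 elements is 32 bytes, which is 16-byte aligned
ok = can_use_vec4_load(base_offset_bytes=0, stride_bytes=32)
```

When the check fails, the choice discussed in the diff applies: either fall back to scalar loads or pay for a contiguous() copy up front.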
Summary: Pull Request resolved: pytorch#1229. Support the long type for the dense_to_jagged CPU op, mirroring the GPU implementation.

Reviewed By: xing-liu
Differential Revision: D38290666
fbshipit-source-id: c81e071f5fd38126812487305f63900b6dc7d08f
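dense_to_jagged strips the padding from a dense batch using per-row lengths. A pure-Python sketch of the semantics (a hypothetical helper, not the FBGEMM op signature), which also shows why the element dtype — now including long/int64 — is orthogonal to the conversion logic:

```python
def dense_to_jagged(dense: list[list[int]], lengths: list[int]):
    """Return (values, offsets) for a jagged layout.

    Sketch only: each dense row contributes its first lengths[i]
    elements; the element type (float, long, ...) never enters the
    layout math, which is why adding long support is a dtype-dispatch
    change rather than an algorithmic one.
    """
    values, offsets = [], [0]
    for row, n in zip(dense, lengths):
        values.extend(row[:n])
        offsets.append(offsets[-1] + n)
    return values, offsets


vals, offs = dense_to_jagged([[1, 2, 0], [3, 0, 0]], lengths=[2, 1])
```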
Summary: Pull Request resolved: pytorch#1233. Extract the asmjit/asmjit.h usages into a separate header file and make sure it is not included from PyTorch directly.

This Diff/PR unblocks the FBGEMM third-party upgrade in PyTorch (e.g., pytorch/pytorch#82396), which consistently reports an `<asmjit/asmjit.h>` import error (pytorch#1227):

```
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/all_microkernels.dir/src/qs8-gemm/gen/4x4-minmax-fp32-scalar-imagic.c.o
In file included from /var/lib/jenkins/workspace/caffe2/quantization/server/group_norm_dnnlowp_op_avx2.cc:10:
In file included from /var/lib/jenkins/workspace/third_party/fbgemm/include/fbgemm/QuantUtils.h:6:
/var/lib/jenkins/workspace/third_party/fbgemm/include/fbgemm/./Utils.h:18:10: fatal error: 'asmjit/asmjit.h' file not found
#include <asmjit/asmjit.h>
         ^~~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [caffe2/quantization/server/CMakeFiles/caffe2_dnnlowp_avx2_ops.dir/build.make:104: caffe2/quantization/server/CMakeFiles/caffe2_dnnlowp_avx2_ops.dir/group_norm_dnnlowp_op_avx2.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/qc8-igemm/gen/4x16c8-minmax-fp32-avx512skx.c.o
```

This happens because PyTorch does not have a direct asmjit dependency. The workaround is to spin the asmjit-dependent parts of Utils.h off into SimdUtils.h, so that the Utils.h included from PyTorch files no longer depends on asmjit.h.

Reviewed By: jasonjk-park, jiyuanzFB
Differential Revision: D38341763
fbshipit-source-id: 77bf92206d350b220dfe13d7a5488713b33e7b47
Summary: Pull Request resolved: pytorch#1234. This patch adds multiple CUDA versions for OSS CI (build only). Previously the OSS side tested only CUDA 11.3; this patch expands it to 11.3, 11.5, and 11.6.

This CI is useful. For example, it revealed an issue in `cub_namespace_postfix.cuh`: that change was cherry-picked for CUB 1.13.1 (https://github.com/NVIDIA/cub/blob/1.13.X/cub/util_namespace.cuh), so the `CUB_VERSION` condition should be `CUB_VERSION >= 101301`.

Q&A:
- Why don't we test multiple ROCm versions? PyTorch provides two ROCm versions at this point, 5.0 and 5.1.1, but FBGEMM does not support ROCm 5.0.
- Why don't we test CUDA 11.2 or older? I am not sure FBGEMM supports CUDA <= 11.2. If needed, we can add it.
- Why don't we test CUDA 11.4? PyTorch does not currently publish a nightly CUDA 11.4 build.
- Why don't we test CUDA 11.7? It has some compilation issues.

Reviewed By: jianyuh, mjanderson09
Differential Revision: D38359867
fbshipit-source-id: 2a1b4735b0c31e87721ecb77cda1a8d907969afc
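The `101301` threshold follows from how CUB encodes its version number: major * 100000 + minor * 100 + patch. A small sketch of that arithmetic (the encoding is CUB's; the helper name is ours):

```python
def cub_version(major: int, minor: int, patch: int) -> int:
    """Mirror CUB's CUB_VERSION encoding: major*100000 + minor*100 + patch.

    The namespace-wrapping change was cherry-picked into CUB 1.13.1,
    so a guard that should fire from that release onward must compare
    against 101301, not an earlier threshold.
    """
    return major * 100000 + minor * 100 + patch


# CUB 1.13.1 -> 101301, so the guard is: CUB_VERSION >= 101301
v = cub_version(1, 13, 1)
```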
Summary: The old error message could be confusing when the problem was not that `fbgemm_gpu_py.so` could not be found, e.g. when a CPU build of PyTorch is installed and `libc10_cuda.so` cannot be found. Users can still tell when `fbgemm_gpu_py.so` itself is missing, since the error message now reads:

```
/path/to/fbgemm_gpu/fbgemm_gpu_py.so: cannot open shared object file: No such file or directory
```

Thank you for your time on this PR :)

Pull Request resolved: pytorch#1235
Reviewed By: mjanderson09
Differential Revision: D38389494
Pulled By: brad-mengchi
fbshipit-source-id: 55738a45a4bb5d9d385e8ff082b1deb9c2f5e7dc
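The fix amounts to surfacing the OS loader's own error instead of a blanket "fbgemm_gpu_py.so not found" message. A hedged sketch of that pattern using ctypes (fbgemm_gpu's real loader differs; the function name and path below are hypothetical):

```python
import ctypes


def load_native(path: str) -> ctypes.CDLL:
    """Load a shared object, letting the OS loader's message through.

    Sketch only: the loader's OSError already says *which* .so failed
    and why, which distinguishes 'the file is missing' from 'one of its
    dependencies (e.g. libc10_cuda.so) is missing'. Re-raise with that
    message intact instead of masking it with a generic one.
    """
    try:
        return ctypes.CDLL(path)
    except OSError as e:
        raise OSError(f"Could not load {path}: {e}") from e
```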
Summary: Pull Request resolved: pytorch#1237
- PyTorch has not updated its CUDA 11.5 .whl files to the latest PyTorch (1.13), while the CUDA 11.3 .whl files are up to date.
- PyTorch built with CUDA 11.3 will work with later CUDA versions, as CUDA is backward compatible.
- Suggest we set a policy of supporting building/installing FBGEMM with the PyTorch .whl built against CUDA 11.3, as this seems to be their preferred build. This guidance goes in our README and also the torchrec README.
- This diff updates CI to install the PyTorch .whl built with CUDA 11.3, even when testing FBGEMM with later CUDA versions.

Reviewed By: shintaro-iwasaki
Differential Revision: D38416122
fbshipit-source-id: 5dbba43e7b692a87205a2adc9b7b525dffd91569
Summary: Pull Request resolved: pytorch#1232. Sequence embedding on CPU:
- Enabled FP32/FP16/INT8 versions
- Scalar embedding implementation (performance may be slow)

Reviewed By: jianyuh
Differential Revision: D36049936
fbshipit-source-id: c15c0d9b9fa214939b262cf9c7ec543c4848b4c5
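A scalar (non-vectorized) sequence embedding lookup just gathers one table row per index. A pure-Python sketch of that semantics for the FP32 case (hypothetical names; the FP16/INT8 variants additionally dequantize each row):

```python
def sequence_embedding_lookup(weights: list[list[float]],
                              indices: list[int]) -> list[list[float]]:
    """Scalar sequence embedding: output row i is table row indices[i].

    Sketch of the semantics only. Unlike pooled embedding ops, no
    reduction happens across indices: the output has one row per
    input index, preserving sequence order.
    """
    return [list(weights[idx]) for idx in indices]


table = [[0.0, 0.0], [1.0, 1.5], [2.0, 2.5]]
out = sequence_embedding_lookup(table, [2, 0, 2])
```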
Summary: Pull Request resolved: pytorch#1242

Reviewed By: ajtulloch
Differential Revision: D38560260
fbshipit-source-id: 321da1a8894dc336684f9cc945fab658ec776a6c
Summary: Pull Request resolved: pytorch#1243

Reviewed By: jiyuanzFB
Differential Revision: D38561337
fbshipit-source-id: 2d24278014d39318f5e6f76f17cf57760e1866c4
LGTM
liligwu added a commit that referenced this pull request on Feb 8, 2023:
* Enable the merge_pooled_embeddings op in ROCm
* Enabling the merge pool ops

Co-authored-by: liligwu <[email protected]>
No conflict, all tests passed
20220728_IFU_tests.log.txt