
Ifu 2022 08 10 #15

Merged
merged 16 commits into from
Aug 11, 2022

Conversation


@liligwu liligwu commented Aug 10, 2022

No conflicts; all tests passed.
20220728_IFU_tests.log.txt

jianyuh and others added 16 commits July 28, 2022 20:00
Summary:
Pull Request resolved: pytorch#1226

As title

Reviewed By: brad-mengchi

Differential Revision: D38248038

fbshipit-source-id: 21e40b5f8d38126b6004a20ddb94637c62177ef0
…nference kernel (pytorch#1224)

Summary:
Pull Request resolved: pytorch#1224

While this is technically a reference implementation, I'm sure it'll show up somewhere and this at least gives a ~2-3x improvement in perf for parameters I looked at.

Reviewed By: jianyuh, yuguo68

Differential Revision: D38186856

fbshipit-source-id: 4c734a06fc7c40d54a3aac5a7a6e422cb8e7e194
Summary:
Pull Request resolved: pytorch#1225

Original commit changeset: ce8b7f9658c9

Original Phabricator Diff: D37988742 (pytorch@308dd51)

As commented in D37988742 (pytorch@308dd51), for alignment=16 + stride=16 case, it might be still efficient to do contiguous() instead of copy.

Long term, we still need to add alignment access support in Vec4T accessor.

Reviewed By: xing-liu, sryap

Differential Revision: D38218502

fbshipit-source-id: 703804846beb04eadbd1af528e990fbce89e30e4
Summary:
Pull Request resolved: pytorch#1229

* support long type for dense_to_jagged cpu op, as mirrored in GPU.

Reviewed By: xing-liu

Differential Revision: D38290666

fbshipit-source-id: c81e071f5fd38126812487305f63900b6dc7d08f
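As a rough illustration of what the op computes, here is a hypothetical pure-Python sketch of the `dense_to_jagged` semantics for integer (long) values; the function name and shapes mirror the op informally, while the real implementation lives in fbgemm_gpu as a registered CPU/GPU operator.

```python
# Hypothetical sketch: convert a padded dense batch into jagged values using
# per-row offsets. Integer (long) inputs pass through unchanged, which is the
# dtype support this commit adds on CPU.

def dense_to_jagged(dense, offsets):
    """dense: B rows, each padded to max_len; offsets: length B+1.
    Row b keeps its first offsets[b+1] - offsets[b] entries."""
    values = []
    for b in range(len(dense)):
        length = offsets[b + 1] - offsets[b]
        values.extend(dense[b][:length])
    return values

dense = [[10, 20, 0], [30, 0, 0]]   # padded to max_len = 3
offsets = [0, 2, 3]                 # row 0 has 2 entries, row 1 has 1
print(dense_to_jagged(dense, offsets))  # → [10, 20, 30]
```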
Summary:
Pull Request resolved: pytorch#1233

Extract asmjit/asmjit.h usages into a separate header file and make sure it is not called from PyTorch directly.

This Diff/PR is to unblock the FBGEMM third-party upgrade in PyTorch (e.g., pytorch/pytorch#82396), which consistently reports an <asmjit/asmjit.h> import error (pytorch#1227):

```
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/all_microkernels.dir/src/qs8-gemm/gen/4x4-minmax-fp32-scalar-imagic.c.o
In file included from /var/lib/jenkins/workspace/caffe2/quantization/server/group_norm_dnnlowp_op_avx2.cc:10:
In file included from /var/lib/jenkins/workspace/third_party/fbgemm/include/fbgemm/QuantUtils.h:6:
/var/lib/jenkins/workspace/third_party/fbgemm/include/fbgemm/./Utils.h:18:10: fatal error: 'asmjit/asmjit.h' file not found
#include <asmjit/asmjit.h>
         ^~~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [caffe2/quantization/server/CMakeFiles/caffe2_dnnlowp_avx2_ops.dir/build.make:104: caffe2/quantization/server/CMakeFiles/caffe2_dnnlowp_avx2_ops.dir/group_norm_dnnlowp_op_avx2.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 40%] Building C object confu-deps/XNNPACK/CMakeFiles/XNNPACK.dir/src/qc8-igemm/gen/4x16c8-minmax-fp32-avx512skx.c.o

```

This is because PyTorch does not have a direct asmjit dependency. The workaround is to split Utils.h into SimdUtils.h, so that the Utils.h included by PyTorch files no longer depends on asmjit.h.

Reviewed By: jasonjk-park, jiyuanzFB

Differential Revision: D38341763

fbshipit-source-id: 77bf92206d350b220dfe13d7a5488713b33e7b47
Summary:
Pull Request resolved: pytorch#1234

This patch adds multiple CUDA versions for OSS CI (build only). Previously on the OSS side, we tested only 11.3, but this patch expands it to 11.3, 11.5, and 11.6.

This CI is useful. For example, it revealed an issue in `cub_namespace_postfix.cuh`: this change was cherry-picked into CUB 1.13.1 (https://github.com/NVIDIA/cub/blob/1.13.X/cub/util_namespace.cuh), so the `CUB_VERSION` condition should be `CUB_VERSION >= 101301`.
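To see where `101301` comes from: CUB encodes its version number as `major * 100000 + minor * 100 + subminor`, so CUB 1.13.1 maps to 101301. A quick sketch of that encoding (the helper function is illustrative, not part of CUB):

```python
# CUB_VERSION packs (major, minor, subminor) into a single integer:
# major * 100000 + minor * 100 + subminor. CUB 1.13.1 → 101301, which is
# why the guard must be CUB_VERSION >= 101301.

def cub_version(major, minor, subminor):
    return major * 100000 + minor * 100 + subminor

assert cub_version(1, 13, 1) == 101301
assert cub_version(1, 13, 0) < 101301   # 1.13.0 predates the cherry-pick
```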

Q&A
 - Why don't we test multiple ROCm versions?
    - PyTorch provides two versions for ROCm at this point: 5.0 and 5.1.1, but FBGEMM does not support ROCm 5.0.
 - Why don't we test CUDA 11.2 or older?
   - I am not sure if FBGEMM supports CUDA <= 11.2. If needed, we can add it.
 - Why don't we test CUDA 11.4?
   - PyTorch does not currently provide a nightly CUDA 11.4 build.
 - Why don't we test CUDA 11.7?
   - It has some compilation issues.

Reviewed By: jianyuh, mjanderson09

Differential Revision: D38359867

fbshipit-source-id: 2a1b4735b0c31e87721ecb77cda1a8d907969afc
Summary:
The old error message could be misleading when the actual failure was not that `fbgemm_gpu_py.so` cannot be found, e.g. when the CPU version of PyTorch is installed and `libc10_cuda.so` cannot be found. The user can still tell when `fbgemm_gpu_py.so` is missing, as the error message is now:
```
/path/to/fbgemm_gpu/fbgemm_gpu_py.so: cannot open shared object file: No such file or directory
```
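A minimal sketch of why surfacing the loader's own error is clearer (the `load_extension` helper is hypothetical, not FBGEMM's actual loading code): `ctypes` reports exactly which path failed and why, instead of a generic "not found" message that could hide a missing dependency such as `libc10_cuda.so`.

```python
# Hypothetical sketch: load a shared object and propagate the dynamic
# loader's own OSError text, which names the failing path and reason.
import ctypes

def load_extension(so_path):
    try:
        return ctypes.CDLL(so_path)
    except OSError as e:
        # e.g. "/path/to/fbgemm_gpu/fbgemm_gpu_py.so: cannot open shared
        # object file: No such file or directory"
        raise ImportError(f"Could not load {so_path}: {e}") from e
```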

Thank you for your time on this PR:)

Pull Request resolved: pytorch#1235

Reviewed By: mjanderson09

Differential Revision: D38389494

Pulled By: brad-mengchi

fbshipit-source-id: 55738a45a4bb5d9d385e8ff082b1deb9c2f5e7dc
Summary:
Pull Request resolved: pytorch#1237

- PyTorch has not updated its CUDA 11.5 .whl files to the latest PyTorch (1.13), while the CUDA 11.3 .whl files have been updated
- PyTorch built with CUDA 11.3 will work with later CUDA versions, as CUDA is backward compatible
- Suggest we set a policy of supporting building/installing FBGEMM with the PyTorch .whl built with CUDA 11.3, as this seems to be their preferred build. This guidance belongs in our README and also the torchrec README.
- This diff updates CI to install the PyTorch .whl built with CUDA 11.3, even when testing FBGEMM with later CUDA versions

Reviewed By: shintaro-iwasaki

Differential Revision: D38416122

fbshipit-source-id: 5dbba43e7b692a87205a2adc9b7b525dffd91569
Summary:
Pull Request resolved: pytorch#1232

Sequence Embedding on CPU
- Enabled FP32/FP16/INT8 versions
- Scalar embedding implementation (performance may be slow)
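A hypothetical scalar sketch of what a sequence embedding lookup does (names and layout are illustrative, not FBGEMM's API): one output row per index, copied element by element with no vectorization, which is why the commit notes performance may be slow. For the INT8 path, each row is dequantized with a per-row scale and bias.

```python
# Scalar (non-SIMD) sequence embedding lookup sketch. FP32: plain row copy.
# INT8: dequantize each element as value = q * scale[row] + bias[row].

def sequence_embedding_fp32(weights, indices):
    # weights: list of rows (list of floats); indices: list of ints
    return [[w for w in weights[i]] for i in indices]

def sequence_embedding_int8(qweights, scales, biases, indices):
    out = []
    for i in indices:
        out.append([q * scales[i] + biases[i] for q in qweights[i]])
    return out

weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(sequence_embedding_fp32(weights, [2, 0]))  # → [[5.0, 6.0], [1.0, 2.0]]
```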

Reviewed By: jianyuh

Differential Revision: D36049936

fbshipit-source-id: c15c0d9b9fa214939b262cf9c7ec543c4848b4c5
Summary: Pull Request resolved: pytorch#1242

Reviewed By: ajtulloch

Differential Revision: D38560260

fbshipit-source-id: 321da1a8894dc336684f9cc945fab658ec776a6c
Summary: Pull Request resolved: pytorch#1243

Reviewed By: jiyuanzFB

Differential Revision: D38561337

fbshipit-source-id: 2d24278014d39318f5e6f76f17cf57760e1866c4
@amathews-amd

LGTM

@liligwu liligwu merged commit 92c5f2e into main Aug 11, 2022
liligwu added a commit that referenced this pull request Feb 8, 2023
* Enable merge_pooled_embeddings op. in ROCm

* Enabling the merge pool ops.

Co-authored-by: liligwu <[email protected]>