-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] linker complains about duplicate symbol get_cache_idx #1490
Comments
@alexanderguzhva this is indeed strange because the anonymous namespace should be protecting us from this compile error. Does adding |
@cjnolet A kernel cannot be inlined, can it be? |
@alexanderguzhva oddly, I'm not able to reproduce the error when building FAISS. Is there a specific branch I can check out? What version of gcc are you using? I'm on CUDA 11.8 but I can also try 11.4. I verified I'm able to build raft successfully by removing the offending kernel from the anonymous namespace and adding |
A kernel can be inlined, i.e, this works: In general, I think we should prefer inline over an anonymous namespace. With an anonymous namespace, a copy of the kernel is included in each translation unit and also in the final linked object. With inline, duplicate definitions should be deduplicated when linked. @cjnolet : note that the linker in the bug report, |
I cannot reproduce the linker error. Building with clang 11.1 and nvcc 11.8. Both test files have raft::cache::(anonymous namespace)::get_cache_idx defined though.. $ nm -g -C cpp/build/CMakeFiles/UTILS_TEST.dir/test/util/get_cache_idx_1.cu.o | grep get_cache_idx U __cudaRegisterLinkedBinary_667dac6a_18_get_cache_idx_1_cu_eeb3336e_3832672 0000000000000000 D __fatbinwrap_667dac6a_18_get_cache_idx_1_cu_eeb3336e_3832672 0000000000000010 T __device_stub__ZN4raft5cache59_GLOBAL__N__667dac6a_18_get_cache_idx_1_cu_eeb3336e_383267213get_cache_idxEPiiS2_iiS2_S2_Pbi(int*, int, int*, int, int, int*, int*, bool*, int) 0000000000000000 T raft::cache::(anonymous namespace)::get_cache_idx(int*, int, int*, int, int, int*, int*, bool*, int) $ nm -g -C cpp/build/CMakeFiles/UTILS_TEST.dir/test/util/get_cache_idx_2.cu.o | grep get_cache_idx U __cudaRegisterLinkedBinary_74c80384_18_get_cache_idx_2_cu_a658dc80 0000000000000000 D __fatbinwrap_74c80384_18_get_cache_idx_2_cu_a658dc80 0000000000000010 T __device_stub__ZN4raft5cache51_GLOBAL__N__74c80384_18_get_cache_idx_2_cu_a658dc8013get_cache_idxEPiiS2_iiS2_S2_Pbi(int*, int, int*, int, int, int*, int*, bool*, int) 0000000000000000 T raft::cache::(anonymous namespace)::get_cache_idx(int*, int, int*, int, int, int*, int*, bool*, int)
This reverts commit 730b8a5.
@alexanderguzhva, I pushed PR #1492. Could you test if that PR fixes the issue for you? I was unfortunately not able to reproduce your issue, but I think that inlining the kernel should fix any linker errors. |
@ahendriksen #1492 seems to fix the problem. Thanks! |
Related to issue #1490. The `get_cache_idx` kernel is currently defined in an anonymous namespace. In #1490, it was reported that this could lead to linker errors in CUDA 11.4 in combination with (presumably?) clang. I tried to reproduce the linker error in commit 01cf409 but could not, as described in the commit message. This PR attempts to fix the linker errors by marking `get_cache_idx` as inline and by removing the anonymous namespace. Authors: - Allard Hendriksen (https://github.com/ahendriksen) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #1492
Related to issues rapidsai#1511 and rapidsai#1490. Should perhaps make this static (can remove weak linkage) or hidden (should keep weak linkage). (See issue rapidsai#1722)
Fixes issue #1511. Make get_cache_idx a weak symbol (to allow linking multiple symbols) without marking it inline (to avoid compilation warnings that are promoted to errors in nvcc 12). Related issues: - #1490 - #1722 Related PRs: - #1732 - #1492 Authors: - Allard Hendriksen (https://github.com/ahendriksen) - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) URL: #1733
Describe the bug
Many errors of the following kind:
Expected behavior
No errors
Environment details (please complete the following information):
upstream hash is "a98295b516ef58bc855177077860bab2a2a76d77" (Apr 12 ?)
CUDA 11.4.2
Additional context
The error can be fixed in a hacky way by turning
into
I haven't tried moving it to a source file.
The text was updated successfully, but these errors were encountered: