Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] linker complains about duplicate symbol get_cache_idx #1490

Closed
alexanderguzhva opened this issue May 4, 2023 · 6 comments
Closed

[BUG] linker complains about duplicate symbol get_cache_idx #1490

alexanderguzhva opened this issue May 4, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@alexanderguzhva
Copy link
Contributor

Describe the bug
Many errors of the following kind:

ld.lld: error: duplicate symbol: raft::cache::(anonymous namespace)::get_cache_idx(int*, int, int*, int, int, int*, int*, bool*, int)

Expected behavior
No errors

Environment details (please complete the following information):
upstream hash is "a98295b516ef58bc855177077860bab2a2a76d77" (Apr 12 ?)
CUDA 11.4.2

Additional context
The error can be fixed in a hacky way by turning

__global__ void get_cache_idx(int* keys,
                              int n,
                              int* cached_keys,
                              int n_cache_sets,
                              int associativity,
                              int* cache_time,
                              int* cache_idx,
                              bool* is_cached,
                              int time)

into

template<typename T>
__global__ void get_cache_idx(T* keys,
                              T n,
                              T* cached_keys,
                              T n_cache_sets,
                              T associativity,
                              T* cache_time,
                              T* cache_idx,
                              bool* is_cached,
                              T time)

I haven't tried moving it to a source file.

@alexanderguzhva alexanderguzhva added the bug Something isn't working label May 4, 2023
@cjnolet
Copy link
Member

cjnolet commented May 4, 2023

@alexanderguzhva this is indeed strange because the anonymous namespace should be protecting us from this compile error. Does adding __inline__ to the definition help overcome this error for you? If so, we can remove the anonymous namespace and just use that.

@alexanderguzhva
Copy link
Contributor Author

@cjnolet A kernel cannot be inlined, can it be?

@cjnolet
Copy link
Member

cjnolet commented May 4, 2023

@alexanderguzhva oddly, I'm not able to reproduce the error when building FAISS. Is there a specific branch I can check out? What version of gcc are you using? I'm on CUDA 11.8 but I can also try 11.4.

I verified I'm able to build raft successfully by removing the offending kernel from the anonymous namespace and adding __forceinline__ to the definition. If I create a RAFT PR for this, can you give it a try and see if the error goes away?

@ahendriksen
Copy link
Contributor

A kernel cannot be inlined, can it be?

A kernel can be inlined, i.e, this works: inline __global__ void kernel() {}. It doesn't result in the kernel being literally "inlined" into a function, but it does change the linking behavior (symbol is defined as weak and may be omitted from symbol table by compiler).

In general, I think we should prefer inline over an anonymous namespace. With an anonymous namespace, a copy of the kernel is included in each translation unit and also in the final linked object. With inline, duplicate definitions should be deduplicated when linked.

@cjnolet : note that the linker in the bug report, ld.lld, is LLVM's linker. It might have slightly different behavior than GCC's linker that we use by default. In any case, clang also behaves slightly differently than gcc wrt private symbols. See for example the section "symbols" in this recent guide to linking: https://fabiensanglard.net/dc/cc.php

ahendriksen added a commit to ahendriksen/raft that referenced this issue May 5, 2023
I cannot reproduce the linker error. Building with clang 11.1 and nvcc
11.8. Both test files have raft::cache::(anonymous
namespace)::get_cache_idx defined though..

$ nm -g -C cpp/build/CMakeFiles/UTILS_TEST.dir/test/util/get_cache_idx_1.cu.o | grep get_cache_idx
                 U __cudaRegisterLinkedBinary_667dac6a_18_get_cache_idx_1_cu_eeb3336e_3832672
0000000000000000 D __fatbinwrap_667dac6a_18_get_cache_idx_1_cu_eeb3336e_3832672
0000000000000010 T __device_stub__ZN4raft5cache59_GLOBAL__N__667dac6a_18_get_cache_idx_1_cu_eeb3336e_383267213get_cache_idxEPiiS2_iiS2_S2_Pbi(int*, int, int*, int, int, int*, int*, bool*, int)
0000000000000000 T raft::cache::(anonymous namespace)::get_cache_idx(int*, int, int*, int, int, int*, int*, bool*, int)

$ nm -g -C cpp/build/CMakeFiles/UTILS_TEST.dir/test/util/get_cache_idx_2.cu.o | grep get_cache_idx
                 U __cudaRegisterLinkedBinary_74c80384_18_get_cache_idx_2_cu_a658dc80
0000000000000000 D __fatbinwrap_74c80384_18_get_cache_idx_2_cu_a658dc80
0000000000000010 T __device_stub__ZN4raft5cache51_GLOBAL__N__74c80384_18_get_cache_idx_2_cu_a658dc8013get_cache_idxEPiiS2_iiS2_S2_Pbi(int*, int, int*, int, int, int*, int*, bool*, int)
0000000000000000 T raft::cache::(anonymous namespace)::get_cache_idx(int*, int, int*, int, int, int*, int*, bool*, int)
ahendriksen added a commit to ahendriksen/raft that referenced this issue May 5, 2023
@ahendriksen
Copy link
Contributor

ahendriksen commented May 5, 2023

@alexanderguzhva, I pushed PR #1492. Could you test if that PR fixes the issue for you? I was unfortunately not able to reproduce your issue, but I think that inlining the kernel should fix any linker errors.

@alexanderguzhva
Copy link
Contributor Author

@ahendriksen #1492 seems to fix the problem. Thanks!

rapids-bot bot pushed a commit that referenced this issue May 5, 2023
Related to issue #1490. 

The `get_cache_idx` kernel is currently defined in an anonymous namespace. In #1490, it was reported that this could lead to linker errors in CUDA 11.4 in combination with (presumably?) clang.

I tried to reproduce the linker error in commit 01cf409 but could not, as described in the commit message.

This PR attempts to fix the linker errors by marking `get_cache_idx` as inline and by removing the anonymous namespace.

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1492
ahendriksen added a commit to ahendriksen/raft that referenced this issue Aug 10, 2023
Related to issues rapidsai#1511 and rapidsai#1490.

Should perhaps make this static (can remove weak linkage) or
hidden (should keep weak linkage). (See issue rapidsai#1722)
rapids-bot bot pushed a commit that referenced this issue Aug 11, 2023
Fixes issue #1511. Make get_cache_idx a weak symbol (to allow linking multiple symbols) without marking it inline (to avoid compilation warnings that are promoted to errors in nvcc 12). 

Related issues:
- #1490 
- #1722 

Related PRs:
- #1732 
- #1492

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1733
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

No branches or pull requests

3 participants