Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rapids_cpm_nvbench properly specify usage of external fmt library #376

Conversation

robertmaynard
Copy link
Contributor

@robertmaynard robertmaynard commented Feb 17, 2023

Description

When we are inside a conda env the linker will be set to ld.bfd which will try to resolve all undefined symbols at link time.

Since we could be using a shared library version of fmt we need it on the final link line of consumers. So patch nvbench to understand this requirement.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The cmake-format.json is up to date with these changes.

@robertmaynard robertmaynard added bug Something isn't working non-breaking Introduces a non-breaking change 3 - Ready for Review Ready for review by team labels Feb 17, 2023
@robertmaynard robertmaynard requested a review from a team as a code owner February 17, 2023 19:39
@robertmaynard robertmaynard changed the title rapids_cpm_spdlog properly specify usage of external fmt library rapids_cpm_nvbench properly specify usage of external fmt library Feb 21, 2023
When we are inside a conda env the linker will be set to ld.bfd which will try to resolve all undefined symbols at time.

Since we could be using a shared library version of fmt we need it on the final link line of consumers. So patch nvbench to understand this requirement.
@robertmaynard robertmaynard force-pushed the bug/ensure_nvbench_uses_header_fmt_only branch from 9df6eb3 to c142539 Compare February 21, 2023 13:11
@vyasr
Copy link
Contributor

vyasr commented Feb 22, 2023

I'm afraid I don't understand this change. Won't any linker require that symbols be resolved at link-time (and if linking dynamically, the requirement will be enforced again at runtime by the loader)? IIUC the change in this PR is propagating the fmt requirement from nvbench to its consumers, but I don't understand how the resulting change in terms of generated compile/link commands is related to the choice of linker.

@robertmaynard
Copy link
Contributor Author

robertmaynard commented Feb 22, 2023

I'm afraid I don't understand this change. Won't any linker require that symbols be resolved at link-time (and if linking dynamically, the requirement will be enforced again at runtime by the loader)? IIUC the change in this PR is propagating the fmt requirement from nvbench to its consumers, but I don't understand how the resulting change in terms of generated compile/link commands is related to the choice of linker.

Good question. This PR is resolving an interacation between hidden symbols, undefined symbols and the conda linker ld.bfd.

When compiling things like the libcudf benchmarks we run into the following scenario:

  1. Usage of rmm which requests using the fmt header only library
  2. Usage of nvbench which has used the fmt via shared library

This results in the final executable having a link error like the following:

/x86_64-conda-linux-gnu/bin/ld: uses_fmt: hidden symbol `_ZN3fmt2v96detail18throw_format_errorEPKc' in CMakeFiles/uses_fmt.dir/use_fmt.cpp.o is referenced by DSO

Since we compiled nvbench against the shared library version of fmt, it has recorded that _ZN3fmt2v96detail18throw_format_errorEPKc is
an undefined symbol. When we compile the executable translation unit it records that _ZN3fmt2v96detail18throw_format_errorEPKc is weak and PRIVATE,
due to conda injecting -fvisibility-inlines-hidden on the compilation line.

So when we go to link the final executable the linker notices that we have inconsistent definitions for this symbol. One library says it is PRIVATE and the other says it is PUBLIC and undefined. Since ld.bfd behavior is to reject when we have inconsistent specifications it errors out.

To fix this I have identified two solutions:

  1. Ensure that any translation unit compiled under conda which ingests FMT via both shared lib and header has -fno-visibility-inlines-hidden
    on the compile line.
  2. Ensure that any translation unit compiled under conda which ingests FMT via both shared lib and header has -DFMT_SHARED on the compile line

I went with option 2 as it has the smallest impact on other functions / libraries being used. So to implement 2 I made the nvbench dependency on fmt public so we propagate the FMT_SHARED flag.

@vyasr
Copy link
Contributor

vyasr commented Feb 23, 2023

Got it, that makes sense. What happens in the case where a TU links to both rmm (which links to the fmt::fmt-header-only target) and the patched nvbench (which links to fmt::fmt)? Will the two not end up conflicting?

@robertmaynard
Copy link
Contributor Author

Got it, that makes sense. What happens in the case where a TU links to both rmm (which links to the fmt::fmt-header-only target) and the patched nvbench (which links to fmt::fmt)? Will the two not end up conflicting?

They don't end up conflicting since when fmt give the defines for both header and library mode it goes with library mode.

@robertmaynard
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 7db9ade into rapidsai:branch-23.04 Feb 27, 2023
@robertmaynard robertmaynard deleted the bug/ensure_nvbench_uses_header_fmt_only branch February 27, 2023 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team bug Something isn't working non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants