General setup of static outputs vs. shared ones #1809

Open
h-vetinari opened this issue Aug 8, 2022 · 23 comments

@h-vetinari
Member

h-vetinari commented Aug 8, 2022

Static builds are already an exception in conda-forge (see CFEP-18), but some do exist for specific use-cases. The micromamba feedstock is a good example of a user of static libs (presumably to stay "micro" and have no runtime dependencies).

Several feedstocks of this kind follow this pattern:

outputs:
  - name: libxyz
    build:
      run_exports:
        - {{ pin_subpackage('libxyz') }}
    [...]
  - name: libxyz-static
    requirements:
      build:
        - [...]
      host:
        - {{ pin_subpackage("libxyz", exact=True) }}
      run:
        - {{ pin_subpackage("libxyz", exact=True) }}

This has some advantages & disadvantages:

  • 👍
    • deduplication of files between static & shared builds
    • shared & static builds are co-installable
      • I'd argue this is actually an anti-pattern, but it can become relevant for dependencies with run-exports, see below
  • 👎
    • libxyz-static pulls in both dynamic and static builds
      • this makes it possible for static builds to "silently" depend on the shared builds
      • in practice this is not a huge issue, because run-exports from libxyz are not picked up for a -static host dep, so any reliance on the dynamic lib would be detected by things going boom at runtime
    • CMake integration is impossible for both packages simultaneously
      • either the targets are wrong for the non-static version ("libzstd.a" not found)
      • or the CMake-specific files clobber each other

The CMake issue in particular is quite painful, because an ever-increasing number of packages in C/C++-land come with built-in CMake integration. For example, recent LLVM builds failed because of this (see conda-forge/zstd-feedstock#58). I untangled the dependency in conda-forge/zstd-feedstock#62, but this has the following trade-off:

[...] we can have only 3 out of the following 4 (AFAICT):

  1. working CMake integration
  2. no clobbering of CMake files
  3. no manual hacking (of the CMake files or of the upstream CMake integration)
  4. zstd & zstd-static co-installable

Currently I've chosen to give up point 4; perhaps an argument can be made that giving up point 2 is less harmful, despite being against best practice.

Not being able to co-install libxyz and libxyz-static would become problematic in the following scenario (quoting @hmaarrfk):

  1. User builds package a, which depends on the dynamic zstd. zstd exports a requirement on zstd.
  2. User tries to build package b, which depends on a and on zstd-static. This user can no longer build their package.

Package b cannot depend on just zstd-static because the dependency on zstd is controlled by package a.

In this case, that PR was merged (and the existing consumers are not affected by the run-export-induced conflict between dynamic & static libs), but now I've encountered the same in libprotobuf, and fixing conda-forge/libprotobuf-feedstock#68 is not possible without running into the same issue, hence I thought I'd open a wider discussion.

@hmaarrfk sketched out a different approach:

The other alternative, which I don't have time to flesh out, is to return to the previous system, but to ensure that:

  1. cmake works with the dynamic libraries.
  2. cmake does not automatically report the static libraries.

This would favor the dynamic library usage, instead of forcing users of cmake+zstd to have the static libraries installed.

to which I said:

Wouldn't that defeat the point of zstd-static? If I use an output named like that, I'd certainly expect the static lib to be picked up, not the dynamic one (i.e. compiling against zstd-static would seem to work, but would actually do the same as compiling against zstd, i.e. create a runtime dependence on zstd)

Summary

  • The setup currently used by several static outputs is incompatible with CMake, which IMO is a bigger downside than accepting some file duplication or even giving up co-installability
  • In general, I think it's really iffy to have static and shared builds of the same libs in the same host environment.
  • There are other feedstocks where this setup is not possible/sensible, e.g. abseil-cpp-feedstock#35 ("Add static builds per cxx_version")

Proposal: Forbid co-installation of libxyz-static builds with libxyz

This would solve the CMake issues (each output could have its own copy of the respective CMake files without risks of inconsistency or clobbering). The only downside would be that users of -static packages cannot depend on another package (say a) which has been built against a shared version of the same lib. Since most of the static libs live pretty close to the bottom of the stack, I don't think this would be much of a problem. And even so, there would be a solution: build a (perhaps also static?) version of a against libxyz-static.

I think (users of) static libs are special enough that we can inflict this (hypothetical!) pain on them.
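A minimal sketch of this proposal in meta.yaml terms. The proposal does not say how the mutual exclusion would be enforced; the `run_constrained` entries with an unsatisfiable version (`<0a0`) are one possible mechanism and an assumption here, as are the hypothetical script names `install_shared.sh` and `install_static.sh`.

```yaml
# Sketch only: libxyz and libxyz-static as mutually exclusive outputs.
# Each output installs its own headers and CMake files, so nothing is shared
# and nothing can be clobbered.
outputs:
  - name: libxyz
    script: install_shared.sh          # hypothetical: configure/build/install the shared lib only
    build:
      run_exports:
        - {{ pin_subpackage('libxyz') }}
    requirements:
      run_constrained:
        # assumption: an unsatisfiable constraint keeps libxyz-static out of the same env
        - libxyz-static <0a0
  - name: libxyz-static
    script: install_static.sh          # hypothetical: configure/build/install the static lib only
    requirements:
      run_constrained:
        - libxyz <0a0
```

With constraints like these, an environment that asks for both outputs fails at solve time instead of silently mixing static and shared linkage.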

Thoughts @conda-forge/core?

@h-vetinari h-vetinari changed the title Generally recommended/accepted setup of static outputs vs. shared ones General setup of static outputs vs. shared ones (vs. broken CMake with currently common pattern) Aug 8, 2022
@carterbox
Member

carterbox commented Aug 8, 2022

Libraries should use separate target names for their static and dynamic libs, so that the CMake files don't clobber each other and there is no ambiguity about what is being linked?

https://stackoverflow.com/a/2152157
https://stackoverflow.com/a/29824424

@carterbox
Member

Could also create more outputs of the recipe to avoid clobbering, i.e. outputs that only contain the run-exported libs (no headers or CMake files).
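A rough sketch of this suggestion, assuming a unix layout; the output name `libxyz-devel`, the script name `install_static.sh` and all file paths are hypothetical. Only the runtime output is run-exported, so a package built against the shared lib pulls in just `lib/libxyz.so.*`, which cannot clobber anything shipped by `libxyz-static`. `libxyz-devel` and `libxyz-static` would presumably still conflict with each other, since both ship headers and CMake files.

```yaml
# Sketch only: split the runtime library from the development files (unix paths assumed).
outputs:
  - name: libxyz                       # runtime only: this is what run_exports pulls in
    files:
      - lib/libxyz.so.*
    build:
      run_exports:
        - {{ pin_subpackage('libxyz') }}
  - name: libxyz-devel                 # hypothetical: headers + CMake files for the shared lib
    files:
      - include/xyz.h
      - lib/libxyz.so
      - lib/cmake/xyz/*.cmake
    requirements:
      run:
        - {{ pin_subpackage('libxyz', exact=True) }}
  - name: libxyz-static                # hypothetical script: builds the static lib and
    script: install_static.sh          # installs its own headers + CMake files
```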

@h-vetinari
Member Author

h-vetinari commented Aug 8, 2022

Libraries should use separate target names for their static and dynamic libs, so that the CMake files don't clobber each other and there is no ambiguity about what is being linked?

I agree with this; however, it doesn't match current widespread practice...

Could also create more outputs of the recipe to avoid clobbering, i.e. outputs that only contain the run-exported libs (no headers or CMake files).

Could you sketch your idea w.r.t. the most important aspects of a meta.yaml that would do what you propose?

@isuruf
Member

isuruf commented Aug 8, 2022

In general, I think it's really iffy to have static and shared builds of the same libs in the same host environment.

Having static and shared builds in the same environment has been the standard in Unix environments for decades. I don't know what you mean by iffy here. Please understand that conda is not special and we should do well to learn from the past.

In the case of zstd, the way you fixed it is not ideal. What you should do is first build the static library and install it, then build and install only the shared lib. The CMake configs will then support only the shared lib. In my experience, projects supporting both static and shared libraries at the same time with cmake are rare. Most projects can only build one of those at a time.
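A sketch of that build order for zstd on unix, written as a top-level `build: script:` in meta.yaml. The `ZSTD_BUILD_{STATIC,SHARED}` options and the `build/cmake` source directory come from zstd's own CMake setup; the assumption is that the second install overwrites the exported CMake target files, so they end up describing only the shared library, while `libzstd.a` from the first install stays in `$PREFIX`. Splitting the result into `libzstd` and `libzstd-static` outputs is left out.

```yaml
# Sketch only (unix): static first, then shared installed on top of it.
build:
  script:
    # 1. static library only; installs libzstd.a, headers and CMake config
    - cmake -S build/cmake -B build-static -DCMAKE_INSTALL_PREFIX=$PREFIX -DZSTD_BUILD_STATIC=ON -DZSTD_BUILD_SHARED=OFF
    - cmake --build build-static
    - cmake --install build-static
    # 2. shared library only; its CMake config replaces the one from step 1
    - cmake -S build/cmake -B build-shared -DCMAKE_INSTALL_PREFIX=$PREFIX -DZSTD_BUILD_STATIC=OFF -DZSTD_BUILD_SHARED=ON
    - cmake --build build-shared
    - cmake --install build-shared
```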

In the case of micromamba, there is no mistake about using static libraries, because it explicitly searches for .a files. In other build systems, they usually add -Wl,-Bstatic to ensure static libraries are searched first. In any case, packages that require static linking do it carefully, and their build systems are usually good at restricting themselves to static libraries.

@h-vetinari
Member Author

h-vetinari commented Aug 9, 2022

In general, I think it's really iffy to have static and shared builds of the same libs in the same host environment.

Having static and shared builds in the same environment has been the standard in Unix environments for decades. I don't know what you mean by iffy here. Please understand that conda is not special and we should do well to learn from the past.

By iffy I mean:

  • having wrong or broken CMake metadata
  • having to cross your fingers (or be very careful) to get the right artefact picked up
  • in principle, this would allow Frankenstein-like mixes of partially static / partially dynamic linking. If this only affects a handful of rarely used symbols, the corresponding runtime failure might not even be apparent immediately

I think a feedstock should make a specific choice of depending on the shared or static builds of a given dependency, and then only depend on that. Having both in the same environment makes it much harder to tell what's happening under the hood.

In the case of zstd, the way you fixed it is not ideal.

Thanks for the feedback. Happy to find a better way to do it, which is why I opened this issue. Already in the zstd PR I had sketched my preferred solution, which would be:

| output | contains shared lib | contains static lib | comment |
|---|---|---|---|
| libxyz | ✔️ | | not co-installable with libxyz-static |
| libxyz-static | | ✔️ | not co-installable with libxyz |

instead of the previous (== state of other static builds in conda-forge):

| output | contains shared lib | contains static lib | comment |
|---|---|---|---|
| libxyz | ✔️ | | |
| libxyz-static | ✔️ (transitively) | ✔️ | run-depends on libxyz output |

and the current (after conda-forge/zstd-feedstock#62):

| output | contains shared lib | contains static lib | comment |
|---|---|---|---|
| libxyz | ✔️ | | |
| libxyz-static | ✔️ | ✔️ | independent of libxyz output |

What you should do is first build the static build, install and then do the shared lib only. CMake configs will support only shared.

IIUC, only the shared builds would have CMake metadata. I dislike this because we'd be "lying" to any consuming feedstock that requests libxyz-static and happens to use CMake, i.e. it would be using the shared lib instead of the explicitly requested static lib.

In the case of micromamba, there is no mistake about using static libraries because it explicitly searches for .a files.

I had no doubt that it was being used correctly there, but IMO we should have more "guard rails" for using these static libs. Or, if we decide not to, rename them to have a leading underscore so that they're clearly marked as "here be dragons".

In short, IMO:

  • CMake integration should just work (for all cases)
  • separation of shared/static builds leads to safer & more understandable recipes
  • allowing mixing for the sake of "it was always done like this on unix" is not helpful IMO
  • the "cost" of separating shared & static builds affects only "expert" feedstocks, it's not even clear that it would be an issue in practice, and there'd be a reasonable work-around.

@isuruf
Member

isuruf commented Aug 9, 2022

IIUC, only the shared builds would have CMake metadata. I dislike this because we'd be "lying" to any consuming feedstock that requests libxyz-static and happens to use CMake, i.e. it would be using the shared lib instead of the explicitly requested static lib.

We are not lying. There would be no libxyz-static CMake target, only libxyz-shared.

CMake integration should just work (for all cases)

You are going into hypotheticals. Please show one package where having only the shared build in cmake will fail when a project like zstd is built like how I said. Otherwise this conversation is moot.

@h-vetinari
Member Author

We are not lying. There would be no libxyz-static CMake target, only libxyz-shared.

Having only the shared target is what worried me, because a feedstock using CMake and containing something like find_package(libxyz) in the upstream CMakeLists.txt would find the shared builds, even though from the POV of conda and the meta.yaml, we'd be specifying libxyz-static as a host dep.

You are going into hypotheticals. Please show one package where having only the shared build in cmake will fail when a project like zstd is built like how I said.

I gave two examples in the OP, zstd & libprotobuf (the latter has no cmake integration on unix yet). Lots of feedstocks that are consuming those are CMake-based. If any of those wants to use the static builds for whatever reason (and is not as exceedingly careful as micromamba), it would break. I think https://github.com/conda-forge/onnxruntime-feedstock would be a candidate (using CMake & depending on libprotobuf-static), for example.

Otherwise this conversation is moot.

I don't think this is accurate (or fair). So far, the balance of pros/cons is IMO leaning in favour of changing (what are the benefits of the status quo, aside from not having to change something?). One more aspect against the status quo is that even your described solution has a higher integration cost, because the build scripts for static/shared would become more involved (or require patches) to not pick up the static targets, whereas my proposal just needs cmake {,build,install}.

@isuruf
Member

isuruf commented Aug 9, 2022

I gave two examples in the OP, zstd & libprotobuf (the latter has no cmake integration on unix yet). Lots of feedstocks that are consuming those are CMake-based. If any of those wants to use the static builds for whatever reason (and is not as exceedingly careful as micromamba), it would break. I think https://github.com/conda-forge/onnxruntime-feedstock would be a candidate (using CMake & depending on libprotobuf-static), for example.

No, zstd and libprotobuf are not examples. I mean downstream packages where this is needed. micromamba obviously doesn't care. So you only have the example of onnxruntime. Please go into detail about why onnxruntime depends on libprotobuf-static.

@isuruf
Member

isuruf commented Aug 9, 2022

one more aspect against the status quo is that even your described solution has a higher integration cost because the build scripts between static/shared would become more involved (or requiring patches) to not pick up the static targets, whereas my proposal just needs cmake {,build,install}.

No, it doesn't. You just need cmake {build, install}. No patches necessary. You just need to make libzstd-static depend on libzstd in your solution.

@h-vetinari
Member Author

No, zstd and libprotobuf are not examples. I mean downstream packages where this is needed.

I'm well-aware of that, I just said that any dependent package using CMake & wanting a static lib for any reason would be an example.

So you only have the example of onnxruntime. Please go into detail about why onnxruntime depends on libprotobuf-static.

I feel this is moving the goal posts; previously you asked what would break, now you ask why that feedstock needs a static lib. That's a deliberation per feedstock; I don't know this case specifically, or if it could be removed now, but that's beside the point, because it's an example of what I was describing.

No, it doesn't. You just need cmake {build, install}. No patches necessary.

Could you sketch how to do this based on - for example - the zstd feedstock? I'm not sure how one would build the static lib (currently just using cmake and ZSTD_BUILD_{STATIC,SHARED}=ON/OFF) but make sure the corresponding targets don't get installed, or at least I can't think of a way that doesn't involve patching.

You just need to make libzstd-static depend on libzstd in your solution.

How would the CMake files of zstd-static not clobber those of zstd? (though I also did say in the OP that accepting the clobbering might be the least-bad trade-off)

@isuruf
Member

isuruf commented Aug 9, 2022

I feel this is moving the goal posts; previously you asked what would break, now you ask why that feedstock needs a static lib. That's a deliberation per feedstock; I don't know this case specifically, or if it could be removed now, but that's beside the point, because it's an example of what I was describing.

Since onnxruntime works just fine now, why are you insisting that you need cmake files to mention static just for onnxruntime to build correctly?

@h-vetinari
Member Author

h-vetinari commented Aug 9, 2022

Since onnxruntime works just fine now, why are you insisting that you need cmake files to mention static just for onnxruntime to build correctly?

It currently uses the static lib, but does that without being confused by shared-only CMake files, because those don't exist yet due to conda-forge/libprotobuf-feedstock#68 (that issue needed upstream fixes first which have landed now, but adding the CMake integration in conda-forge for libprotobuf would run into the issue I'm describing here).

But zooming out a bit - why is it controversial that projects might want to consume static libs through CMake? That's a pretty normal thing to be doing (granted, not in conda-forge due to CFEP-18, but the exceptions that exist shouldn't have to be barred from using CMake?)...

Regarding examples, https://github.com/conda-forge/grpc-cpp-feedstock having to consume libabseil-static (on windows, due to C++ ABI issues) through CMake is also one.[1]

Footnotes

  1. I accept that abseil is obviously special through its ABI-dependence on the C++ version used to compile it[2], and doesn't need to fit a general scheme, but it does happen to also fit the proposed "separation between static and shared outputs".

  2. Therefore, we cannot have more than one shared lib (e.g. C++17), but need several flavours of static libs (at least C++11/C++14) so that feedstocks can use a compatible ABI for the C++ version they're using to compile themselves.

@isuruf
Member

isuruf commented Aug 9, 2022

It currently uses the static lib, but does that without being confused by shared-only CMake files, because those don't exist yet due to conda-forge/libprotobuf-feedstock#68 (that issue needed upstream fixes first which have landed now, but adding the CMake integration in conda-forge for libprotobuf would run into the issue I'm describing here).

So, you are saying that it links statically right now without issue and as soon as shared only CMake files are added, it will link to shared library?

why is it controversial that projects might want to consume static libs through CMake?

Because you give no examples of doing so, but want to change the status quo. A change in status quo needs to be done when there's a real reason and you are not giving any.

Regarding examples, https://github.com/conda-forge/grpc-cpp-feedstock having to consume libabseil-static (conda-forge/grpc-cpp-feedstock#196; on windows, due to C++ ABI issues) through CMake is also one.[1]

I don't have time to go through these. Please explain in detail how this works right now and how having shared-only CMake files is an issue.

But zooming out a bit - why is it controversial that projects might want to consume static libs through CMake? That's a pretty normal thing to be doing (granted, not in conda-forge due to CFEP-18, but the exceptions that exist shouldn't have to be barred from using CMake?)...

Because of CFEP-18. We want to encourage shared builds.

Since you are going into hypotheticals, let me do the same. In the case of onnxruntime, your logic there is faulty as well. What if onnxruntime picks up a dependency that depends on shared libprotobuf? Then onnxruntime cannot link against libprotobuf-static because of the conda package conflict that you are introducing artificially.

@h-vetinari
Member Author

h-vetinari commented Aug 9, 2022

So, you are saying that it links statically right now without issue and as soon as shared only CMake files are added, it will link to shared library?

Yes, if we do it like you propose (no CMake targets for the static lib), I believe that would happen. That adding CMake targets is beneficial for other reasons should not be in question I presume? (conda-forge/libprotobuf-feedstock#68 is almost 2 years old)

why is it controversial that projects might want to consume static libs through CMake?

Because you give no examples of doing so, but want to change the status quo. A change in status quo needs to be done when there's a real reason and you are not giving any.

I think working CMake integration is not "not giving any [reason]", see also below.

Because of CFEP-18. We want to encourage shared builds.

Yes, I get that, and those feedstocks that diverge don't do this for fun. Those are often tricky packages in the first place, and I don't understand what benefit the status quo has that justifies subtly breaking their ability to use native CMake integration.

Please explain in detail how this works right now and how having shared only Cmake files is an issue.

I put this in the footnotes above already, but in short: per abseil version, we cannot have more than one shared lib, but we need at least three different ABIs (per C++ standard version). Static builds make up the missing ones, and cannot be co-installed (due to different ABI). It's a special case as I said, but it literally cannot be handled (safely/sanely) through a static-depending-on-shared setup.

Since you are going into hypotheticals, let me do the same. In the case of onnxruntime, your logic there is faulty as well. What if onnxruntime picks up a dependency that depends on shared libprotobuf? Then onnxruntime cannot link against libprotobuf-static because of the conda package conflict that you are introducing artificially.

Yes, I noted this in the OP, explained why this is likely not a problem in practice, and gave a work-around in case it turns out to be one. Quoted again for convenience:

The only downside would be that users of -static packages cannot depend on another package (say a) which has been built against a shared version of the same lib. Since most of the static libs live pretty close to the bottom of the stack, I don't think this would be much of a problem. And even so, there would be a solution: build a (perhaps also static?) version of a against libxyz-static.

I think (users of) static libs are special enough that we can inflict this (hypothetical!) pain on them.

It's of course a fair question whether this is actually more painful than not having working CMake integration, and one where I'll gladly admit defeat if there are many affected feedstocks. But at least it would make the failure immediately visible, and presumably entice those feedstocks to try building against the shared lib (if their dependency can do it, why not they themselves as well).

@isuruf
Member

isuruf commented Aug 9, 2022

Yes, if we do it like you propose (no CMake targets for the static lib), I believe that would happen.

You are not making sense to me at all. How would this happen?

@h-vetinari
Member Author

You are not making sense to me at all. How would this happen?

Currently, the CMake invocation in onnxruntime falls back on other means to detect libprotobuf-static (pkgconfig or whatever), but if we were to add native CMake files ($PREFIX/cmake/protobuf-*.cmake, see equivalent on windows), it would prefer those. If those CMake files only refer to the shared build, the wrong library would be picked up AFAICT.

@isuruf
Member

isuruf commented Aug 9, 2022

Currently, the CMake invocation in onnxruntime falls back on other means to detect libprotobuf-static (pkgconfig or whatever), but if we were to add native CMake files ($PREFIX/cmake/protobuf-*.cmake, see equivalent on windows), it would prefer those. If those CMake files only refer to the shared build, the wrong library would be picked up AFAICT.

No, that's not how it works. I took the time to research this (which you could have done too), and onnxruntime uses cmake to find protobuf. The native CMake files are what CMake calls CONFIG mode in find_package, and CMake already has built-in support for protobuf through its own find module (MODULE mode). MODULE mode is the default, and irrespective of the presence of the files that you call native CMake files, it will find the static protobuf libraries because of the option -DProtobuf_USE_STATIC_LIBS=ON used by onnxruntime. To confirm this, I first built the static library and installed it into a prefix, and then built the shared library and installed it on top. onnxruntime was able to find the static library with no issues.
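A sketch of the consumer side described here, for a hypothetical CMake-based recipe whose CMakeLists.txt calls `find_package(Protobuf)` without the `CONFIG` keyword. `Protobuf_USE_STATIC_LIBS` is an input variable of CMake's bundled FindProtobuf module; the source layout (`-S .`) and the rest of the recipe are assumptions.

```yaml
# Sketch only: ask CMake's FindProtobuf module (MODULE mode) for the static protobuf libs.
build:
  script:
    - cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$PREFIX -DProtobuf_USE_STATIC_LIBS=ON
    - cmake --build build
    - cmake --install build
requirements:
  build:
    - {{ compiler('cxx') }}
    - cmake
  host:
    - libprotobuf-static
```

If the project instead called `find_package(Protobuf CONFIG)`, the find module would be bypassed, and whichever protobuf CMake config files are installed would decide which library gets found.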

Any other examples where you "need" this?

@h-vetinari
Member Author

Any other examples where you "need" this?

I don't know why this discussion needs to be so combative. I'm trying to solve (what I perceive to be) a genuine problem - we can disagree on how broken things are, or what the right solutions are, but can we turn down the intensity a bit?

I said I don't know the onnxruntime feedstock; I used it as an example I had found. For me the main point is the principle; for you it's specific examples that justify change. We can also disagree about that.

Still, I feel your investigation underscores my point: it took a non-trivial amount of effort by one of the most knowledgeable people in conda-forge to find out whether my (I'd say, at least) plausible scenario would become relevant (and what if CMake didn't have built-in detection like it has for protobuf, or if onnxruntime didn't use -DProtobuf_USE_STATIC_LIBS=ON, etc.?).

How is all this effort / complexity in understanding a recipe beneficial, when we could have a completely unambiguous setup where that mixing never even becomes a possibility? I understand that changing something is work and engenders certain risks, but I feel also the status quo should be less set in stone, especially when problems are identified.

Beyond that, I still don't know what a patch-free build setup would look like where libxyz-static depends on libxyz and only the shared libs get CMake targets.

No, it doesn't. You just need cmake {build, install}. No patches necessary.

Could you sketch how to do this based on - for example - the zstd feedstock?

@isuruf
Member

isuruf commented Aug 11, 2022

I don't know why this discussion needs to be so combative. I'm trying to solve (what I perceive to be) a genuine problem - we can disagree on how broken things are, or what the right solutions are, but can we turn down the intensity a bit?

Sure. Sorry about that. I urge you to take a couple of hours before replying so as to give the impression that you have thought about things before replying.

How is all this effort / complexity in understanding a recipe beneficial, when we could have a completely unambiguous setup where that mixing never even becomes a possibility? I understand that changing something is work and engenders certain risks, but I feel also the status quo should be less set in stone, especially when problems are identified.

There are use-cases where having both static and shared libraries is needed. For example, if you want to link your program against the openblas static library (for performance, ABI, etc.), but still want to use numpy in your stack (which comes with openblas shared), your suggestion basically makes their use-case impossible. On the other hand, your use-case can be worked around in the build.sh of any feedstock. So, it's a choice between:

  1. Make it possible for users who need both shared and static, but make it inconvenient for people who only need static
  2. Make it convenient for people who only need static, but make it impossible for users who need both shared and static.

Option 2 is clearly bad.

Beyond that, I still don't know what a patch-free build setup would look like where libxyz-static depends on libxyz and only the shared libs get CMake targets.

You have the solution right there. Add libxyz as a host dep in libxyz-static. For eg: Add - libzstd at https://github.com/conda-forge/zstd-feedstock/blob/main/recipe/meta.yaml#L90

@ngam
Contributor

ngam commented Aug 11, 2022

Thank you all for this discussion as I learned a lot from it!

As someone who ends up in combative conversations often because I am not good at this and I am too quick to respond, I just want to say I appreciate all your work, @h-vetinari, and all your feedback, @isuruf.

@isuruf can we enlist you for a review/view in the abseil/grpc migration?

@h-vetinari
Member Author

You have the solution right there. Add libxyz as a host dep in libxyz-static. For eg: Add - libzstd at https://github.com/conda-forge/zstd-feedstock/blob/main/recipe/meta.yaml#L90

Needed to be a run-dep as well (to get the headers), but yes, this works (though it's unclear to me which lib gets found by CMake; but I don't care that much anymore, as this approach seems to solve the most issues at once).

@carterbox
Member

Is the following an accurate summary of the discussion?

No top-level package. Build the shared lib as the first output and the static lib as the second output. The static output depends on the shared output in both run and host. Use tests to check that the shared output excludes static libraries. Static CMake files probably clobber shared CMake files (depends on build config)?
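A sketch of this summarised layout for zstd, assuming unix and hypothetical script names; the `test: commands:` entries are one way (an assumption, not something settled in the thread) to implement the check that the shared output does not ship the static library.

```yaml
# Sketch only: shared output first, static output second, static depends on shared.
outputs:
  - name: libzstd                      # built first: shared library, headers, CMake files
    script: install_shared.sh          # hypothetical script name
    build:
      run_exports:
        - {{ pin_subpackage('libzstd') }}
    test:
      commands:
        # the shared output must not contain the static archive
        - test ! -f $PREFIX/lib/libzstd.a  # [unix]
  - name: libzstd-static               # built second: static archive only
    script: install_static.sh          # hypothetical script name
    requirements:
      host:
        - {{ pin_subpackage('libzstd', exact=True) }}
      run:
        # run-dep as well, so consumers get the headers from libzstd
        - {{ pin_subpackage('libzstd', exact=True) }}
    test:
      commands:
        - test -f $PREFIX/lib/libzstd.a  # [unix]
```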

@h-vetinari
Member Author

Is the following an accurate summary of the discussion?

This matches my understanding. Thanks for the summary.

@h-vetinari h-vetinari changed the title General setup of static outputs vs. shared ones (vs. broken CMake with currently common pattern) General setup of static outputs vs. shared ones Sep 6, 2022
4 participants