
binary compatibility between conda and julia #164

Closed
ngam opened this issue Jan 2, 2022 · 17 comments

Comments

@ngam
Contributor

ngam commented Jan 2, 2022

The issue of separate depots for separate environments is really all about binary compatibility. Basically, anything that has been compiled for one environment cannot reliably be used with another environment, because linked libraries may have changed in an ABI-incompatible way (cf. the recent libunwind problems, but I would expect this with a number of libraries, e.g. for I/O of special file formats like netCDF) and the compilers themselves may be different. If compiled bits are shared, this will lead to surprising segfaults in the future. This kind of problem is not likely to show up in tests that use similar environments created at roughly the same time, but is virtually guaranteed if very different environments with library overlap are used, particularly when those environments are created a long time apart.

To me, it is not clear to what degree compiled bits are an integral part of the Julia depots, but this is my main problem with ~/.julia currently (i.e. conda-forge Julia <=1.6).

Originally posted by @zklaus in #157 (comment)

@mkitti
Contributor

mkitti commented Jan 2, 2022

To summarize my other comment, compiled native code is not cached by Julia in the depot.

@ngam
Contributor Author

ngam commented Jan 2, 2022

@mkitti, I think the end goal is something like this: JuliaInterop/CxxWrap.jl#309 (comment)

@zklaus mentioned this issue Jan 2, 2022
@zklaus

zklaus commented Jan 2, 2022

@mkitti, allow me to quote your comment in full:

Native machine code is not cached by Julia during normal operation, so I do not expect there to be ABI issues for compiled code generated from Julia unless one is making direct ccalls. What is cached is lowered code and type inference on a per Julia environment basis in $JULIA_DEPOT/compiled. That said, this does change between Julia versions and is under active development.

The chosen defaults for the 1.7 releases are quite conservative, with unique Julia depots and environments for each conda environment. My concern is actually that it is too conservative and that we are duplicating too many things.

The use of PackageCompiler.jl, which actually involves native code generation and storage, is another matter, but the location of the generated system image shared library is determined by the user.

The qualification "unless one is making direct ccalls" is very important because that is a very common thing to do, particularly in library wrappers. Examples include most packages from https://github.com/JuliaGeo and many packages from https://github.com/JuliaMath, for example FFTW and SpecialFunctions.
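As a concrete illustration of the kind of ccall such wrappers make, here is a minimal sketch. Loading libnetcdf by bare name is an assumption for brevity; real wrapper packages obtain the library path from a JLL package (e.g. NetCDF_jll) instead.

```julia
# Minimal sketch of a wrapper-style ccall into libnetcdf.
# nc_inq_libvers() returns the netcdf-c version string.
libvers = ccall((:nc_inq_libvers, "libnetcdf"), Cstring, ())
println(unsafe_string(libvers))
```

Every such call assumes the C library's ABI matches what the wrapper was written against, which is exactly where mixing libraries from different environments can go wrong.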

Our use case is ESMValTool, a tool for the analysis of climate simulations. The tool itself is written in Python, but we support diagnostics contributed by climate scientists in several languages, namely Python, R, NCL, and Julia. Since all diagnostics deal with the same kind of inputs and outputs, they generally rely on the same underlying libraries for that. With conda, we are making sure that Python, R, and NCL are using the exact same library for netcdf IO, for example. It would be great if we could extend that to Julia.

@ngam
Contributor Author

ngam commented Jan 3, 2022

With conda, we are making sure that Python, R, and NCL are using the exact same library for netcdf IO, for example.

Any specific reason why you're doing this? Wouldn't it be easier to just specify a version for them to use, instead of hardcoding to a specific library? Is this netcdf library from conda btw?

I think this could be a little beyond our scope --- ideally, we shouldn't be messing too much with the upstream treatment and their libraries... or maybe we need to allow a future netcdf-julia-feedstock (based on https://github.com/JuliaGeo/NetCDF.jl) to handle this. Anyway, the idea of sharing these essential libraries across conda and julia is very interesting and I am supportive of it generally, but we need to be creative and careful in how we implement and/or open the door for this.

Thanks for the concrete example. This is helpful.

@mkitti
Contributor

mkitti commented Jan 3, 2022

The ccall issue is more of a version dependency issue than a compile cache issue. If your definition of a struct is missing a new field in the middle of the struct, then the ccall will fail.
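A hedged sketch of that struct-layout hazard (the C struct and its Julia mirror here are invented purely for illustration):

```julia
# Suppose version 2 of a C library inserts a field in the middle:
#
#   /* v1 */                  /* v2 */
#   struct pt {               struct pt {
#       int x;                    int x;
#       int y;                    int z;   /* new field */
#   };                            int y;
#                             };
#
# A Julia mirror written against v1 ...
struct PtV1
    x::Cint
    y::Cint
end
# ... passed to a v2 library via ccall reads and writes the wrong
# offsets: v2's `y` sits where v1 expects nothing, so values are
# silently misinterpreted rather than failing loudly.
```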

If you update the Julia source, then Julia will recognize the compile cache is stale the next time you load the module or if you are using Revise.

The Julia packaging mechanism is pretty robust with regard to the compat section.

The more I think about it, the more likely it seems that we are heading towards forking the JLLs to produce conda-forge versions.

@mkitti
Contributor

mkitti commented Jan 3, 2022

Any specific reason why you're doing this? Wouldn't it be easier to just specify a version for them to use, instead of hardcoding to a specific library? Is this netcdf library from conda btw?

It would probably be a bad idea to have more than one version of the shared library loaded at once.

@zklaus

zklaus commented Jan 3, 2022

With conda, we are making sure that Python, R, and NCL are using the exact same library for netcdf IO, for example.

Any specific reason why you're doing this? Wouldn't it be easier to just specify a version for them to use, instead of hardcoding to a specific library? Is this netcdf library from conda btw?

Sorry for being unclear. We are not doing anything special. That's just how conda-forge works. Let me give you one example.
Via its meta.yaml, our tool depends on iris (pure Python), r-ncdf4 (an R package), and ncl (the NCL package). Both r-ncdf4 and ncl depend on libnetcdf, while iris depends on netcdf4, which in turn depends on libnetcdf.
When installing or upgrading these packages, conda figures out which version of libnetcdf is acceptable to all of them and provides this one library for the entire environment.
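That resolution step can be sketched from the command line (a sketch assuming conda is installed; package names are the ones from this thread):

```shell
# conda solves for one libnetcdf version acceptable to every consumer:
conda create -n esmval iris r-ncdf4 ncl
conda list -n esmval libnetcdf   # a single shared libnetcdf for the env
```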

Of course, there are many more dependencies that are related in more complicated ways.

@ngam
Contributor Author

ngam commented Jan 3, 2022

I see, thanks!

(libnetcdf should actually be called netcdf-c: conda-forge/libnetcdf-feedstock#132, but that's beside the point here.)

So, if I understand things correctly, this shouldn't be an issue in our current setup because we are likely forcing users to redundantly install all these libraries in different places (i.e. they get isolated under the julia env path without ever being shared with the underlying conda env). Basically, in a julia env (which is housed almost entirely under the conda env), everything gets reinstalled again for run-time unless it is something in our meta.yaml here. Then, for other packages from the julia registry, users will use julia's Pkg to figure things out.

Now if you're talking about a julia package from conda-forge, then that's a whole different story.

This is less than ideal, but for now, this is a good enough solution imo. We do want to figure out a way to share things more systematically so that in a situation like yours, the user can simply share the underlying netcdf-c without having to install it for conda and for julia. Follow along with what @mkitti (who knows significantly more about julia than me) has been demonstrating in #14 and JuliaInterop/CxxWrap.jl#309.

But in short, I believe this is something we want to address head-on and having you on board here will definitely help us :)

@mkitti
Contributor

mkitti commented Jan 4, 2022

If the desire of conda-forge is to replace the binary dependencies in Julia's JLL packages without having duplicate copies, it appears the way forward may be to produce alternate versions of the JLL packages. This allows conda-forge to point packages that depend on the JLL packages to conda-forge binaries without having to download the binaries in the official JLL packages.

The easiest way I can see to automate that process is to fork https://github.com/JuliaPackaging/JLLWrappers.jl under a new name, perhaps CondaForgeJLLWrappers.jl. The main change would be to modify find_artifact_dir such that it points to $CONDA_PREFIX.
https://github.com/JuliaPackaging/JLLWrappers.jl/blob/1631db3e80c9d36d257adc222e766a22a73df8d1/src/wrapper_generators.jl#L8-L17
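A hypothetical sketch of that modification (the real JLLWrappers.jl generator is more involved; the CONDA_PREFIX handling here is an assumption, not the actual implementation):

```julia
# Hypothetical CondaForgeJLLWrappers.jl variant: resolve the "artifact"
# directory to the active conda environment instead of a Pkg artifact.
function find_artifact_dir()
    prefix = get(ENV, "CONDA_PREFIX", "")
    isempty(prefix) && error("CONDA_PREFIX not set; activate a conda environment")
    return prefix
end
```

With that one change, every wrapper that asks its JLL for a library path would transparently receive the conda-forge copy.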

For a specific JLL package, such as https://github.com/JuliaBinaryWrappers/libcxxwrap_julia_jll.jl

  1. conda-forge forks that package.
  2. Remove Artifacts.toml, since this is the mechanism that downloads the binaries.
  3. Replace the JLLWrappers.jl dependency with CondaForgeJLLWrappers.jl.

There may be a way to coordinate with upstream to minimize the amount of forking necessary.

An alternative to this would be to enhance the Overrides.toml system such that the original binaries are not downloaded if they are overridden. For example, using https://docs.julialang.org/en/v1/stdlib/LazyArtifacts/
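For context, an Overrides.toml redirection looks roughly like this (a sketch: the UUID below is a placeholder, not libcxxwrap_julia_jll's real UUID, and the path is just an example conda prefix):

```toml
# ~/.julia/artifacts/Overrides.toml
# Redirect a JLL package's artifact to binaries in a conda environment.
[00000000-0000-0000-0000-000000000000]  # placeholder JLL package UUID
libcxxwrap_julia = "/opt/conda/envs/myenv"
```

Today this only changes where the artifact is looked up; the original tarball is still fetched, which is the part the proposed enhancement would avoid.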

@ngam
Contributor Author

ngam commented Jan 4, 2022

Ultimately, we either have to let the user do this on an ad-hoc basis (i.e. call a random package and then have it go through a streamlined process to be packaged according to our system, e.g. in the three steps above) or we have to basically copy everything there is --- en masse --- from julia. That's a losing battle as far as I am concerned. We may have to just accept that the two systems (conda and julia) are just doing their own things and we here are just opening a window to the julia world from --- isolated within --- conda.

However, there is an argument to be made about setting up a powerful enough packaging system that julia users are then persuaded to work within a conda-forge-provided solution. How likely is that? We just don't know, but we could definitely try! Obviously, it helps that we have a lot of infrastructure --- what, like 15000 repos? https://github.com/orgs/conda-forge/repositories --- in conda-forge and so if we can let julia use these packages with minor modifications, that will be our winning chip.

So I think the point here isn't just to have julia packages go through the conda system, but equally to have conda-forge packages go through the julia system as well. I don't think this was the point of this feedstock originally, but obviously this is a key piece...

@mkitti
Contributor

mkitti commented Jan 4, 2022

To be clear, the main point of contention is binary packages: Julia's binarybuilder.org cross-compilation JLL infrastructure versus conda-forge's container-based compilation approach.

Even then the comparison is not quite even. Julia's approach is to embed into the specific operating environment, hence the need to support 45 toolchains, and likely more. This includes several OS-architecture-ABI-libc by Julia major version combinations. For example, my Ubuntu system is described as x86_64-linux-gnu-cxx11-julia_version+1.7.0. Conda-forge's approach is to supply part of the operating environment. Hence one does not need a separate version for gnu libc or musl, or cxx03 or cxx11.
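For reference, Julia itself can report the host part of the platform triplet mentioned above (Base.BinaryPlatforms is available from Julia 1.6 on):

```julia
using Base.BinaryPlatforms

# Prints something like "x86_64-linux-gnu" for the host; JLL artifact
# selection further specializes on tags such as the C++ ABI (cxx03 vs
# cxx11) and, for Julia-linked libraries, the julia_version.
println(triplet(HostPlatform()))
```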

The binary distribution approaches are so different that it's not really a competition. The approaches are complementary. Where the binarybuilder.org approach breaks down is when cross compilation does not easily work. For example, compiling HDF5 is painful in this regard. In this case the HDF5_jll is just repackaged from other build systems, including conda-forge:
https://github.com/JuliaBinaryWrappers/HDF5_jll.jl

For pure Julia packages, I'm not sure if there is much to do. It's just about delivering a tarball or git repository of Julia source files. Since the native code compilation is done on the user's machine, there is less dependence on binaries overall. All I might do here is supply a script that calls Julia's package manager.
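Such a script could be as simple as the following sketch (the package name "Example" is purely illustrative, and whether the feedstock would hook this into conda activation is an open question):

```shell
# Sketch: a post-install hook delegating pure-Julia packages to Pkg.
julia -e 'using Pkg; Pkg.add("Example"); Pkg.precompile()'
```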

From the Julia perspective, there is already Conda.jl:
https://github.com/JuliaPy/Conda.jl
The main use is to install Python dependencies such as Python itself or matplotlib.

@ngam
Contributor Author

ngam commented Jan 6, 2022

@zklaus kinda aside:

why does ESMValTool not have equal support for osx (only the python version is fully supported)?

ESMValTool is an interesting case --- on the bleeding edge --- where many different things are at play that regular packages won't be facing any time soon. However, we will want to make sure that we work closely with you to ensure things are smooth on your end. I am also a climate scientist (nominally; atmospheric) but I don't think ESMValTool is widely used (or used at all) where I work. I can also see a potentially smoother solution for your netcdf case. As you know, many centers would have the netcdf libraries installed system-wide or as modules to call. But you basically give the users no option to use their own netcdf and instead they would need to get it from conda-forge, right?

A potential solution is to rely on "pure" netcdf to convert the nc files into something more universal to be processed by python, r, etc. --- but obviously given how crazy nc files from climate data and model output are, this will be pretty hard. Maybe it deserves a separate tool on its own...

@zklaus

zklaus commented Jan 10, 2022

@zklaus kinda aside:
why does ESMValTool not have equal support for osx (only the python version is fully supported)?

This is due to the availability of dependencies. We are working to improve that and make missing dependencies available on conda-forge, but since most use of the tool is on Linux based HPC, these improvements happen on an as-needed basis.

ESMValTool is an interesting case --- on the bleeding edge --- where many different things are at play that regular packages won't be facing any time soon.

Interesting. I really have a rather different perception. To me, ESMValTool has reached a certain maturity (which is shown by its operational use at several national climate and meteorological centers) and in terms of dependencies shows exactly what you would expect from a comprehensive and mature tool: a rather complex web of dependencies that need to work together in consistent environments on different platforms; in other words, the definition of a regular package. The contrast to that for me is small, independent packages, as you expect to find at early proto-typing stage.

However, we will want to make sure that we work closely with you to ensure things are smooth on your end. I am also a climate scientist (nominally; atmospheric) but I don't think ESMValTool is widely used (or used at all) where I work.

Cool to meet a fellow climate scientist 😄 Perhaps it would be interesting for you to take a look at ESMValTool. It is now being adopted at the UK MetOffice, of course, used at DLR, and has played a significant role in the production of the IPCC's AR6.

I can also see a potentially smoother solution for your netcdf case. As you know, many centers would have the netcdf libraries installed system-wide or as modules to call.

Right. The problem with that is often the available versions, particularly in terms of reproducibility between different centers.

But you basically give the users no option to use their own netcdf and instead they would need to get it from conda-forge, right?

No, that's not correct. You can install ESMValTool via pip or directly from the source code (python setup.py install) as well. But in that case, you have to take care of the dependencies that cannot be handled by pip/setuptools yourself. Using modules is an option, but it is not trivial to handle all of the dependencies.

A potential solution is to rely on "pure" netcdf to convert the nc files into something more universal to be processed by python, r, etc. --- but obviously given how crazy nc files from climate data and model output are, this will be pretty hard. Maybe it deserves a separate tool on its own...

With the further development of the CF conventions and the CMIP6 data request (and similar documents for other projects), netcdf files really are a rather solid foundation in my view. They are also rather universal in that the format is well documented and can be read and written in basically all languages at this point. Converting and thereby duplicating the data is something I would strongly discourage.

@zklaus

zklaus commented Jan 10, 2022

To be clear, the main point of contention is binary packages: Julia's binarybuilder.org cross-compilation JLL infrastructure versus conda-forge's container-based compilation approach.

Yes, though I think the point of contention is not the means of compilation (conda-forge also uses cross-compilation in certain cases), but rather the approach to runtime dependency handling and dynamic linking.

Even then the comparison is not quite even. Julia's approach is to embed into the specific operating environment, hence the need to support 45 toolchains, and likely more. This includes several OS-architecture-ABI-libc by Julia major version combinations. For example, my Ubuntu system is described as x86_64-linux-gnu-cxx11-julia_version+1.7.0. Conda-forge's approach is to supply part of the operating environment. Hence one does not need a separate version for gnu libc or musl, or cxx03 or cxx11.

In this sense, Julia's approach to packaging seems to be closer to Python's binary wheels, a system that indeed works well for relatively small packages that don't have extensive dependencies and don't need to co-exist with other packages in the same environment.

@ngam
Contributor Author

ngam commented Jan 10, 2022

Thanks for all the details. Just one note to make sure:

Interesting. I really have a rather different perception. To me, ESMValTool has reached a certain maturity

By "bleeding edge", I didn't mean immature or like new, but rather innovative in terms of supporting a wide range of languages, etc. --- my impression is that people usually write tools like these for one language only. I think ESMValTool is a pretty impressive piece of work!

@mkitti
Contributor

mkitti commented Jan 10, 2022

Yes, though I think the point of contention is not the means of compilation (conda-forge also uses cross-compilation in certain cases), but rather the approach to runtime dependency handling and dynamic linking.

Yggdrasil/BinaryBuilder.org only does cross compilation. On one hand, this simplifies recipes considerably, in that often a single recipe will build for Linux, macOS, and Windows, using glibc or musl, and across ARM and Intel architectures.

On the other hand, when cross compilation is not directly possible such as in the case of HDF5, the main solution is to borrow other binaries.

@ngam
Contributor Author

ngam commented Jun 2, 2022

Thanks, everyone. Let's continue the broader discussion in #14

@ngam closed this as not planned on Jun 2, 2022