binary compatibility between conda and julia #164
To summarize my other comment, compiled native code is not cached by Julia in the depot.
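For context, here is a minimal way to see what the depot does cache, namely precompiled `.ji` files organized per Julia minor version (this assumes a default depot layout):

```julia
# Sketch: list the precompile cache for this Julia version in the first depot.
# In Julia <= 1.8 these .ji files hold serialized, type-inferred code, not
# native machine code, which is regenerated on each user's machine.
compiled = joinpath(first(DEPOT_PATH), "compiled", "v$(VERSION.major).$(VERSION.minor)")
isdir(compiled) && foreach(println, readdir(compiled))
```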
@mkitti, I think the end goal is something like this: JuliaInterop/CxxWrap.jl#309 (comment)
@mkitti, allow me to quote your comment in full:
The qualification "unless one is making …"

Our use case is ESMValTool, a tool for the analysis of climate simulations. The tool itself is written in Python, but we support diagnostics contributed by climate scientists in several languages, namely Python, R, NCL, and Julia. Since all diagnostics deal with the same kinds of inputs and outputs, they generally rely on the same underlying libraries for that. With conda, we make sure that Python, R, and NCL are using the exact same library for netCDF I/O, for example. It would be great if we could extend that to Julia.
Any specific reason why you're doing this? Wouldn't it be easier to just specify a version for them to use, instead of hardcoding to a specific library? Is this netcdf library from conda, btw? I think this could be a little beyond our scope --- ideally, we shouldn't be messing too much with the upstream treatment and their libraries... or maybe we need to allow a future netcdf-julia-feedstock (based on https://github.com/JuliaGeo/NetCDF.jl) to handle this. Anyway, the idea of sharing these essential libraries across conda and julia is very interesting and I am supportive of it generally, but we need to be creative and careful in how we implement and/or open the door for this. Thanks for the concrete example. This is helpful.
If you update the Julia source, then Julia will recognize that the compile cache is stale the next time you load the module, or right away if you are using Revise. The Julia packaging mechanism is pretty robust with regard to the compat section. The more I think about it, the more likely it seems we are heading towards forking the JLLs to produce conda-forge versions.
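As a small illustration of that staleness detection (the package name `Example` here is just a stand-in):

```julia
# Sketch: Julia re-precompiles when a package's source is newer than its cache.
using Pkg
Pkg.add("Example")
using Example              # first load precompiles into <depot>/compiled/
touch(pathof(Example))     # simulate an edit to the package source
# A fresh Julia session running `using Example` will now detect the stale
# cache file and precompile again automatically.
```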
It would probably be a bad idea to have more than one version of the shared library loaded at once.
Sorry for being unclear. We are not doing anything special. That's just how conda-forge works. Let me give you one example. Of course, there are many more dependencies that are related in more complicated ways.
I see, thanks! So, if I understand things correctly, this shouldn't be an issue in our current setup because we are likely forcing users to redundantly install all these libraries in different places (i.e., they get isolated under the julia env path without ever being shared with the underlying conda env). Basically, in a julia env (which is housed almost entirely under the conda env), everything gets reinstalled again for run-time unless it is something in our meta.yaml here. Then, for other packages from the julia registry, users will use julia's own package manager. Now if you're talking about a julia package from conda-forge, then that's a whole different story.

This is less than ideal, but for now, this is a good enough solution imo. We do want to figure out a way to share things more systematically so that in a situation like yours, the user can simply share the underlying libraries.

But in short, I believe this is something we want to address head-on and having you on board here will definitely help us :)
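A quick way to verify this isolation from inside the conda env's julia (exact paths depend on how the feedstock sets `JULIA_DEPOT_PATH`, so treat these as assumptions):

```julia
# Sketch: inspect where this Julia will read and write packages.
@show DEPOT_PATH              # where packages, artifacts, and caches live
@show Base.active_project()   # the active environment's Project.toml
@show Sys.BINDIR              # should sit under the conda env prefix
```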
If the desire of conda-forge is to replace the binary dependencies in Julia's JLL packages without having duplicate copies, it appears the way forward may be to produce alternate versions of the JLL packages. This allows conda-forge to point packages that depend on the JLL packages to conda-forge binaries without having to download the binaries in the official JLL packages. The easiest way I can see to automate that process is to fork https://github.com/JuliaPackaging/JLLWrappers.jl under a new name, perhaps CondaForgeJLLWrappers.jl. The main modification would be to change how the wrappers locate their binaries so that they resolve to libraries in the conda environment. The same forking would then apply to a specific JLL package, such as https://github.com/JuliaBinaryWrappers/libcxxwrap_julia_jll.jl
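A rough sketch of what such a forked wrapper could look like (the module layout, env var handling, and paths are illustrative assumptions, not an existing API):

```julia
# Hypothetical conda-forge flavored JLL: resolve the library from the active
# conda prefix instead of downloading a BinaryBuilder artifact.
module libcxxwrap_julia_jll  # same name, so dependent packages are unaffected

using Libdl

const conda_prefix = get(ENV, "CONDA_PREFIX", "/opt/conda")
const libcxxwrap_julia = joinpath(conda_prefix, "lib",
                                  "libcxxwrap_julia.$(Libdl.dlext)")

function __init__()
    # Fail early and loudly if the conda-provided library is missing.
    isfile(libcxxwrap_julia) ||
        error("conda-forge build expected a library at $libcxxwrap_julia")
    Libdl.dlopen(libcxxwrap_julia)
end

end # module
```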
There may be a way to coordinate with upstream to minimize the amount of forking necessary. An alternative to this would be to enhance the Overrides.toml system such that the original binaries are not downloaded if they are overridden. For example, using https://docs.julialang.org/en/v1/stdlib/LazyArtifacts/
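For reference, Overrides.toml already supports redirecting a JLL's artifact to an existing directory; here is a sketch of writing one programmatically (the UUID is a placeholder and the conda prefix path is an assumption):

```julia
# Sketch: point a JLL's artifact at a directory provided by conda instead of
# the downloaded tarball. Uses the TOML stdlib (Julia >= 1.6).
# NOTE: this overwrites any existing Overrides.toml; merge in practice.
using TOML

overrides_file = joinpath(first(DEPOT_PATH), "artifacts", "Overrides.toml")
mkpath(dirname(overrides_file))

overrides = Dict(
    # Placeholder UUID of the JLL package being overridden.
    "3eaa8342-bff7-56a5-9981-c04077f7cee7" => Dict(
        # artifact name => directory that already contains lib/, include/, ...
        "libcxxwrap_julia" => get(ENV, "CONDA_PREFIX", "/opt/conda"),
    ),
)

open(overrides_file, "w") do io
    TOML.print(io, overrides)
end
```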
Ultimately, we either have to let the user do this on an ad-hoc basis (i.e., call a random package and then have it go through a streamlined process to be packaged according to our system, e.g. in the three steps above) or we have to basically copy everything there is --- en masse --- from julia. That's a losing battle as far as I am concerned. We may have to just accept that the two systems (conda and julia) are just doing their own things and we here are just opening a window to the julia world from --- isolated within --- conda.

However, there is an argument to be made about setting up a powerful enough packaging system that julia users are then persuaded to work within a conda-forge-provided solution. How likely is that? We just don't know, but we could definitely try! Obviously, it helps that we have a lot of infrastructure --- what, like 15000 repos? https://github.com/orgs/conda-forge/repositories --- in conda-forge, and so if we can let julia use these packages with minor modifications, that will be our winning chip.

So I think the point here isn't just to have julia packages go through the conda system, but equally to have conda-forge packages go through the julia system as well. I don't think this was the point of this feedstock originally, but obviously this is a key piece...
To be clear, the main point of contention is binary packages: Julia's binarybuilder.org cross-compilation JLL infrastructure versus conda-forge's container-based compilation approach. Even then, the comparison is not quite even. Julia's approach is to embed into the specific operating environment, hence the need to support 45 toolchains, and likely more. This includes several OS-architecture-ABI-libc-by-Julia-major-version combinations. For example, my Ubuntu system is described as x86_64-linux-gnu-cxx11-julia_version+1.7.0. Conda-forge's approach is to supply part of the operating environment, hence one does not need a separate version for gnu libc or musl, or cxx03 or cxx11. The binary distribution approaches are so different that it's not really a competition; the approaches are complementary.

Where the binarybuilder.org approach breaks down is when cross compilation does not easily work. For example, compiling HDF5 is painful in this regard. In this case, HDF5_jll is just repackaged from other build systems, including conda-forge.

For pure Julia packages, I'm not sure if there is much to do. It's just about delivering a tarball or git repository of Julia source files. Since the native code compilation is done on the user's machine, there is less dependence on binaries overall. All I might do here is supply a script that calls Julia's package manager. From the Julia perspective, there is already Conda.jl.
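For completeness, a minimal Conda.jl usage example (the package and channel are chosen purely for illustration):

```julia
# Sketch: drive conda from Julia. Conda.jl maintains its own root conda
# environment by default, separate from any externally activated env.
using Pkg; Pkg.add("Conda")
using Conda
Conda.add("libnetcdf"; channel="conda-forge")
println(joinpath(Conda.ROOTENV, "lib"))   # where the shared libraries land
```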
@zklaus kinda aside: why does ESMValTool not have equal support for osx (only the python version is fully supported)?

ESMValTool is an interesting case --- on the bleeding edge --- where many different things are at play that regular packages won't be facing any time soon. However, we will want to make sure that we work closely with you to ensure things are smooth on your end. I am also a climate scientist (nominally; atmospheric), but I don't think ESMValTool is widely used (or used at all) where I work.

I can also see a potentially smoother solution for your netcdf case. As you know, many centers have the netcdf libraries installed system-wide or as modules to load. But you basically give the users no option to use their own netcdf; instead they would need to get it from conda-forge, right? A potential solution is to rely on "pure" netcdf to convert the nc files into something more universal to be processed by python, r, etc. --- but obviously, given how crazy nc files from climate data and model output are, this will be pretty hard. Maybe it deserves a separate tool of its own...
This is due to the availability of dependencies. We are working to improve that and make missing dependencies available on conda-forge, but since most use of the tool is on Linux-based HPC systems, these improvements happen on an as-needed basis.
Interesting. I really have a rather different perception. To me, ESMValTool has reached a certain maturity (which is shown by its operational use at several national climate and meteorological centers) and in terms of dependencies shows exactly what you would expect from a comprehensive and mature tool: a rather complex web of dependencies that need to work together in consistent environments on different platforms; in other words, the definition of a regular package. The contrast to that, for me, is small, independent packages, as you would expect to find at an early prototyping stage.
Cool to meet a fellow climate scientist 😄 Perhaps it would be interesting for you to take a look at ESMValTool. It is now being adopted at the UK MetOffice, of course, used at DLR, and has played a significant role in the production of the IPCC's AR6.
Right. The problem with that is often the available versions, particularly in terms of reproducibility between different centers.
No, that's not correct. You can install ESMValTool via pip or directly from the source code.
With the further development of the CF conventions and the CMIP6 data request (and similar documents for other projects), netcdf files really are a rather solid foundation in my view. They are also rather universal in that the format is well documented and can be read and written in basically all languages at this point. Converting and thereby duplicating the data is something I would strongly discourage.
Yes, though I think the point of contention is not the means of compilation (conda-forge also uses cross-compilation in certain cases), but rather the approach to runtime dependency handling and dynamic linking.
In this sense, Julia's approach to packaging seems to be closer to Python's binary wheels, a system that indeed works well for relatively small packages that don't have extensive dependencies and don't need to co-exist with other packages in the same environment.
Thanks for all the details. Just one note to make sure:
By "bleeding edge", I didn't mean immature or like new, but rather innovative in terms of supporting a wide range of languages, etc. --- my impression is that people usually write tools like these for one language only. I think ESMValTool is a pretty impressive piece of work! |
Yggdrasil/BinaryBuilder.org only does cross compilation. On one hand, this simplifies recipes considerably, in that often a single recipe will build for Linux, macOS, and Windows, using glibc or musl, and across ARM and Intel architectures. On the other hand, when cross compilation is not directly possible, such as in the case of HDF5, the main solution is to borrow other binaries.
Thanks, everyone. Let's continue the broader discussion in #14 |
The issue of separate depots for separate environments is really all about binary compatibility. Basically, anything that has been compiled for one environment cannot reliably be used with another environment because linked libraries may have changed in an ABI incompatible way (cf the recent libunwind problems, but I would expect this with a number of libraries, e.g. for I/O of special file formats like netCDF) and the compilers themselves may be different. If compiled bits are shared, this will lead to surprising segfaults in the future. This kind of problem is not likely to show up in tests that use similar environments created at roughly the same time, but is virtually guaranteed if very different environments with library overlap are used, particularly when those environments are created a long time apart.
To me, it is not clear to what degree compiled bits are an integral part of the Julia depots, but this is my main problem with ~/.julia currently (i.e. conda-forge Julia <=1.6).

Originally posted by @zklaus in #157 (comment)
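One way to catch this sort of environment mixing early, rather than via a surprising segfault later, is to check which copy of a shared library the running process actually resolved; a small sketch using libnetcdf as the example:

```julia
# Sketch: resolve the on-disk path of the shared library that dlopen picks up.
using Libdl
handle = Libdl.dlopen("libnetcdf"; throw_error=false)
if handle === nothing
    println("libnetcdf not found on the library search path")
else
    println(Libdl.dlpath(handle))   # which copy did we actually get?
end
```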