[WIP] Use JLLs to provide standard library binaries #35193

Closed
wants to merge 15 commits into from


staticfloat
Member

This migrates almost all binary dependencies that ship with Julia to be accessible through JLL packages, with their binary dependencies stored in artifacts. This provides a uniform, Pkg-aware method of interfacing with the binaries that Julia ships with. However, it comes with its own set of challenges, which require a pull request with a very large surface area:

  • JLL packages (the Julia source code that provides the client interface for the binary artifacts) are now downloaded as part of Makefile targets within deps/, and installed into usr/share/julia/stdlib. This requires some mildly intricate Makefile shenanigans, as we first determine which JLL packages to download, then parse the bundled Artifacts.toml files to determine which artifacts to download and extract. Doing so with high performance was a small challenge. Makefile parsing is slower now than it was before, but it's not painful, so I'm satisfied that my performance work is sufficient.

  • Binaries can either be downloaded from Yggdrasil output or built locally. In the latter case, we must still provide the same interface to the binaries, despite their separate provenance. To handle this, we generate bare-bones JLL packages for these homebrewed binaries. To a client Julia package, the libLLVM_jll that it imports in order to ccall() into libLLVM.so will look more or less identical whether libLLVM was compiled locally or not; the tree hashes won't match, but that's fine (and actually a nice indicator in case something goes horribly wrong).

  • JLL packages themselves depend on things in Base and Pkg that may not be loaded by the time we need to use the JLL. To fix this, the JLL packages shipped in the standard library get "rewritten", eliminating as much dependence on Pkg and Base as possible. The main functional difference is that the system triplet (e.g. x86_64-linux-gnu-libgfortran4-cxx11) gets baked into the JLL packages, meaning that the triplet cannot change without Julia being rebuilt. This is a fine limitation for now.

  • Because __init__() methods don't get called by default during bootstrap, there are a number of manual __init__() calls scattered around the PR, such that libraries are opened and available for inspection during the init process.

  • Libdl is too useful to leave for the stdlib, so it gets moved to Base.
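
The step in the first bullet, where a bundled Artifacts.toml is parsed to decide which artifact to fetch for the current platform, can be sketched roughly as follows. This is an illustration in Julia, not the Makefile logic the PR uses. It assumes Julia 1.6 or newer for the TOML standard library, and the artifact name `Example`, the hashes, the URLs, and the `select_artifact` helper are all hypothetical.

```julia
using TOML

# Hypothetical Artifacts.toml contents; the name, hashes, and URLs are placeholders.
const ARTIFACTS_TOML = """
[[Example]]
arch = "x86_64"
git-tree-sha1 = "1111111111111111111111111111111111111111"
os = "linux"
libc = "glibc"

    [[Example.download]]
    sha256 = "aaaa"
    url = "https://example.invalid/Example.x86_64-linux-gnu.tar.gz"

[[Example]]
arch = "x86_64"
git-tree-sha1 = "2222222222222222222222222222222222222222"
os = "macos"

    [[Example.download]]
    sha256 = "bbbb"
    url = "https://example.invalid/Example.x86_64-apple-darwin.tar.gz"
"""

# Return the tree hash and first download URL of the entry matching `arch`
# and `os`, or `nothing` if no entry matches. (Hypothetical helper.)
function select_artifact(toml::AbstractString, name::AbstractString; arch, os)
    entries = TOML.parse(toml)[name]
    for entry in entries
        if get(entry, "arch", nothing) == arch && get(entry, "os", nothing) == os
            return (tree_hash = entry["git-tree-sha1"],
                    url = entry["download"][1]["url"])
        end
    end
    return nothing
end

picked = select_artifact(ARTIFACTS_TOML, "Example"; arch = "x86_64", os = "macos")
println(picked.url)
```

Matching only on `arch` and `os` is a simplification; real platform matching also considers fields such as `libc` and the compiler ABI tags that appear in the triplet.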

@staticfloat
Member Author

This PR is still in a rough state, but I'm working on eliminating unnecessary changes, coalescing commits, etc... I'm at the point where I need to test it on CI though, so bear with me in the short term. :)

@staticfloat
Member Author

I ran into something of a roadblock recently; the "rewrite DLL import names as relative" approach doesn't work, because the names must all be implicitly relative to the executable. This wrecks any other executable being able to use them, so it's a bit of a showstopper for us. This means that really, the only way to get this to work is to rely on more traditional means of the main julia.exe executable being able to find its dependencies.

I started by creating a julia.cmd file that exports an appropriate PATH, then launches julia.exe. This works, but it's unsatisfactory because it means that all Windows users must remember to launch the .cmd instead of the .exe; otherwise it fails.

I briefly looked into creating a "wrapper" .exe that bundled the true .exe within itself, but while this works, it makes debugging a nightmare. No bueno.

An even simpler solution is to create a launcher .exe that invokes actually-julia.exe and does the same job as the julia.cmd (but named julia.exe so as to not break downstream dependencies). But that's ugly. :)

Looking at ui/repl.c, there's actually not that much that julia.exe does; so my current research angle is to remove the dynamic-link-time dependency of julia.exe on libjulia.dll, compile it as a static executable, but then use dlopen() and friends to load in libjulia. This will allow us to have an executable that can at least start up everywhere, and once we have code execution capabilities, we can do things like push onto PATH and whatnot.

@StefanKarpinski
Member

To clarify, this issue with relative loading is Windows-only or everywhere?

@StefanKarpinski
Member

Also, statically linking libjulia into the julia executable seems reasonable to me.

@staticfloat
Member Author

Update: using static libgcc and a delay-loaded DLL for libjulia works pretty well. One of the finicky things is that delay-loaded DLLs don't let you import data symbols, so I have had to work around that through judicious application of LoadLibrary() and GetProcAddress(). There's a remaining segfault on Windows that I believe is related to this issue (some other global that I need to import), but progress continues...

@staticfloat
Member Author

So much green.

@KristofferC
Member

KristofferC commented Apr 21, 2020

Just some random things I noticed while trying out this branch, putting them here for posterity:

  • Startup time of Julia went from 0.13s to 0.4s on Mac. Profiling suggests the same cause as in "Generated executable is slow (doing JIT?)" (PackageCompiler.jl#292 (comment)): OpenBLAS spinning for a much longer time when starting up.
  • Download size went from 95 MB to 159 MB (uncompressed 470 MB to 625 MB), mostly extra stuff in the jll libraries (executables, static libraries etc).
  • Got this error from Pkg when trying to add MKL_jll:
  [e66e0078] ERROR: MethodError: no method matching isless(::VersionNumber, ::Pkg.Types.VersionSpec)
Closest candidates are:
  isless(::Missing, ::Any) at missing.jl:87
  isless(::VersionNumber, ::VersionNumber) at version.jl:175
  isless(::Any, ::Missing) at missing.jl:88
Stacktrace:
 [1] <(::VersionNumber, ::Pkg.Types.VersionSpec) at ./operators.jl:268
 [2] >(::Pkg.Types.VersionSpec, ::VersionNumber) at ./operators.jl:294
 [3] print_diff(::Pkg.Types.Context, ::Pkg.Types.PackageSpec, ::Pkg.Types.PackageSpec) at /Users/kristoffercarlsson/julia/usr/share/julia/stdlib/v1.5/Pkg/src/Operations.jl:1577
...

@staticfloat force-pushed the sf/jllstdlibs branch 4 times, most recently from 13a2c93 to d8c4cfc, April 22, 2020 22:27
@staticfloat
Member Author

This PR is now getting pretty close to ready to go. To address Kristoffer's comments:

  • I've added some simple cleanups to the JLLs, deleting unnecessary things like static libraries and JLLs that are needed only at build time. This reduces the .dmg file size from 159MB to 104MB. There's still some increase; we can squeeze that down in the future if we really need to. (There are things being shipped, like LLVM headers and whatnot, that may be useful, so I'm keeping them in for now.)

  • Julia startup time has indeed increased; I see a reliable 100ms increase. I see two main reasons for this. The first is that the "meta" JLL packages being generated (BLAS_jll and Libm_jll) use dlpath() within their __init__() methods, and on macOS dlpath() takes a minimum of 15ms and gets slower with each new library that is opened. This is because our dlpath() implementation on macOS was first written, and then re-written, by someone who wasn't thinking about performance at all. We now dlopen() every image that is currently loaded into the Julia process every time we try to dlpath() a handle, which is very, very bad for performance. We can speed this up by a few orders of magnitude by caching handle pointers and essentially memoizing the dlpath() internals. The second reason for the slowdown is indeed that BLAS startup is twice as slow, but I haven't been able to figure out why.

  • That MKL_jll thing might be a Pkg bug? I am changing the Pkg gitsha here to something very close to master.
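
The memoization idea from the startup-time bullet above can be sketched in a few lines of Julia. This only shows the shape of the idea and is not the change that was merged: the cache, the lock, and the `cached_dlpath` name are hypothetical, and the `resolver` keyword exists only so the expensive lookup can be swapped out for demonstration.

```julia
using Libdl

# Map from library handle to its resolved path (hypothetical cache).
const DLPATH_CACHE = Dict{Ptr{Cvoid},String}()
const DLPATH_LOCK = ReentrantLock()

# Resolve the path for `handle` once, then answer later queries from the cache.
# `resolver` defaults to the real (potentially slow) Libdl.dlpath.
function cached_dlpath(handle::Ptr{Cvoid}; resolver = Libdl.dlpath)
    lock(DLPATH_LOCK) do
        get!(() -> resolver(handle), DLPATH_CACHE, handle)
    end
end

# Demonstration with a fake handle and a counting resolver, so that no real
# library needs to be opened. The path below is a placeholder.
calls = Ref(0)
fake_handle = Ptr{Cvoid}(UInt(0x1234))
slow_resolver = h -> (calls[] += 1; "/hypothetical/lib/libexample.dylib")

cached_dlpath(fake_handle; resolver = slow_resolver)
cached_dlpath(fake_handle; resolver = slow_resolver)
println(calls[])  # the resolver ran only for the first call
```

A cache like this assumes a handle keeps referring to the same library for as long as the entry lives; an entry would need to be dropped if the library were closed and the handle reused.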

@staticfloat
Member Author

With the dlpath() speedup I just merged in, the perf difference has dropped from +100ms to +40ms. OpenBLAS is still taking up a lot more time; looking at it in Instruments, the best I can see is that we're spending most of our time waiting for conditions to clear in pthreads. I have no idea what's going on there, but I think either something is wrong in the OpenBLAS binaries, or something else has gotten slightly slower and causes a cascade of interlocking concurrent operations to be slower overall.

I can't reproduce the MKL_jll error anymore.

@staticfloat
Member Author

Alright, with my latest push of updated checksums, this is officially ready to be reviewed! Huzzah!

I would like to ask for a couple of different people's help on this:

  • @vtjnash could you take a look at the build system changes? There's an awful lot of me just re-arranging things so that the ordering is correct, which inflates the diff somewhat. I think the things you might find particularly interesting are:
  • @KristofferC or @andreasnoack could you guys take a look at the LinearAlgebra changes I've made? In particular, I tried to decouple LinAlg from the underlying BLAS library a bit (this was necessary to get bootstrap working in an environment where openblas.so wasn't as readily available as it is right now). One of the consequences of this is that you can now do this:
using MKL_jll
lib = joinpath(MKL_jll.artifact_dir, "lib", "libmkl_intel_ilp64.so")
LinearAlgebra.set_blas_lapack_lib(lib, lib)

Of course this might not work completely, but I figure you guys may have some comments about this. :) One downside is that we now do a Ref and dlsym lookup before each BLAS call, but I figure that's generally going to be cheap enough that it's probably fine.
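
To make the "Ref and dlsym lookup before each call" cost concrete, here is a minimal sketch of that dispatch pattern. It is not the PR's LinearAlgebra code: it uses `cos` from a math library as a stand-in for a BLAS routine, and `BACKEND`, `set_backend!`, and `backend_cos` are hypothetical names. It also assumes that one of the listed math libraries can be found on the system.

```julia
using Libdl

# Handle of the currently selected backend library; C_NULL until one is chosen.
const BACKEND = Ref{Ptr{Cvoid}}(C_NULL)

# Hypothetical setter, similar in spirit to switching the BLAS/LAPACK library.
set_backend!(path::AbstractString) = (BACKEND[] = Libdl.dlopen(path); nothing)

# Every call dereferences the Ref and looks the symbol up again, so a later
# set_backend! takes effect immediately, at the price of one dlsym per call.
function backend_cos(x::Float64)
    BACKEND[] == C_NULL && error("no backend library loaded")
    fptr = Libdl.dlsym(BACKEND[], :cos)
    return ccall(fptr, Cdouble, (Cdouble,), x)
end

# Usage: pick whichever math library is available (an assumption about the host).
lib = Libdl.find_library(["libopenlibm", "libm.so.6", "libm"])
if !isempty(lib)
    set_backend!(lib)
    println(backend_cos(0.0))
end
```

Caching the function pointer alongside the handle would remove the per-call lookup, but the cached pointer would then have to be invalidated whenever the backend is switched.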

  • @ararslan can you try building this on your FreeBSD systems, both using BB and not using BB? I am curious as to whether this fixes your recently-reported FreeBSD issues.

@ararslan
Member

> @ararslan can you try building this on your FreeBSD systems, both using BB and not using BB? I am curious as to whether this fixes your recently-reported FreeBSD issues.

Unfortunately no, #34627 is still present on this branch. It also seems this breaks the non-BB build as well; with USE_BINARYBUILDER=0, I get

gmake[1]: *** No rule to make target '/usr/home/alex/julia/usr/manifest/libcurl', needed by '/usr/home/alex/julia/usr/manifest/LibCURL_jll'.  Stop.

@staticfloat force-pushed the sf/jllstdlibs branch 2 times, most recently from 71a7fe2 to f5360b9, April 27, 2020 17:47
@vtjnash
Member

vtjnash commented Apr 27, 2020

> Because __init__() methods don't get called by default during bootstrap, there are a number of manual __init__() calls scattered around the PR, such that libraries are opened and available for inspection during the init process.

I'm also concerned about this, because we also know from past experience that this design causes performance problems. We know it because we've done significant work in the past to delete these types of __init__ methods. For example, #28113. We know it because it can appear in the performance of loading packages that use the artifact system. And it probably reintroduces the issue addressed by #28290 #27894.

Now that you've got a roadmap going here, can you make this into many smaller PRs? 18 commits is just too much for one PR, some of which independently may bring in significant risk and may need careful thought for any unintended interactions (such as adding exe_path_wrapper.c). And, for example, for a first step, we can use the artifacts in the Makefile, but not yet make any changes to Base (just copy/symlink them into the locations expected).

@staticfloat
Member Author

@ararslan can I get you to check again? I think I fixed the from-source issues on FreeBSD.

@vtjnash I'll start pulling out pieces from this and opening new PRs.

@ararslan
Member

> @ararslan can I get you to check again? I think I fixed the from-source issues on FreeBSD.

I started the build prior to the most recent commit here, but the build completed successfully. I'm running tests and so far so good but this system is a bit slow, so it'll be a while.

In order to bundle JLL packages (and their respective binary artifacts)
alongside Julia distributions, we alter the build process to first
download JLL packages, then parse the `Artifacts.toml` file and generate
Makefile targets to download and unpack those artifacts into the stdlib
depot.  This makes heavy use of the Makefile caching infrastructure, as
the TOML parsing must happen after the TOML files have been downloaded.
… backends

We rework LinearAlgebra a bit here, allowing runtime switching of
BLAS/LAPACK backends through the `set_blas_lapack_lib(blas, lapack)`
function.  Note that you cannot load an ILP64 BLAS into a non-ILP64
Julia, and vice-versa; such constants are defined at compile-time.
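
The ILP64 restriction above can be checked from Julia, because the BLAS integer type is fixed when Julia is built. The sketch below is an illustration only; `julia_is_ilp64` and `check_backend_width` are hypothetical helpers, and the caller is assumed to already know whether the candidate backend library is an ILP64 build.

```julia
using LinearAlgebra

# BlasInt is set at build time: Int64 for an ILP64 Julia, Int32 otherwise.
julia_is_ilp64() = LinearAlgebra.BLAS.BlasInt === Int64

# Hypothetical guard to run before switching backends: refuse a library whose
# integer width differs from the width this Julia was built with.
function check_backend_width(backend_is_ilp64::Bool)
    backend_is_ilp64 == julia_is_ilp64() ||
        throw(ArgumentError("BLAS integer width does not match this Julia build"))
    return nothing
end

check_backend_width(julia_is_ilp64())  # a matching width passes silently
```
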
This cleanup eliminates many of the old code paths involved in loading
the correct compiler support libraries (such as `libgcc_s`) by working
it into the JLL stdlib framework.  Additionally, it eliminates many
FreeBSD-specific code paths, as all platforms now collect the CSLs
before build.
@Keno
Member

Keno commented Dec 29, 2020

Are there aspects of this PR remaining beyond what we recently merged?

@staticfloat
Member Author

Yes; what has been merged into 1.6 is a large chunk of the work, but there's more that needs to be merged, such as installing the binaries not into <prefix>/lib/julia but instead as actual artifacts in <prefix>/share/julia/vX.Y/artifacts. As a part of that, we'll need to use the to-be-created facility of private dependencies to bundle JLLWrappers and Preferences within stdlib.

@DilumAluthge
Member

Any updates on this PR?

@vtjnash closed this Mar 28, 2022
@vtjnash
Member

vtjnash commented Mar 28, 2022

I assume this is out-of-date

@vtjnash deleted the sf/jllstdlibs branch March 28, 2022 20:47
8 participants