How precompile files are loaded need to change if using multiple projects are going to be pleasant #27418
Comments
We'll need advice and input from @vtjnash on this one. |
Could you consider #28518 when fixing it? Re: implementation, I suppose you can de-duplicate the precompile cache by using a hash tree? What I mean by that is to generate the path of the precompile file using a hash that depends on its own ref: JuliaPy/pyjulia#173 |
Chiming in that for me this is pretty useful Argument: At least an optional flag for new environments not to share the precompile cache would be awesome. |
As different system images may contain different versions of packages, I suppose it makes sense for the cache path to depend on (say) the path of the system image as well? I think it also helps to decouple stdlib more from Julia core. |
@StefanKarpinski I don't think implementing what I suggested above #27418 (comment) is difficult. Does this conceptually work?
function cache_path_slug(env::Pkg.Types.EnvCache, uuid::Base.UUID)
    info = Pkg.Types.manifest_info(env, uuid)
    crc = 0x00000000
    if haskey(info, "deps")
        # Fold in the slug of every dependency, in a stable (sorted) order.
        for dep_uuid in sort(Base.UUID.(values(info["deps"])))
            slug = cache_path_slug(env, dep_uuid)
            crc = Base._crc32c(slug, crc)
        end
    end
    crc = Base._crc32c(uuid, crc)
    if haskey(info, "git-tree-sha1")
        crc = Base._crc32c(info["git-tree-sha1"], crc)
    end
    # crc = _crc32c(unsafe_string(JLOptions().image_file), crc)
    return Base.slug(crc, 5)
end
cache_path_slug(Pkg.Types.EnvCache(), Base.identify_package("Compat").uuid)
(By "conceptually", I mean that I'm glossing over that probably Some possible flaws I noticed:
|
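The hash-tree idea proposed above can be tried without Pkg internals. The following is only a minimal sketch under stated assumptions: `TOY_MANIFEST`, `tree_key`, and `slug_of` are hypothetical names, packages are keyed by name rather than UUID, the dependency graph is assumed to be acyclic, and the `CRC32c` standard library stands in for the internal `Base._crc32c`.

```julia
using CRC32c  # standard library; provides crc32c(data, crc)

# Hypothetical toy manifest: package name => (tree hash, direct dependencies).
# Real manifests are keyed by UUID; names keep the sketch short.
const TOY_MANIFEST = Dict(
    "A" => (tree = "aaa111", deps = ["B", "C"]),
    "B" => (tree = "bbb222", deps = ["C"]),
    "C" => (tree = "ccc333", deps = String[]),
)

# Fold the keys of all dependencies (recursively, in sorted order), then the
# package's own name and tree hash, into one CRC-32c value.
# Assumes an acyclic graph; there is no memoization, so shared dependencies
# are rehashed on every visit.
function tree_key(manifest, name::String)
    entry = manifest[name]
    crc = 0x00000000
    for dep in sort(entry.deps)
        dep_key = tree_key(manifest, dep)
        crc = crc32c(string(dep_key, base = 16), crc)
    end
    crc = crc32c(name, crc)
    crc = crc32c(entry.tree, crc)
    return crc
end

# Render the key as a fixed-width, path-friendly string.
slug_of(key::UInt32) = string(key, base = 36, pad = 7)
```

With this scheme, changing the tree hash of `"C"` changes the keys of `"B"` and `"A"` as well, so a cache file keyed this way would only be reused when the whole dependency subtree is identical.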
Related: I was benchmarking julia master vs. a branch using two different directories & builds. The two compete against one another for the ownership of the compiled package files. |
FWIW, we found that using a different DEPOT_PATH for each frequently-used environment is a decent (if cumbersome) work-around until there's a fix. |
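The per-environment depot workaround can be scripted. This is a rough sketch, not an endorsed setup: `depot_for` and `julia_cmd_for` are hypothetical helpers, the `~/.julia-depots` root is an arbitrary choice, and it relies on the documented behavior that the first entry of `JULIA_DEPOT_PATH` is the depot Julia writes to.

```julia
using CRC32c  # standard library

# Hypothetical helper: derive a private depot directory from a project path.
function depot_for(project::AbstractString; root = joinpath(homedir(), ".julia-depots"))
    tag = string(crc32c(abspath(project)), base = 16, pad = 8)
    return joinpath(root, tag)
end

# Build a command that starts Julia with the private depot listed first, so new
# cache files land there, followed by the depots of the current session so
# already-installed packages can still be found.
function julia_cmd_for(project::AbstractString)
    sep = Sys.iswindows() ? ";" : ":"
    depots = join(vcat(depot_for(project), DEPOT_PATH), sep)
    cmd = `$(Base.julia_cmd()) --project=$(abspath(project))`
    return addenv(cmd, "JULIA_DEPOT_PATH" => depots)
end
```

Calling `run(julia_cmd_for("path/to/project"))` would then start a session whose precompile files do not compete with those of other projects, at the cost of extra disk space.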
That's what I was doing too but recently I ran into a case where, surprisingly, that didn't work. I was rushing and didn't have time to document it, but I will see if I can remember what was involved. |
tangentially adjacent or interwoven? |
This is also the cause of timholy/Revise.jl#205 |
yowza. could we please prioritize this with a milestone? |
Under what conditions can it be guaranteed that one or more precompile files are shareable? If we can nail down the varying inputs to precompilation, it should at least be possible to put in a hack to stop truly unnecessary precompilations, at least until a better mechanism is devised. |
My home dir is typically shared by many different machines (os/proc type). I would need a different build location for the *.ji files of each machine. DEPOT_PATH defines the location /Users/monty/.julia So I will need /Users/monty/.julia-redhat, /Users/monty/.julia-linux, /Users/monty/.julia-ubuntu14, /Users/monty/.julia-ubuntu16, etc. Is there a better way? |
I'm replying to @jpsamaroo's comment in this discourse thread here since this discussion belongs here rather than there. Please read my comment (and the follow-up) and @jpsamaroo's comment for the full context.
I think it does not handle many common cases. For example, if you have But I actually don't know if it is such a bad idea as the first implementation. As switching projects triggers precompilation anyway ATM, it is an improvement if |
I'd be interested in elaboration on this "in-memory dependency tree" and how it can solve the issue of dynamic activations. I only consider my "solution" a temporary improvement for certain common cases anyway, but you're definitely right that it might make other common cases worse instead of better. |
I don't see why we can't just have one compile cache directory per exact stack of environments. Sure, it would use more hard drive space, but hard drive space is cheap. |
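One way to picture "one cache directory per exact stack of environments" is to hash the ordered load path into a directory name. The sketch below is only illustrative: `stack_cache_dir` and the `compiled-by-stack` directory name are hypothetical, and the key covers the environment paths only, not the contents of their manifests.

```julia
using CRC32c  # standard library

# Hypothetical: map an ordered stack of environment paths to one cache directory.
function stack_cache_dir(stack::Vector{String}; depot = first(DEPOT_PATH))
    crc = 0x00000000
    for env in stack
        crc = crc32c(env, crc)
        # NUL separator so that ["ab", "c"] and ["a", "bc"] get different keys.
        crc = crc32c("\0", crc)
    end
    return joinpath(depot, "compiled-by-stack", string(crc, base = 16, pad = 8))
end
```

For the current session this could be called as `stack_cache_dir(Base.load_path())`. Because order is part of the key, the same environments stacked differently map to different directories.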
I think it's not a crazy plan provided that there is a mechanism to switch to the mode that acts like To illustrate what I mean by "precompilation does not work", consider the following setup: Default (named) project
Further assume that packages If you do
this Julia session (hereafter
then this Julia session (hereafter @jpsamaroo This is what I meant by "in-memory dependency tree." The information that |
This is all great thinking. Unfortunately, the current issue is just so much more mundane than all that. We actually already have all of that great "in-memory dependency tree" logic and stacks of caches and more! So what's the problem, since that's clearly not working for the default user experience? Well, at the end of the precompile step, it goes and garbage collects the old files right away. So there's nary a chance for it to survive for even a brief moment to be found later and used. If it only could just stop doing that until some later explicit step (like the brand new |
Right, that's a good point. But we do still need to ensure we know how to locate the previously-generated *.ji files deterministically in a manner that is guaranteed to load the correct ones. Currently it seems this issue is avoided by blowing everything away and starting from scratch the moment any little thing changes with respect to the conditions that generated the previous *.ji files. |
@vtjnash Would you mind letting us know where it is implemented? The closest thing I could find was |
@vtjnash It would be great if you could elucidate a little more concretely what needs to change inside of base; I don't quite follow precisely what needs to change. Clearly the naming of precompile files needs to change, and I think what you're saying is that we need a way to determine which precompile files are used and which are not used so that we don't just slowly fill up a disk with stale precompile caches? |
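To make the "don't slowly fill up a disk" concern concrete, here is a minimal sketch of a deferred cleanup policy that keeps a few recent cache files instead of deleting old ones immediately. It is an assumption for illustration, not what loading.jl does: `prune_cache!` and its `keep` parameter are hypothetical, and modification time is used as a stand-in for "recently used".

```julia
# Hypothetical policy: keep the `keep` most recently modified *.ji files in
# `dir`, delete the rest, and return the paths that were removed.
function prune_cache!(dir::AbstractString; keep::Int = 3)
    files = filter(f -> endswith(f, ".ji"), readdir(dir; join = true))
    sort!(files; by = mtime, rev = true)            # newest first
    nkeep = clamp(keep, 0, length(files))
    stale = files[(nkeep + 1):end]
    foreach(rm, stale)
    return stale
end
```

A policy like this would let several variants of a package's cache coexist for a while, with cleanup run as an explicit later step.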
Another perspective; there are situations where having user-control over which precompile file gets loaded is desirable. Let us imagine a user wanting to distribute a docker container with Julia GPU packages pre-installed; the Julia GPU packages need to do some setup when they see a new generation of GPU hardware attached, and so right now in the docker container we are forced to set It would be much preferable if there were some kind of mechanism that allowed packages to expose a user-defined function that gets called to add some salt into the hash; an extremely coarse-grained version could be an environment variable Of course, the problem of how to intelligently garbage collect these files remains. |
Yes, it would be nice to integrate this with package options JuliaLang/Juleps#38 Meanwhile, you can build a patched system image that lets you add arbitrary salt via an environment variable. This works because child processes (which precompile Julia packages) inherit environment variables. More precisely, here is the code snippet that does this (used in
Base.eval(Base, quote
    function package_slug(uuid::UUID, p::Int=5)
        crc = _crc32c(uuid)
        # Mix in the system image path and a user-controlled salt.
        crc = _crc32c(unsafe_string(JLOptions().image_file), crc)
        crc = _crc32c(get(ENV, "JLM_PRECOMPILE_KEY", ""), crc)
        return slug(crc, p)
    end
end)
(You can get this system image by running |
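The effect of the salt can be seen without patching Base. The following stand-alone sketch mirrors the snippet above under stated assumptions: `salted_slug` is a hypothetical function, the `CRC32c` standard library replaces the internal `_crc32c`, and the slug is a fixed seven-character base-36 string rather than the output of `Base.slug`.

```julia
using CRC32c, UUIDs  # standard libraries

# Hypothetical stand-alone variant: mix a package UUID, a system image path,
# and a salt into one short slug. By default the salt is read from the
# JLM_PRECOMPILE_KEY environment variable at call time.
function salted_slug(uuid::UUID; image::AbstractString = "",
                     salt::AbstractString = get(ENV, "JLM_PRECOMPILE_KEY", ""))
    crc = crc32c(string(uuid))
    crc = crc32c(String(image), crc)
    crc = crc32c(String(salt), crc)
    return string(crc, base = 36, pad = 7)
end
```

For example, `withenv(() -> salted_slug(uuid), "JLM_PRECOMPILE_KEY" => "gpu-a")` gives the same result as `salted_slug(uuid; salt = "gpu-a")`, and a different salt gives a different slug, so each salt value would map to its own cache file name.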
I would very much like a functionality like this for our lab computers, since it would make it possible to have multiple precompiled versions of commonly used libraries. Attached is a simplistic patch that adds the same version slug that is used in |
This was already implemented in 2019. |
Stefan: is there documentation that specifies exactly what parts of the
system should be placed where, or at least an example configuration for
such a centralized read-only setup? Where does the compiled directory go,
what about packages that the user explicitly wants to override etc.
|
Actually, no; loading.jl uses three different slugs:
These parts are then used to place package source code in With this scheme the number of files in the precompiled files is kept low, since new versions of a precompiled
BTW: the previous |
Precompile files are currently stored only based on the UUID of the package.
So if you change your project, you will likely have to recompile everything, and then again when you swap back, and so on.
This will be very annoying for people trying to use multiple projects, and people will likely just use one mega project like before.
#26165 also removed any possibility for users to change the precompile path, so there is no way to work around this right now.
We should be smarter about how we save precompile files to reduce the amount of recompilation needed. A very simple system is to just use one precompile directory for each project, but that might be a bit wasteful since it is theoretically possible to share compilation files between projects.