Parallelize Pkg.precompile #2018
```julia
@@ -895,30 +895,70 @@ end
precompile() = precompile(Context())
function precompile(ctx::Context)
    printpkgstyle(ctx, :Precompiling, "project...")

    pkgids = [Base.PkgId(uuid, name) for (name, uuid) in ctx.env.project.deps if !is_stdlib(uuid)]

    num_tasks = parse(Int, get(ENV, "JULIA_NUM_PRECOMPILE_TASKS", string(Sys.CPU_THREADS + 1)))
    parallel_limiter = Base.Semaphore(num_tasks)

    man = Pkg.Types.read_manifest(ctx.env.manifest_file)
    pkgids = [Base.PkgId(first(dep), last(dep).name) for dep in man if !Pkg.Operations.is_stdlib(first(dep))]
```
I know this filtering was here before, but why is it necessary to filter out stdlibs?

Aren't stdlibs always going to be precompiled already? And if you're dev-ing them they'd need to have their UUID removed, so they wouldn't identify as stdlibs in that check.

Right, I forgot to say that this matters when you compile Julia without them in the sysimage. Perhaps we can instead filter based on whether the package is already loaded? That should work for both regular packages and stdlibs: if it is a stdlib that is in the sysimage it doesn't need to precompile, and if it is a regular package that is already loaded in the session it is probably just precompiled from the …

But what if they're loaded, and in need of recompiling? Perhaps the filter just isn't needed at all?

Yeah, I am not sure what happens if you try to precompile stdlibs that are loaded. Since no precompile files exist, will that spend time on precompiling them anyway? At least we can add the filter I suggested for the stdlibs.

#2021 is updated with this now (the PkgId version).

I thought the stdlib check was for some kind of optimization (launching …, I think).

That's true, I didn't think about regular packages in the sysimage. But perhaps #2018 (comment) is a good enough approximation of that? It seems pretty strange to (i) load a dependency, (ii) update its version, (iii) `pkg> precompile`, (iv) restart Julia, and expect everything to be precompiled.

Although #2021 is looking good, I do like the properness of …

@fredrikekre Hmm... That was my expectation, actually. I generally expect …
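The "skip already-loaded packages" filter floated above could be sketched as follows. This is a hypothetical illustration, not code from the PR: `Base.loaded_modules` is keyed by `Base.PkgId`, so membership tells us the package is either in the sysimage or was already precompiled when it was loaded. The JSON UUID below is only a stand-in example.

```julia
# Hypothetical sketch of the proposed filter: drop any PkgId that is
# already loaded in this session. Base.loaded_modules is a Dict keyed
# by Base.PkgId. JSON is not loaded here, so the filter keeps it.
pkgids = [Base.PkgId(Base.UUID("682c06a0-de6a-54ab-a142-c8b1cf79cde6"), "JSON")]
filter!(id -> !haskey(Base.loaded_modules, id), pkgids)
```

As noted in the thread, this would incorrectly skip loaded packages that are stale and in need of recompiling, which is why it remained an open question.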
```julia
    pkg_dep_uuid_lists = [collect(values(last(dep).deps)) for dep in man if !Pkg.Operations.is_stdlib(first(dep))]
    filter!.(!is_stdlib, pkg_dep_uuid_lists)

    if ctx.env.pkg !== nothing && isfile(joinpath(dirname(ctx.env.project_file), "src", ctx.env.pkg.name * ".jl"))
        push!(pkgids, Base.PkgId(ctx.env.pkg.uuid, ctx.env.pkg.name))
        push!(pkg_dep_uuid_lists, collect(keys(ctx.env.project.deps)))
    end

    precomp_events = Dict{Base.UUID,Base.Event}()
    was_recompiled = Dict{Base.UUID,Bool}()
    for pkgid in pkgids
        precomp_events[pkgid.uuid] = Base.Event()
        was_recompiled[pkgid.uuid] = false
    end

    # TODO: since we are a complete list, but not topologically sorted, handling of recursion will be completely at random
    for pkg in pkgids
```
```julia
    function is_stale(paths, sourcepath)
        for path_to_try in paths::Vector{String}
            staledeps = Base.stale_cachefile(sourcepath, path_to_try, Base.TOMLCache())
            staledeps === true && continue
            return false
        end
        return true
    end

    errored = false
    @sync for (i, pkg) in pairs(pkgids)
        paths = Base.find_all_in_cache_path(pkg)
        sourcepath = Base.locate_package(pkg)
        sourcepath === nothing && continue
        # Heuristic for when precompilation is disabled
        occursin(r"\b__precompile__\(\s*false\s*\)", read(sourcepath, String)) && continue
        stale = true
        for path_to_try in paths::Vector{String}
            staledeps = Base.stale_cachefile(sourcepath, path_to_try)
            staledeps === true && continue
            # TODO: else, this returns a list of packages that may be loaded to make this valid (the topological list)
            stale = false
            break
        end
        if stale
            Base.compilecache(pkg, sourcepath)
        end

        @async begin
            for dep_uuid in pkg_dep_uuid_lists[i]
                wait(precomp_events[dep_uuid])
            end
            if errored # early termination of precompilation session
                notify(precomp_events[pkg.uuid])
```
This implementation of parallelism feels very low-level.

I tried out a channel-based approach, but it still requires the notify system. It works but seems slightly more complicated: master...ianshmean:ib/parallel_precomp_chanelbased
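The Event-plus-Semaphore coordination being discussed can be reduced to a small self-contained sketch. The toy dependency graph and the `order` vector below are illustrative only, not from the PR: each task waits on its dependencies' events, takes a semaphore slot to do its "work", and finally notifies its own event so dependents can proceed.

```julia
# Toy model of the PR's coordination: 2 waits on 1; 3 waits on 1 and 2.
events  = Dict(i => Base.Event() for i in 1:3)
deps    = Dict(1 => Int[], 2 => [1], 3 => [1, 2])
limiter = Base.Semaphore(2)          # at most 2 concurrent "compiles"
order   = Int[]

@sync for i in 1:3
    @async begin
        foreach(d -> wait(events[d]), deps[i])  # wait for all dependencies
        Base.acquire(limiter)
        try
            push!(order, i)                     # stand-in for compilecache
        finally
            Base.release(limiter)
            notify(events[i])                   # unblock dependents
        end
    end
end
# The dependency graph forces the completion order to be 1, 2, 3.
```

The channel-based variant linked above replaces the per-package `Event`s with work queues, but as noted it still needs a notification mechanism to express the dependency edges.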
```julia
                return
            end

            any_dep_recompiled = any(map(dep_uuid -> was_recompiled[dep_uuid], pkg_dep_uuid_lists[i]))
            if any_dep_recompiled || is_stale(paths, sourcepath)
                Base.acquire(parallel_limiter)
                try
                    Base.compilecache(pkg, sourcepath)
                    was_recompiled[pkg.uuid] = true
                catch err
                    errored = true
                    throw(err)
```
Comment on lines +954 to +955:

I couldn't bring JuliaLang/julia#37374 (comment) up again in this PR in time, but did somebody consider the case for OS-specific packages? That is to say, if we have

```julia
module DownstreamPkg

if Sys.iswindows()
    using SomeWindowsOnlyPackage
end

...

end
```

we don't need to compile SomeWindowsOnlyPackage on other platforms. I think we should report errors only from the importable packages (i.e., the ones in Project.toml). Of course, it'd be much nicer to integrate the precompilation machinery in Base itself so that this kind of hack is not necessary.

Is that what makes it hard to parallelize in Base, because you don't know your actual dependencies up front?

I see. Perhaps the docstring for …

I'll let others decide if this is necessary, but it proved simple to implement an option to revert behavior: #2021

Yeah. But it's a problem in Pkg, too (even if you can parse the Manifest). I think the problem is that there is no database for the "true" dependency tree, since you can hide … Ultimately, only the precompilation subprocess "knows" the dependencies while it's precompiling a package. So I think the "correct" way of doing this is to use some kind of bidirectional RPC, so that the precompilation subprocess can request the precompilation of its dependencies from the parent process. But that's very tedious. I think ignoring compilation errors from non-direct dependencies is a decent approach. (I feel like I should try implementing it now that I complained so much 🤣)

Neat idea on non-direct dependencies being allowed to fail. I implemented it here: #2021 (comment)

Wouldn't it be enough to wrap in … here? That should propagate to the subprocess, no?

That's implemented in #2021, but there was a question about …
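The "report errors only from direct dependencies" idea might be sketched like this. Everything here is hypothetical (the names `direct_deps` and `failures`, and the error-collection shape, are not the PR's API); it only illustrates the policy: rethrow failures from packages in Project.toml, and merely warn about indirect ones such as OS-specific packages.

```julia
# Hypothetical sketch: per-package precompile outcomes, where `nothing`
# means success. Only direct-dependency failures are rethrown; indirect
# ones (e.g. an OS-specific package) are downgraded to warnings.
direct_deps = Set(["Example"])
failures = Dict("Example" => nothing,
                "SomeWindowsOnlyPackage" => ErrorException("not supported on this OS"))

for (name, err) in failures
    err === nothing && continue
    if name in direct_deps
        throw(err)                 # a direct dep failing is a real error
    else
        @warn "precompilation of indirect dependency failed; ignoring" name
    end
end
```

This mirrors the thread's conclusion: since only the precompile subprocess knows the true (possibly conditional) dependency graph, tolerating indirect failures is a pragmatic approximation.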
```julia
                finally
                    notify(precomp_events[pkg.uuid])
                    Base.release(parallel_limiter)
                end
            else
                notify(precomp_events[pkg.uuid])
            end
        end
    end
    nothing
end
```
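The source-scan heuristic used in the diff above (skipping packages that opt out via `__precompile__(false)`) can be exercised in isolation. The helper name `has_precompile_disabled` is introduced here for illustration; the regex is the one from the PR.

```julia
# Same regex as in the PR: detects an explicit __precompile__(false)
# marker in a package's top-level source text.
has_precompile_disabled(src::AbstractString) =
    occursin(r"\b__precompile__\(\s*false\s*\)", src)

@assert has_precompile_disabled("__precompile__( false )\nmodule A end")
@assert !has_precompile_disabled("__precompile__(true)\nmodule B end")
```

Note this is purely textual: it would also trigger on a commented-out `__precompile__(false)`, which is why the diff calls it a heuristic.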
Seems kinda excessive to introduce an env variable for this. It's so specific.

I'm not sure how else to gate this. Suggestions? There are some concerns that with a large core count this could accidentally OOM.

How much memory does each worker use, approximately? Isn't this the case for every parallel workload that uses memory? Does this scale up to super high core counts? Perhaps just setting an upper cap is OK. I guess we should look at nthreads, but everyone runs with that equal to 1.

Yes, but most other workloads allow tuning (e.g. via `-t`). There's that, and also Lyndon's comment above that this is more like a multiprocessing thing than a multithreading thing. I also agree that I shouldn't have to limit my computation's thread count to limit the precompilation, and vice versa.

I'd rather have it as a normal argument to the `precompile` function then. This is exactly what we already have to limit parallelism in the asynchronous package downloader. Looking at it, funnily enough, we do have an env variable for the package downloader, but that seems like it was added as a workaround for something:

Pkg.jl/src/Types.jl, lines 329 to 331 in ede7b07

Although if we at some point want to run this automatically when a package is updated, there is no chance to give this argument. Perhaps there should be a `.julia/config/PkgConfig.toml` where things like this could be set?

Anyway, let's go with this for now. Can always tweak it later.
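For reference, the gating under discussion reads the env variable with a CPU-derived default (the PR uses `Sys.CPU_THREADS + 1`); the upper cap floated above could be layered on top. The cap value of 16 below is purely illustrative, not part of the PR.

```julia
# Default parallelism as in the PR, overridable via the env variable.
default_tasks = Sys.CPU_THREADS + 1
num_tasks = parse(Int, get(ENV, "JULIA_NUM_PRECOMPILE_TASKS", string(default_tasks)))

# Hypothetical upper cap to guard against OOM on very high core counts.
num_tasks = min(num_tasks, 16)
```

A keyword argument on `precompile` (mirroring the download-parallelism limit) or a `PkgConfig.toml` entry would cover the interactive and automatic cases respectively; the env variable covers both, at the cost of being one more obscure knob.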