Switch to a better lookup strategy for compile-time preferences in stacked environments before releasing 1.6? #37791
Just reading through this and thinking about it a little bit, I agree that strategy 3 seems best. You shouldn't really use the content of the manifest for anything except determining what package to load.
One thing I'm not convinced about: let's imagine that I have a package whose preferences apply to a transitive dependency. In strategy 2, I'm not sure where we would write them out to (perhaps the global environment?), and then we should continue to read them back out from there. In strategy 3, I'm also not sure where we would write them out to. It's because of this situation (transitive dependencies) that I think we need to pay attention to the Manifest.
In Strategy 2, I think adding the preference would work. In Strategy 3, I think we can just emit a helpful error message that the user has to run the appropriate Pkg command. One difficult situation (for any strategy, including Strategy 1) is when you have transitive dependencies.
Personally, I think it will be a bit of a "paradigm" shift if packages themselves start putting a bunch of options into the project file for packages that are not even in the Project file. Is that how the feature is intended to be used? Personally, I thought that options would always come directly from a user (the same way package entries in `Project.toml` come from the user).
Even if we do our best to adhere to this (which I'm not sure all packages will; for instance, I would not be surprised if packages such as FFTW decide to store their build configuration here, auto-choosing a good default for the user), there will always be the case where configuring one project ends up configuring others, because choosing "GPU enabled" for Flux should automatically turn on "GPU enabled" for NNlib, for instance.
I'm kind of split; on the one hand, yes, this would work. But on the other, it's kind of just papering over the fact that code loading works based off of Manifests, not Projects. In other words, when I load a package, the Manifest is what determines which code gets loaded.
Stefan and I had a mega-discussion about this, and I think we've come up with a somewhat revamped architecture. The pain points addressed are:
Compile-Time Detection

Instead of having to explicitly declare preferences as compile-time sensitive, we can instead flex our compiler muscles to do this automatically. We will change the preference-loading API so that reads are recorded during compilation. Example consumer package:

```julia
module Frobulator

function get_backend()
    # Use macro to automatically get the calling Module;
    # the second argument is the default value.
    return @load_preference("backend", "CPU")
end

backend = get_backend()
if backend == "GPU"
    # do GPU stuff
else
    if backend != "CPU"
        @error("no such backend '$(backend)', defaulting to CPU!")
    end
    # do CPU stuff
end

end # module
```

The benefit of this setup is that we can now define

```julia
currently_compiling() = ccall(:jl_generating_output, Cint, ()) != 0

function load_preferences(uuid::UUID, key::String; default = nothing, toml_cache::TOMLCache = TOMLCache())
    # Re-use definition in `base/loading.jl` so as to not repeat code.
    prefs_dict = Base.get_preferences(uuid, toml_cache)
    # If we're currently compiling, record this key as compile-time sensitive.
    if currently_compiling()
        push!(Base.COMPILETIME_PREFERENCES[uuid], key)
    end
    return get(prefs_dict, key, default)
end
```
Well laid out, @staticfloat! I apologize for not having gotten into this with you earlier and given the feedback before you'd already merged one version of it. The major advantage of this iteration of the design is that neither the consumer of preferences nor the dictator of preferences needs to worry about whether they are compile-time or run-time preferences. If the consumer of a preference accesses it during precompilation, then it will automatically invalidate .ji files. Elegant! Another benefit is the readability/writability of the preference files: even if people don't usually produce these files by hand, they will read them, and human-readable header names help.
Yes!
Yes!
Yes! All of these things read very nicely to me and are very much in line with the high-level view I had of how preferences would work.
Implements the `Preferences` loading framework as outlined in [0]. The most drastic change is that the list of compile-time preferences is no longer sequestered within its own dictionary, but is instead autodetected at compile-time and communicated back to the compiler. This list of compile-time preferences is now embedded as an array of strings that the loader must load, then index into the preferences dictionary with that list to check the preferences hash. In a somewhat bizarre turn of events, because we want the `.ji` filename to incorporate the preferences hash, and because we can't know how to generate the hash until after we've precompiled, I had to move the `.ji` filename generation step to _after_ we precompile the `.ji` file. [0]: #37791 (comment)
@staticfloat Thanks for the detailed analysis and superb suggestion! I agree with all of the aspects, and I'm glad that such a drastic change is within scope.
In principle, we could merge
I wonder if recursively merging everything could be too aggressive. Why not shallow merge (i.e., plain `merge`)? I totally agree recursive merge would be useful in many cases, but it could be harmful if there is no way to suppress it. If we always use recursive merge unconditionally, we can't have a dictionary with mutually exclusive patterns. For example, suppose we have

```toml
[Foo.parallel]
backend = "sequential"
```

and

```toml
[Foo.parallel]
backend = "threads"
ncpu = 4
```

Merging them yields

```toml
[Foo.parallel]
backend = "sequential"
ncpu = 4
```

which may be invalid. "Ignore invalid options" is somewhat of a valid strategy, but I think it's reasonable to support validation of the preferences (for a better end-user experience). Another example is

```toml
[WebFramework.logging]
target = "all"
# level = "INFO" # default
```

and

```toml
[WebFramework.logging]
target = ["MyPkg"]
level = "DEBUG"
```

Merging them yields

```toml
[WebFramework.logging]
target = "all"
level = "DEBUG"
```

Let's say this means enabling debug-level logging for everything. It may not be what the user wants.
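To make the concern concrete, the two merge behaviors can be sketched in Julia, using plain `Dict`s in place of parsed TOML (`recursive_merge` here is a made-up helper for illustration, not an existing API):

```julia
# Recursive merge: descend into sub-dictionaries; on conflicting scalar values
# the first (higher-priority) value wins. Made-up helper for illustration only.
recursive_merge(a, b) = a
recursive_merge(a::AbstractDict, b::AbstractDict) = mergewith(recursive_merge, a, b)

hi = Dict("parallel" => Dict("backend" => "sequential"))            # higher priority
lo = Dict("parallel" => Dict("backend" => "threads", "ncpu" => 4))  # lower priority

recursive_merge(hi, lo)
# -> Dict("parallel" => Dict("backend" => "sequential", "ncpu" => 4)):
#    the possibly-invalid mix described above.

merge(lo, hi)
# -> Dict("parallel" => Dict("backend" => "sequential")): shallow merge
#    replaces the whole `parallel` table, so no mixing occurs.
```

The shallow variant never combines sub-tables, which is exactly what makes mutually exclusive option sets expressible.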
Edit: I guess it can be done at save-time.
Yeah, I think any kind of verification should be done within the package. I'm not married to the idea of recursive merging, though; shallow merging would be much, much simpler to implement. Recursive merging also makes it much more complicated for the programmer to do things like load their preferences and then save them back: how do you know which pieces were inherited from a higher environment in the stack?
Well, I'm pro-always-recursive-merge now :)
I think the behavior of packages should, in principle, be agnostic to the source of preference information. But I think it's reasonable to later add some API for querying the source of the information, so that you can create a better error/warning message for invalid input (e.g., pointing out the file(s) where the invalid parameter is set).
So imagine the following:
By limiting all accesses to be scoped to a top-level key, we ensure that we only ever get a consistent view.
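One way to read "scoped to a top-level key" is that each top-level key is resolved against the environment stack as a unit, so sub-keys from different environments never mix. A hedged sketch (the `lookup_scoped` helper and the example stack are invented for illustration):

```julia
# Return the whole sub-table for `key` from the first (highest-priority)
# environment that defines it; never merge across environments.
function lookup_scoped(stack::Vector{<:AbstractDict}, key::String)
    for env in stack
        haskey(env, key) && return env[key]
    end
    return nothing
end

stack = [
    Dict("parallel" => Dict("backend" => "sequential")),           # higher priority
    Dict("parallel" => Dict("backend" => "threads", "ncpu" => 4)), # lower priority
]

lookup_scoped(stack, "parallel")
# -> Dict("backend" => "sequential"); `ncpu` from the lower environment is not mixed in.
```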
This is a good point. We could use
I would call this
Clever idea. Maybe too clever, but very clever...
This would be really hard in a language without recursion, but fortunately we have recursion.
@staticfloat Why not mirror the load API (with a key) in the save API? For example,

```julia
save_preferences(  # or set_preferences
    LoopVectorization,
    "LLVM_passes" => ["pass1", "pass2"],
    "nested.baz.qux" => "spoon",
)
```

would change the preferences from

```toml
# @/Preferences.toml (State 1)
[LoopVectorization]
force_march = "avx2"
LLVM_passes = ["pass3"]

[LoopVectorization.nested]
foo = "bar"
```

to

```toml
# @/Preferences.toml (State 2)
[LoopVectorization]
force_march = "avx2" # untouched
LLVM_passes = ["pass1", "pass2"]

[LoopVectorization.nested]
foo = "bar" # untouched

[LoopVectorization.nested.baz]
qux = "spoon"
```

while

```julia
save_preferences(
    LoopVectorization,
    Dict(
        "LLVM_passes" => ["pass1", "pass2"],
        "nested" => Dict("baz" => Dict("qux" => "spoon")),
    ),
)
```

would change it to

```toml
# @/Preferences.toml (State 3)
[LoopVectorization]
LLVM_passes = ["pass1", "pass2"]

[LoopVectorization.nested.baz]
qux = "spoon"
```
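A sketch of how such dotted-key updates could behave on the in-memory table (`set_dotted!` is a hypothetical helper for illustration, not part of any actual Preferences API):

```julia
# Set a possibly-dotted key like "nested.baz.qux" in a nested Dict,
# creating intermediate tables as needed and leaving sibling keys untouched.
function set_dotted!(prefs::AbstractDict, key::AbstractString, value)
    parts = split(key, '.')
    d = prefs
    for p in parts[1:end-1]
        d = get!(d, p, Dict{String,Any}())
    end
    d[parts[end]] = value
    return prefs
end

# "State 1" from the example above, as a Dict:
prefs = Dict{String,Any}(
    "force_march" => "avx2",
    "LLVM_passes" => ["pass3"],
    "nested" => Dict{String,Any}("foo" => "bar"),
)
set_dotted!(prefs, "LLVM_passes", ["pass1", "pass2"])
set_dotted!(prefs, "nested.baz.qux", "spoon")
# `force_march` and `nested.foo` remain untouched, matching "State 2".
```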
Hmmm, interesting. I like it!
Thinking a bit more about this, my only concern would be if people decide to save preferences in
Just for clarity, when I suggested merging
So if we are to use this approach, we could have, e.g.:

```julia
# in LoopVectorization
set_passes(passes; kwargs...) =
    save_preferences!(
        @__MODULE__,
        "LLVM_passes" => validate_passes(passes);
        kwargs...
    )
```

so that the end-user can call it directly. Another approach could be to use context variables (i.e., "dynamically scoped" variables, #35833) to control the destination. The end-users can then call

```julia
julia> Preferences.with_local() do
           LoopVectorization.set_passes(["pass1"])
       end
```

(maybe with better syntax sugar). I guess there are a gazillion other ways.
Okay, so I just needed to get something working that we could play around with, so I made some unilateral design decisions. The Julia branch is here and the package is here. I decided to go with:
The best documentation right now is the docstring. I'm on vacation now, so I won't be pushing this forward for the next 10 days or so. Feel free to mess around with it and come up with better ergonomics than I have. :)
Is this addressed by #38044?
Yes.
tl;dr: I think we can improve on the current implementation's (#37595) lookup strategies for compile-time preferences. In particular, can we make lookup independent of the content of `Manifest.toml` files?

Continuing the discussion in #37595 (comment), I think we need to explore different strategies of compile-time preference lookup for stacked environments before 1.6 is out and the spec is frozen.
(@staticfloat I'm opening the issue here since it's about code loading and I think resolving this is a blocker for 1.6. But let me know if you want to move this discussion to Preferences.jl)
cc @fredrikekre @KristofferC
What is the motivation for compile-time preferences?

Before discussing how to look up preferences, I think it would be better to have a shared vision of the use-cases of compile-time preferences.

I imagine that a common example would be choosing some kind of default "backend" such as CPU vs GPU (JuliaLang/Pkg.jl#977). IIUC @timholy's ComputationalResources.jl achieves a similar effect with run-time `@eval`. FFTW's deps/build.jl uses a text file `~/.julia/prefs/FFTW` to switch the provider of the external library; this could be migrated to the compile-time preferences system. It's also useful for toggling debugging support (in a semi-ad-hoc way). For example, ForwardDiff uses the constant `NANSAFE_MODE_ENABLED` for adding debugging instructions.

I think another important use-case is handling machine-specific configuration such as system libraries and hardware properties. For example, previous discussions of package options (JuliaLang/Pkg.jl#458 and JuliaLang/Juleps#38) mentioned configuring libpython for PyCall as an important use-case. In general, it is useful to be able to use Julia with external libraries from various sources; libpython, for instance, may come from a JLL, the OS's package manager, a custom build, conda, etc. Such a setting is inevitably machine-specific. Thus, recording such information in `Project.toml`, which is meant to be shared, is a bad idea. At the same time, it is crucial to have per-project per-machine preferences in a self-contained file for reproducibility.

Are these good motivations? Can we agree that it's ideal to have (1) per-project machine-agnostic preferences and (2) per-project per-machine preferences? If so, I think it's necessary to change the current lookup strategy.
Strategies

There are various ways to look up preferences in stacked environments (i.e., `Base.load_path()`). To start the conversation, I discuss the following three strategies:

Strategy 1: First package hit in `Manifest.toml` files (current implementation as of #37595)

The current strategy for finding the preferences for a package is to walk through `load_path()` one by one, find a manifest (environment) that includes the package, and look at the corresponding project file.

Strategy 2: First preference hit in `Project.toml` files

Search the `Project.toml` files in `load_path()` and find the first `Project.toml` file with the preferences of the target package.

Strategy 3: First package hit in `Project.toml` files

Search the `Project.toml` files in `load_path()` and find the first `Project.toml` file with the target package.

Example
To illustrate the difference between these strategies, consider the following environment stack (i.e., `Base.load_path() == [X, Y, Z]`):

- `X`: `Project.toml` has package `A`, which has package `B` as a dependency (i.e., `B` is in `Manifest.toml` but not in `Project.toml`). `Project.toml` has no compile-preferences table.
- `Y`: `Project.toml` has the compile-preferences table for `B`. However, `Project.toml`'s `deps` table does not contain `B`.
- `Z`: `Project.toml` has the compile-preferences table for `B`, and includes `B` in `deps`; i.e., the user ran `pkg> add B` while activating `Z`.

Strategy 1 finds the preferences for `B` in `X` (i.e., empty). Strategy 2 finds the preferences for `B` in `Y`. Strategy 3 finds the preferences for `B` in `Z`.

To summarize:

| environment | `deps` | compile-preferences | `Manifest.toml` |
| --- | --- | --- | --- |
| `X` | `[A, ...]` | (none) | `B` as an indirect dependency |
| `Y` | `[...]` | `B`'s preferences | `B` as an indirect dependency |
| `Z` | `[B]` | `B`'s preferences | `B` |
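The three strategies can be sketched as lookup functions over a toy model of the environment stack (the `Env` struct and its fields are invented for illustration; the real logic lives in `base/loading.jl`):

```julia
# Toy model of an environment on the load path.
struct Env
    deps::Vector{String}       # direct dependencies listed in Project.toml
    manifest::Vector{String}   # every package recorded in Manifest.toml
    prefs::Dict{String,Dict}   # compile-preferences tables in Project.toml
end

# Strategy 1: first environment whose Manifest.toml contains the package.
find_prefs1(stack, pkg) = for env in stack
    pkg in env.manifest && return get(env.prefs, pkg, Dict())
end

# Strategy 2: first environment whose Project.toml carries preferences for the package.
find_prefs2(stack, pkg) = for env in stack
    haskey(env.prefs, pkg) && return env.prefs[pkg]
end

# Strategy 3: first environment whose Project.toml lists the package in `deps`.
find_prefs3(stack, pkg) = for env in stack
    pkg in env.deps && return get(env.prefs, pkg, Dict())
end

# The X/Y/Z scenario from the example above:
X = Env(["A"], ["A", "B"], Dict{String,Dict}())
Y = Env(String[], ["B"], Dict{String,Dict}("B" => Dict("k" => "from Y")))
Z = Env(["B"], ["B"], Dict{String,Dict}("B" => Dict("k" => "from Z")))
stack = [X, Y, Z]

find_prefs1(stack, "B")  # Dict() (found in X, which has no preferences for B)
find_prefs2(stack, "B")  # Dict("k" => "from Y")
find_prefs3(stack, "B")  # Dict("k" => "from Z")
```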
Analysis
As I discussed in #37595 (comment), I think Strategy 1 (first package hit in manifests) is not desirable, because the fact that package `A` depends on `B` is (usually) an implementation detail. Package `A`'s author may silently drop `B` from the dependencies when bumping from v1.1 to v1.2. Then, after `Pkg.update`, Strategy 1 would pick up project `Y` as the source of preferences. OTOH, with Strategies 2 and 3, it's more explicit for the user to control which environment changes the preferences of a given package. I don't think it is ideal to rely on the state of `Manifest.toml`, since it is a large file that is opaque to users and often not checked in to version control.

Strategy 3 has an advantage over Strategy 2 in that the compatibility of the recorded preferences can be enforced via the `compat` entry. For example, the package can add a `compat` bound for the given preference support. The only disadvantage of Strategy 3 compared to Strategy 2 that I can think of is that the user may end up with a "stale" package in `Project.toml` that they added just to configure a transitive dependency.

Alternative: shallow-merge all preference tables?
It's also conceivable to aggressively combine the preference tables for a given package using `merge(dicts...)`. That is to say, given the tables `Dict("a" => 10, "c" => 30)` and `Dict("a" => 1, "b" => 2)`, we'd have `merge(Dict("a" => 10, "c" => 30), Dict("a" => 1, "b" => 2))`, i.e., `Dict("a" => 1, "b" => 2, "c" => 30)`.

Since this is a "shallow merge", each package can opt out of this behavior and use Strategy 2/3 semantics by creating a sub-table explicitly, so that the whole sub-table is replaced as a unit. As long as the specification is clearly documented, package authors can use the appropriate behavior.
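The shallow-merge semantics and the sub-table opt-out can be sketched with plain `Dict`s (key names are placeholders; as in the `merge` example above, the second argument is taken as the higher-priority table):

```julia
# Shallow merge: the later (higher-priority) dict wins on top-level keys,
# and whole sub-tables are replaced rather than combined key-by-key.
lower = Dict("a" => 10, "c" => 30, "opts" => Dict("x" => 1, "y" => 2))
upper = Dict("a" => 1, "b" => 2, "opts" => Dict("x" => 99))

merged = merge(lower, upper)
# merged == Dict("a" => 1, "b" => 2, "c" => 30, "opts" => Dict("x" => 99))
# The `opts` sub-table is replaced wholesale, so a package can opt out of
# key-by-key merging by nesting its settings inside a sub-table.
```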
Opinion
I think Strategy 3 or the shallow-merge variant of Strategy 3 is better.
Appendix: Current implementation

The entry point for the precompilation cache manager is `get_preferences_hash` (see `base/loading.jl`, lines 325 to 348 and lines 1458 to 1484 at 6596f95).