Add `Preferences` subsystem #1835

staticfloat · 2020-05-22T00:52:33Z

There are a number of packages that are ad-hoc storing content within ~/.julia/prefs. Storing some kind of "preferences" or "options" is a generally useful thing, and as we march closer and closer to a reality where we can set package directories themselves to be completely read-only, we will need a place where users can store user configuration that is specific to their own package.

To that end, this introduces Pkg.Preferences, a simple API to serialize Dict objects to TOML and bring them back. This is a minimal interface that is meant to be used in a very straightforward manner, to be used by projects in doing things such as allowing the user to set which execution backend a project should use, etc...

A logical extension of this would be to allow overlaying of the global depot preferences (which this PR implements) with per-project values. A Project.toml could contain a section that defines a Dict for any UUIDs within its manifest, then we recursively merge that dict into the base depot preferences dict. This would allow for powerful per-project customization of packages within the project.

This is the last piece that I foresee us needing before we can contemplate turning package directories to be completely read-only. Between Artifacts, Scratch spaces and Preferences, I think we can build a very compelling story for controlled state management.

These functions provide a nice interface for determining the currently-running package's UUID and VersionNumber, identified by loaded `Module`. We explicitly do not include a lookup by name, in expectation of potential future naming clashes due to having multiple packages with the same name but different UUIDs.

aminya · 2020-05-22T06:56:55Z

I have some questions about this.

What is the interface for a package's user to set a preference for a package?
How can a package developer use the user input?
Does this allow a package to use a different backend based on the user's input? For example, to allow AcuteML to be used by EzXML or WebIO depending on the user's input.

staticfloat · 2020-05-22T17:15:16Z

This PR doesn't set a user API, it's lower-level than that. These are the primitives that a package would use as a backing store, not a user-facing interactive toolkit.

As an example, a package could include a configure() method, then if there are no preferences set upon first run, they call that function which uses TerminalMenus to walk the user through configuration. Alternatively, a package could always choose defaults and rely upon the user to call a set_xyz(choice) function to change something about its behavior.

Think of this as being just a simple datastore; a way for packages to store small, persistent Dicts. What packages do with that Dict, (And how they populate it with user input) is completely up to them.

This implements functionality and tests for a new `Scratch` subsystem in `Pkg`; analogous to the `Artifacts` added in 1.3, this provides an abstraction for a mutable datastore that can be explicitly lifecycled to an owning package, or shared among multiple packages. Closes #796

simonbyrne · 2020-05-22T20:15:00Z

Very nice! Looks ideal for my MPI problems.

Preferences provides a simple package configuration store; packages can store arbitrary configurations into `Dict` objects that get serialized into their active `Project.toml`. Depot-wide preferences can also be stored within the `prefs` folder of a Julia depot, allowing for default values to be passed down to new environments from the system admin.

StefanKarpinski · 2020-05-26T18:22:10Z

I haven't reviewed this in detail but the biggest design consideration for preferences is layering. You want sysadmins and packages to be able to provide good defaults on shared systems, which calls for some kind of layering mechanism. But that leaves a number of questions:

What happens when some system defaults change?
What about when a project is moved to another system?
Should layering occur across the depot path or along the load path?

I've thought about this for a while and I think that each layer should store a full set of choices along with whether that choice was inherited from a lower layer or set at this layer. That way if a choice at a lower layer changes, the user can be prompted for whether to keep the old choice or use the new default.

This also helps address what should happen when a project moves to another system. You have a full snapshot of preferences which can be compared with the choices on the new system: if a value has a non-default value, you use that because it was already an explicit override; if a value has a default value that matches the inherited value on the new system then you can also safely continue to use that default; only when a value that is marked as default has a different inherited value do you need to decide what to do—this is much the same as if a default changes on the same system. In both cases, it makes sense to prompt the user or allow them to batch choose the project or the system.

However, I'm not sure if this choice/inherited combination makes sense for all layers or only the last layer: i.e. the individual project. That's the level where choices get made and where reproducibility matters.

Layering across depots versus layering across the load path feels like different kinds of preference inheritance. Depot patch inheritance is about inheriting choices that a sysadmin made for the system but allowing individual users to decide they want something else. Load path inheritance is about the same user deciding whether they want to use different preferences for different projects.

staticfloat · 2020-05-26T19:31:31Z

I struggled back and forth with myself in responding to this, rewriting my comment 3-4 times before I really grasped the central difficulty I was having. We really shouldn't be packing all of this into the Project.toml. I now think that the preferences that are set for this Project specifically should be placed into Project.toml, whereas the merged preferences from the Project and the environment should be placed into Manifest.toml. This way, we have both what the package author is enforcing (Project.toml), as well as the combined view of package author's guidelines together with environmental defaults (Manifest.toml).

So far, I've avoided any kind of requirement for interactivity here. While packages can certainly define some kind of configure() method that uses TerminalMenus and all that fanciness to configure themselves, the underlying mechanism is very much a datastore; you give it a Dict, you get a Dict back out. While we could check to see if what is recorded in the Manifest.toml matches what we get from doing a recursive merge with the layered depots and the Project.toml, I would rather only do this check at certain well-defined points, such as when we resolve or instantiate. I definitely don't want to break out into an interactive context during operations that would otherwise be expected to be batch scriptable.

tkf · 2020-05-27T00:12:49Z

Is it usable at the top-level scope of packages? That is to say, can it be used as "build-time" (precompilation-time) option? If so, how does it handle precompilation cache invalidation?

I actually have implemented a similar system in #1378. The biggest design decision was to make it nicely work with precompilation. Since cache invalidation works only per-file basis (via include_dependency), I was forced to use indivisual file for each package to store configuration. It's not super nice to use a single file (like Project.toml or Manifest.toml) shared across packages because it'd cause re-precompilation of the whole project if you update the option for only one package. However, the biggest obstacle with this approach was the reproducibility of the configuration because now you have many small files.

I think the best way to go might be to support more finer mechanism for calculating precompilation checksum than per-file. As TOML parser might be going into Base JuliaLang/julia#36018, it might be possible to use shared file for the preference system and use particular sub-dictionary for deciding if the precompilation cache is stale.

staticfloat · 2020-05-27T05:57:31Z

I decided to keep all compile-time stuff outside of this PR and instead put it here, as a separate package: https://github.com/staticfloat/CompileTimePreferences.jl

tkf · 2020-05-27T06:41:48Z

It would be nice if you can explain the reasoning behind the decision. Or, if merging this into Pkg.jl at some point is in your pipeline, I'd appreciate it if you share your plan.

I also note that the compile-time option is the main reason the preference system was requested:
JuliaLang/Juleps#38
#458

StefanKarpinski · 2020-05-27T13:08:04Z

I now think that the preferences that are set for this Project specifically should be placed into Project.toml, whereas the merged preferences from the Project and the environment should be placed into Manifest.toml.

I like that separation. It's true to the nature of these two files. I'm a little unclear if that addresses changing inherited values (of so, how?) or if we still need to record whether something was an inherited value or not.

I don't want to force interactivity, but I do think that when inherited defaults change, some choice generally needs to be made. We can have a default way to resolve that in non-interactive contexts, but users will generally want to choose these things. Otherwise you're permanently privileging either always following changes to inherited values or always ignoring them, neither of which seems like the right choice.

staticfloat · 2020-05-27T21:41:45Z

It would be nice if you can explain the reasoning behind the decision.

Yes, I discussed it a bit on the slack, but I should record it here for permanence. The decision is based more in technical difficulties and ergonomics than in a philosophical dislike of compile-time preferences being a part of this. I'm totally open to merging the two if we can get a nice API meshing, but as it stands, it's difficult. Here's my thought process:

My ideal API is one in which a package can mark some subtree of their preferences as used at compile-time. For instance, I could say something like build_prefs = @load_compile_time_preferences("build"), at the top-level of my package, and this would do three things:

It would load the preferences under the "build" key and return them to build_prefs.
It would use Base.include_dependency() on some sentinel file that contains a hash of the build preferences, to enforce recompilation when that file changed.
It would inform Pkg.Preferences that when preferences are saved, it should hash anything under a subtree that has been marked as build-sensitive and write it out to a sentinel file.

This design requires the system to maintain two separate pieces of information on disk; the Project.toml and the build preferences "sentinel" that forces recompilation. Unfortunately, keeping these two in-sync is difficult: there are many ways that preferences can change:

Through inherited preferences by a depot preference changing
If project A uses package B and a user sets preferences in project A for package B (which is how it works; the preferences are set within the Base.active_project()), then when we save preferences in project A, we need a way to know all the declared compile-time-preferences within all reachable packages in the entire environment.
A user could manually modify their Project.toml!

As I saw it, there were three options, none of them perfect:

Keep all options stored only within Project.toml, generate a hash of the subtree of preferences and write it out to a sentinel location to be used with include_dependency(). This has the cache invalidation problems listed above.
Keep normal options stored within Project.toml, and store compile-time preferences elsewhere, use a separate API to modify them and use include_dependency() directly upon the file. This is essentially what https://github.com/staticfloat/CompileTimePreferences.jl does.
Keep all options stored only within Project.toml and use it in its entirety with include_dependency(). This causes all options to become compile-time options, and will also cause any modification of Project.toml to trigger a recompile.

I think the ideas of injecting dependencies into Project.toml make a lot of sense; but we have to be careful to design this in such a way that it makes sense with the realities of include_dependency() as well.

staticfloat · 2020-05-27T22:06:40Z

I like that separation. It's true to the nature of these two files. I'm a little unclear if that addresses changing inherited values (of so, how?) or if we still need to record whether something was an inherited value or not.

I think that if something is different between Manifest.toml and Project.toml, that implies it was inherited at last resolve. So all inherited preferences that are not overridden by a Project.toml's explicit settings will be determinable by simply looking at what is listed in Manifest.toml and not in `Project.toml.

We can even tell if something has changed in the global environment by comparing preferences that we loaded straight from the Manifest.toml with preferences that we load by recursively merging depot preferences into eachother, then merging that into Project.toml preferences (e.g. rebuilding it "from scratch"). If there's a difference between those two, then we know something has changed, such as a depot preference changing, or us being relocated. It's kind of like running a resolve, and if nothing has changed, there's no need to recompile anything.

I don't want to force interactivity, but I do think that when inherited defaults change, some choice generally needs to be made.

In this case, I think the mental model of "just a key/value datastore" breaks down a bit. This can quickly spiral out of control, but let's briefly touch on some options to see if we like other interfaces better:

Packages are required to define a configure!(new_prefs, old_prefs) method that may or may not go interactive, that will be invoked if we detect that the environment has changed from underneath us. As mentioned previously, we can detect that we have an issue by comparing depot+Project/Manifest differences.
Preferences are not dynamic/implicit like they are now, but instead they are all declaratively defined within Project.toml, and Pkg can do things iterate over them, etc. Package authors would create preferences with explicit merge policies and other metadata attached to them, e.g.

# within Foo's Project.toml

# This defines preferences that belong to this package
[Preferences.libfoo_vendor]
description = "Choice of libfoo vendor"
valid_options = ["libfoo_jll", "custom_libfoo_jll"]
merge_policy = "prefer_depot"
type = "string"
default = "libfoo_jll"

[Preferences.frobulate]
description = "Enables frobulation.  Obviously."
merge_policy = "prefer_project"
type = "boolean"
default = true

[Preferences.nuke_launch_threshold]
description = "Consult the documentation for a full description of this option"
merge_policy = "ask_user"
type = "float"
default = 13.7

# Within OverallProject.toml
[deps]
Foo = "$(foo_uuid)"

[Preferences.$(foo_uuid).libfoo_vendor]
value = "custom_libfoo_jll"

[Preferences.nuke_launch_threshold]
value = 1000.0

With such a system, a configure() method could be written generically, then just walk over all preferences, taking the appropriate action.

tkf · 2020-05-27T22:46:46Z

As I saw it, there were three options, none of them perfect:

I've considered those options, too. I agree they are not perfect. That's why I suggested a different option in my first comment.

Keep all options stored only within Project.toml, generate a hash of the subtree of preferences and write it out to a sentinel location to be used with include_dependency(). This has the cache invalidation problems listed above.

Why not integrate a subset of the preference system to the module loader in Base? It can parse Project.toml and then invalidate the sentinel file. This is a slight variation of what I suggested.

staticfloat · 2020-05-28T18:59:07Z

I can't comment on the feasibility of what we can do to change up what Base does when loading modules. It's true that with more fine-grained control it could be significantly more powerful, but I don't fully understand the tradeoffs there. I don't see where exactly in dump.c these files are actually verified; I see inside write_dependency_list() that things like the path and mtime are written out, but I haven't been able to find where it's used again. In any case, I am worried that needing to parse files at load time could significantly loading down. I'd need to hear from Jameson or Jeff or someone familiar with this code to know if modifying those routines is okay.

tkf · 2020-05-28T19:57:58Z

My understanding is that the stale check is mostly done in the Julia function Base.stale_cachefile. So, I think it's doable if we can use a TOML parser in Base.

staticfloat · 2020-06-02T19:31:17Z

We talked about this on the Pkg call, and we are going to be talking to the compiler team next week about adding a way for precompilation to take sub-trees of the preferences dict and use them in the invalidation process.

staticfloat · 2020-06-16T18:18:28Z

We spoke with the compiler team, and we got the go-ahead to implement parsing of TOML files (Project.toml, Manifest.toml, etc...) during code loading, baking a hash of affected keys into precompile files. All of this is pending JuliaLang/julia#36018 getting merged.

simonbyrne · 2020-06-16T18:27:35Z

baking a hash of affected keys into precompile files.

Could that be made more generic (perhaps a module-level __precompilehash__() function)?

The particular use-case is for MPI we typically want whatever MPI has been loaded via module load ... (which modifies the LD_LIBRARY_PATH), and so would want to retrigger precompilation when that changes.

staticfloat · 2020-06-16T18:30:31Z

Could that be made more generic (perhaps a module-level precompilehash() function)?

Hmmm. I don't think we can trigger recompilation after the module has already been loaded; this would be fixed-functionality, it's not running arbitrary code.

What is module load ...? Is that Julia code?

simonbyrne · 2020-06-16T18:40:23Z

What is module load ...? Is that Julia code?

No, it's a system like http://modules.sourceforge.net/, which modifies local environment variables (typically PATH, MANPATH, LD_LIBRARY_PATH, etc) to make different software builds available. e.g. on our system there are a bunch of different MPI builds, so before using you would do one of

module load openmpi/4.0.3
module load openmpi/4.0.3_cuda10.0
module load mpich/3.3.2

depending which MPI build you wanted.

If you configure MPI.jl with JULIA_MPI_BINARY=system, it will pick up which ever one is loaded, but the challenge is that we would want to retrigger precompilation whenever that changes.

staticfloat · 2020-06-16T19:02:46Z

Is it too un-ergonomic to ask that a single top-level Project.toml be configured to expect a certain MPI?

simonbyrne · 2020-06-16T19:53:07Z

I would have to think about it more, but from a usability perspective, i would say yes. Basically, any preference that depends on some state that is mutable from outside the package manager is likely to have this problem, e.g. if I set my libfoo preference to /usr/lib/libfoo.so, and then my system package manager upgrades libfoo, then I would want some way to invalidate the cache.

The second best option would be a way to have a function that lets you invalidate the cache from within __init__(), and would require a session restart, e.g.

const VERSION = libversion()
function __init__()
    if libversion() != VERSION
        Base.invalidatecache(@__MODULE__)
        error("libfoo version has changed, please restart Julia")
    end
end

tkf · 2020-06-16T20:09:50Z

Wouldn't it be better if users just write down the modules and the project they use in their script?

module load openmpi/4.0.3
export JULIA_PROJECT=PATH/TO/Project.toml

It sounds like a good practice for reproducibility.

But checking the library compatibility in __init__ sounds like a good UI (without invalidatecache) in any case.

simonbyrne · 2020-06-16T20:45:35Z

It sounds like a good practice for reproducibility.

Only if they have access to the same cluster

tkf · 2020-06-16T21:57:37Z

Isn't it a good practice to record what you use in each cluster?

tkf · 2020-09-28T22:43:23Z

@staticfloat Given that JuliaLang/julia#37595 is merged and we (will) have https://github.com/JuliaPackaging/Preferences.jl, I think we can close it now?

staticfloat changed the title ~~Implement Pkg.Preferences~~ Add Preferences subsystem May 22, 2020

DilumAluthge mentioned this pull request May 22, 2020

Project.jl as an extension to Projec.toml #1828

Closed

staticfloat force-pushed the sf/preferences branch from 5f498bb to 3357746 Compare May 22, 2020 23:27

staticfloat force-pushed the sf/preferences branch from 3357746 to 5e2343f Compare May 26, 2020 17:51

simonbyrne mentioned this pull request Jun 4, 2020

More prominently document JULIA_CUDA_USE_BINARYBUILDER JuliaGPU/CUDA.jl#204

Closed

staticfloat mentioned this pull request Jun 20, 2020

Generate multiple packages with the a single builder JuliaPackaging/BinaryBuilder.jl#778

Open

tkf mentioned this pull request Jul 28, 2020

Making a Conda Package for Pyjulia JuliaPy/pyjulia#392

Open

tkf closed this Sep 28, 2020

DilumAluthge deleted the sf/preferences branch January 4, 2021 05:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `Preferences` subsystem #1835

Add `Preferences` subsystem #1835

staticfloat commented May 22, 2020

aminya commented May 22, 2020 •

edited

Loading

staticfloat commented May 22, 2020

simonbyrne commented May 22, 2020

StefanKarpinski commented May 26, 2020

staticfloat commented May 26, 2020

tkf commented May 27, 2020

staticfloat commented May 27, 2020

tkf commented May 27, 2020

StefanKarpinski commented May 27, 2020

staticfloat commented May 27, 2020

staticfloat commented May 27, 2020 •

edited

Loading

tkf commented May 27, 2020

staticfloat commented May 28, 2020

tkf commented May 28, 2020

staticfloat commented Jun 2, 2020 •

edited

Loading

staticfloat commented Jun 16, 2020

simonbyrne commented Jun 16, 2020

staticfloat commented Jun 16, 2020

simonbyrne commented Jun 16, 2020

staticfloat commented Jun 16, 2020

simonbyrne commented Jun 16, 2020

tkf commented Jun 16, 2020

simonbyrne commented Jun 16, 2020

tkf commented Jun 16, 2020

tkf commented Sep 28, 2020

Add Preferences subsystem #1835

Add Preferences subsystem #1835

Conversation

staticfloat commented May 22, 2020

aminya commented May 22, 2020 • edited Loading

staticfloat commented May 22, 2020

simonbyrne commented May 22, 2020

StefanKarpinski commented May 26, 2020

staticfloat commented May 26, 2020

tkf commented May 27, 2020

staticfloat commented May 27, 2020

tkf commented May 27, 2020

StefanKarpinski commented May 27, 2020

staticfloat commented May 27, 2020

staticfloat commented May 27, 2020 • edited Loading

tkf commented May 27, 2020

staticfloat commented May 28, 2020

tkf commented May 28, 2020

staticfloat commented Jun 2, 2020 • edited Loading

staticfloat commented Jun 16, 2020

simonbyrne commented Jun 16, 2020

staticfloat commented Jun 16, 2020

simonbyrne commented Jun 16, 2020

staticfloat commented Jun 16, 2020

simonbyrne commented Jun 16, 2020

tkf commented Jun 16, 2020

simonbyrne commented Jun 16, 2020

tkf commented Jun 16, 2020

tkf commented Sep 28, 2020

Add `Preferences` subsystem #1835

Add `Preferences` subsystem #1835

aminya commented May 22, 2020 •

edited

Loading

staticfloat commented May 27, 2020 •

edited

Loading

staticfloat commented Jun 2, 2020 •

edited

Loading