When setting JULIA_DEPOT_PATH to /path:, omit the default user depot #51448

Merged: 2 commits, Dec 22, 2023


maleadt
Member

@maleadt maleadt commented Sep 25, 2023

Summary

This PR slightly changes how DEPOT_PATH works when overridden using the env var JULIA_DEPOT_PATH, now omitting the default user depot when specifying a path.

Before:

❯ JULIA_DEPOT_PATH=/foo: \
  julia-1.10 -e 'display(DEPOT_PATH)'
4-element Vector{String}:
 "/foo"
 "/home/tim/.julia"
 "/path/to/julia/local/share/julia"
 "/path/to/julia/share/julia"

After:

❯ JULIA_DEPOT_PATH=/foo: \
  julia-pr -e 'display(DEPOT_PATH)'
3-element Vector{String}:
 "/foo"
 "/path/to/julia/local/share/julia"
 "/path/to/julia/share/julia"

Fixes #51439

Motivation

Setting JULIA_DEPOT_PATH is often used to isolate/decouple Julia environments (citation needed). When setting it to a path like /foo/bar:, the resulting depot array however still includes the default user depot (~/.julia), which isn't great for proper isolation.

While it is possible to set it to only /foo/bar instead, which achieves proper isolation, that also excludes the system depot. That wasn't much of a problem in the past, but now that we're moving out stdlibs from the system image, using the bundled cache files is essential for a good experience.

Proposed change

Instead of requiring users to manually craft a LOAD_PATH, I propose to slightly change how the JULIA_DEPOT_PATH env var is interpreted, omitting the default user depot when specifying a path. For example:

# old behavior
❯ JULIA_DEPOT_PATH=/foo: \
  julia-1.10 -e 'display(DEPOT_PATH)'
4-element Vector{String}:
 "/foo"
 "/home/tim/.julia"
 "/path/to/julia/local/share/julia"
 "/path/to/julia/share/julia"

# new behavior
❯ JULIA_DEPOT_PATH=/foo: \
  julia-pr -e 'display(DEPOT_PATH)'
3-element Vector{String}:
 "/foo"
 "/path/to/julia/local/share/julia"
 "/path/to/julia/share/julia"

Notice how ~/.julia is now omitted because I included /foo in the JULIA_DEPOT_PATH.

This does not change any of the other (documented) properties of JULIA_DEPOT_PATH, and should also not affect the ability to layer depots over one another by simply prepending/appending to the environment variable:

# only a path
❯ JULIA_DEPOT_PATH=/foo \
  julia-pr -e 'display(DEPOT_PATH)'
1-element Vector{String}:
 "/foo"

# empty string -> empty depot
❯ JULIA_DEPOT_PATH= \
  julia-pr -e 'display(DEPOT_PATH)'
String[]

# no path specified -> empty string still expands to user depot
❯ JULIA_DEPOT_PATH=: \
  julia-pr -e 'display(DEPOT_PATH)'
3-element Vector{String}:
 "/home/tim/.julia"
 "/path/to/julia/local/share/julia"
 "/path/to/julia/share/julia"

@maleadt added the speculative and packages labels Sep 25, 2023
@DilumAluthge
Member

I also tweaked the processing of JULIA_DEPOT_PATH to not simply bail out when the variable is empty, but ensure it's populated with something useful.

What is the "something useful" in this PR?

IMO, the best thing to do if JULIA_DEPOT_PATH is empty would be to set Base.DEPOT_PATH to [mktempdir()].

@maleadt
Member Author

maleadt commented Sep 25, 2023

What is the "something useful" in this PR?

The same behavior as not setting the variable, i.e., ~/.julia. Setting it to a temporary directory would be fine too.

@vchuravy
Member

I think what Tim wants is a way to say: remove the user depot from the search path.

So instead of the current behavior:

vchuravy@odin ~> JULIA_DEPOT_PATH=/tmp: julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.3 (2023-08-24)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> DEPOT_PATH
4-element Vector{String}:
 "/tmp"
 "/home/vchuravy/.julia"
 "/home/vchuravy/.julia/juliaup/j" ⋯ 21 bytes ⋯ "x64.linux.gnu/local/share/julia"
 "/home/vchuravy/.julia/juliaup/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia"

we would get:

julia> DEPOT_PATH
3-element Vector{String}:
 "/tmp"
 "/home/vchuravy/.julia/juliaup/j" ⋯ 21 bytes ⋯ "x64.linux.gnu/local/share/julia"
 "/home/vchuravy/.julia/juliaup/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia"

And IIUC we currently don't have a way to express that,
in contrast to LOAD_PATH where one could write /tmp:@stdlib.

@KristofferC
Member

KristofferC commented Sep 25, 2023

I agree that this is a good idea. These are just cache files and if you find one that is valid it is as good as the other and we know with a very high likelihood that the ones for the stdlibs will be valid.

And when we start bundling the binary dependencies we have as proper artifacts we also don't want to reinstall them if someone modifies the depot path like this.

@maleadt
Member Author

maleadt commented Sep 25, 2023

I think what Tim wants is a way to say. Remove the user-depot from the search path.

Yes, but without having to specify JULIA_LOAD_PATH="/path/to/depot:@bundled" or so. I mean, I'd be happy with that as well, but I'm thinking that it shouldn't be necessary because we already load code from the bundled depot anyway:

❯ export JULIA_DEPOT_PATH=$(mktemp -d)

# this has to precompile, because we didn't want anything to load from the bundled depot
julia> using Test
[ Info: Precompiling Test [8dfed614-e22c-5e08-85e1-65c5234f0b40]

# ... but it *did* just load the code from the bundled depot anyway
julia> pathof(Test)
"/path/to/julia-1.10.0-beta2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Test/src/Test.jl"

So @DilumAluthge, you mentioned that if I set JULIA_DEPOT_PATH="/foo" I would only want /foo to be used, but as per the above that isn't ever the case. I don't see why you'd only want the source code to be loaded from the bundled depot, but not the caches.

@DilumAluthge
Member

Ahh, I see what you're saying - we always load the stdlib source code regardless of the value of the depot path.

@DilumAluthge
Member

if you find one that is valid it is as good as the other

But currently (at least on Julia master) we don't actually verify that the stdlib cachefiles are valid, we just skip the verification and accept them.

@vchuravy
Member

But currently (at least on Julia master) we don't actually verify that the stdlib cachefiles are valid, we just skip the verification and accept them.

Which makes #49866 ever more pressing, but the question at hand here is orthogonal.
We have an impedance mismatch between JULIA_DEPOT_PATH for environments and JULIA_DEPOT_PATH for artifacts/compiled/packages. For the latter as @KristofferC said there is no harm in re-using them if they are in a standard location, but for the former we need the ability to opt-out.

@vchuravy
Member

I think this change is good, but it is technically a minor change, and at least once in the past I have needed to use it to undo an overzealous HPC admin putting a v#.# environment into one of the default depots.

@vchuravy added the needs docs and needs news labels Sep 25, 2023
@DilumAluthge
Member

but for the former we need the ability to opt-out.

It seems like a pain to need to opt-out of the environments.

@maleadt Is there a way to modify this PR so that it only applies to artifacts/compiled/packages?

@DilumAluthge
Member

We have an impedance mismatch between JULIA_DEPOT_PATH for environments and JULIA_DEPOT_PATH for artifacts/compiled/packages.

Could we "split" JULIA_DEPOT_PATH into two separate environment variables, one for the former and one for the latter?

@DilumAluthge added the minor change label Sep 25, 2023
@maleadt
Member Author

maleadt commented Sep 26, 2023

@maleadt Is there a way to modify this PR so that it only applies to artifacts/compiled/packages?
...
Could we "split" JULIA_DEPOT_PATH into two separate environment variables, one for the former and one for the latter?

IIUC you're proposing to split DEPOT_PATH? That seems like a much more invasive change that needs more discussion. I also don't think that the bundled depot path deserves an environment variable.

Anyway, I pushed a change like that, introducing BUNDLED_DEPOT_PATH (initialized to the depot that's derived from BINDIR), and using it for looking up cache files. Ideally we use the bundled depot path only to look for resources we know are bundled (i.e. cache files for stdlibs, and not for all packages), but I'm not sure if code loading knows about that. Conversely, it may also be interesting to remove the bundled directories from the actual DEPOT_PATH, but seeing the docstring that would be a (minor) breaking change.

I'm not sure I like this change; the simplicity of just pushing a bundled depot to the actual DEPOT_PATH seemed much easier to reason about, and doesn't require adapting several callers (code loading, Artifacts.jl, ...) to include BUNDLED_DEPOT_PATH in their look-up now. Why wouldn't we want that? Cachefile validation seems like the only concern, but is orthogonal, and a BUNDLED_DEPOT_PATH that takes precedence only makes that worse.

@vchuravy
Member

Currently:

➜  julia JULIA_DEPOT_PATH="" ./julia -e "@show DEPOT_PATH"
DEPOT_PATH = String[]
➜  julia JULIA_DEPOT_PATH=":" ./julia -e "@show DEPOT_PATH"
DEPOT_PATH = ["/home/vchuravy/.julia", "/home/vchuravy/builds/julia/usr/local/share/julia", "/home/vchuravy/builds/julia/usr/share/julia"]

I also tweaked the processing of JULIA_DEPOT_PATH to not simply bail out when the variable is empty, but ensure it's populated with the default configuration (~/.julia). Again, I don't see much use for the current behavior, which results in an entirely broken Julia session:

Does this mean that JULIA_DEPOT_PATH="" ./julia -e "@show DEPOT_PATH" will be ["/home/vchuravy/.julia", "/home/vchuravy/builds/julia/usr/local/share/julia", "/home/vchuravy/builds/julia/usr/share/julia"]?
I think it would be more consistent if it were ["/home/vchuravy/builds/julia/usr/local/share/julia", "/home/vchuravy/builds/julia/usr/share/julia"].

@maleadt
Member Author

maleadt commented Sep 26, 2023

Currently:

➜  julia JULIA_DEPOT_PATH="" ./julia -e "@show DEPOT_PATH"
DEPOT_PATH = String[]
➜  julia JULIA_DEPOT_PATH=":" ./julia -e "@show DEPOT_PATH"
DEPOT_PATH = ["/home/vchuravy/.julia", "/home/vchuravy/builds/julia/usr/local/share/julia", "/home/vchuravy/builds/julia/usr/share/julia"]

Yeah, and that is a bit absurd, no? : is just a separator, and otherwise has no special meaning, so essentially using a single empty entry results in no depot, while two empty entries populate it with the default selection.

I think it would be more consistent if it were ["/home/vchuravy/builds/julia/usr/local/share/julia", "/home/vchuravy/builds/julia/usr/share/julia"].

That would end up writing in your build tree, or in BINDIR, as soon as you perform Pkg operations though. I'm not sure that's wanted? It seems more sane to have the first entry be populated with a writable directory, be it the default homedir or a temporary directory as suggested by @DilumAluthge.

@JBlaschke

Hi Folks,

just wanted to weigh in here. I think anything that makes the behaviour of Julia unambiguous and easy to control will get my vote.

I have a question: what is in the stdlibs? Is it a bunch of .so files? Is it a lot of Julia source? I'm trying to place the stdlib somewhere on a scale between a statically-linked executable and a Python environment. Depending on where Julia lands, we'll need to prioritize deploying Julia in containers or in /tmp (the way we have to for Python on HPC).

Also: Julia is relocatable (in contrast to Python). This is a feature that I -- and other HPC users -- like very much. How is the DEPOT_PATH automatically populated with the stdlib location? Does that happen at run time (i.e. would it notice if the Julia install is copied to another location?), or would that be stored in a config file somewhere?

Another question: On HPC it's important to overwrite the location of caches, etc (e.g. some systems set the homes to be RO on the computes). Do we continue to have this ability after this change? (I assume so, eg. the BINDIR ultimately comes from uv_exepath, but I want to make sure that this continues to be the case).

And finally, I would like to propose an addition: can we extend Preferences.jl to also control things like cache, artifact, and package locations? That way JULIA_DEPOT_PATH is the "main" way to control these things, with fine-grained control via Preferences.

@vchuravy
Member

Yeah, and that is a bit absurd, no? : is just a separator, and otherwise has no special meaning,

: is defined as the prepend/append operation for LOAD_PATH/DEPOT_PATH.

I may want to run in a read-only scenario. Which currently I can.

As a note, juliaup uses the depot path as well, to obtain a location for installing Julia.

@DilumAluthge
Member

As a note juliaup uses the depot path as well, to obtain a location for install Julia too.

Thankfully this will be fixed eventually (soon?). Juliaup will switch to using JULIAUP_DEPOT_PATH and will no longer check JULIA_DEPOT_PATH.

@vchuravy
Member

how is the DEPOT_PATH automatically populated with the stdlib location? Does that happen at run time (i.e. it would notice if the Julia install is copied to another location?) or would that be stored in a config file somewhere.

Yeah that is a pure runtime decision. See

function append_default_depot_path!(DEPOT_PATH)

Currently the depot itself is not fully relocatable, but #49866 should be a big step in that direction.

Another question: On HPC it's important to overwrite the location of caches, etc (e.g. some systems set the homes to be RO on the computes). Do we continue to have this ability after this change? (I assume so, eg. the BINDIR ultimately comes from uv_exepath, but I want to make sure that this continues to be the case).

So this is a bit tricky; the idea of this PR is to always add the "bundled with Julia" depots to your depot path, i.e. you will no longer be able to opt out of them. But you will of course be able to insert a depot or change the user depot to be located in a different place. As an example, if you moved the system depot somewhere else, you could insert that location first.
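
For the read-only-home case raised above, relocating the user depot could be sketched roughly as follows. This is only an illustration: SCRATCH is an assumed, site-specific variable, the directory name julia_depot_<uid> is made up, and the trailing colon relies on the behavior this PR was eventually merged with (an empty entry expands to the bundled depots only).

```shell
# Assumption: SCRATCH is a site-specific variable pointing at writable storage;
# fall back to TMPDIR or /tmp when it is unset.
depot_root="${SCRATCH:-${TMPDIR:-/tmp}}"
depot="${depot_root}/julia_depot_$(id -u)"   # hypothetical directory name
mkdir -p "$depot"

# Trailing colon: the empty entry expands to the depots bundled with Julia,
# so the (read-only) home directory is not part of DEPOT_PATH.
export JULIA_DEPOT_PATH="${depot}:"
```

Whether a per-user directory under scratch or a per-job directory under /tmp is the better choice depends on the site; the sketch only shows where the writable first entry comes from.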

And finally, I would like to propose an addition: can we extend Preferences.jl to also control things like cache, artifact, package locations? That way we have the JULIA_DEPOT_PATH is the "main" way to control these things, and fine-grained control via Preferences.

Yeah that's a long-term goal of ours, but the problem is that a lot of these environment flags are read early, and sometimes even from C, so the whole preference loading infrastructure would need to already be present. So there is a bit of an ordering and bootstrapping challenge.

@KristofferC
Member

I'm not sure of the exact state of this PR (and maybe it has been discussed), but if you are running a centrally installed Julia that also has registries installed in it, you might not want to use those registries if you set DEPOT_PATH=/tmp/my_depot. So that is something that exists in the DEPOT_PATH that is more than just a cache; it actually changes behavior.

@JBlaschke

JBlaschke commented Sep 27, 2023

Thanks for clarifying, all. I'll keep an eye on this and run some experiments.

For the record: my questions are motivated by more than just the interaction with centralized depots. Rather HPC is possibly heading towards a state where running code on the compute nodes from the home directory is becoming untenable. The usual compromise is that home directories are just read only (eg. Summit, or NERSC + Containers). But I've had conversations which go along the lines of "just tar it all up and use sbcast".

So the other thing I've been thinking of is: rather than asking users to stop thinking of $HOME as the place where "you keep your stuff", I am exploring the possibility of automatically staging applications to /tmp.

Hence I'll have to find some time to do testing.

@maleadt
Member Author

maleadt commented Sep 28, 2023

My conclusion from the comments here and further offline discussion is that JULIA_DEPOT_PATH='' selecting an empty depot is useful, as are the semantics that one empty string is different from two empty strings, so I've put all that back. The only remaining change is that an empty string in JULIA_DEPOT_PATH (again, when JULIA_DEPOT_PATH itself isn't just the empty string) now expands to the bundled (bindir-based) depot, excluding the default, homedir-based one:

❯ JULIA_DEPOT_PATH=/foo: \
  julia-pr -e 'display(DEPOT_PATH)'
3-element Vector{String}:
 "/foo"
 "/path/to/julia/local/share/julia"
 "/path/to/julia/share/julia"

❯ JULIA_DEPOT_PATH=/foo: \
  julia-1.10 -e 'display(DEPOT_PATH)'
4-element Vector{String}:
 "/foo"
 "/home/tim/.julia"
 "/path/to/julia/local/share/julia"
 "/path/to/julia/share/julia"

So it retains all of the following:

❯ JULIA_DEPOT_PATH=/foo \
  julia-pr -e 'display(DEPOT_PATH)'
1-element Vector{String}:
 "/foo"

❯ JULIA_DEPOT_PATH= \
  julia-pr -e 'display(DEPOT_PATH)'
String[]

❯ JULIA_DEPOT_PATH=: \
  julia-pr -e 'display(DEPOT_PATH)'
3-element Vector{String}:
 "/home/tim/.julia"
 "/path/to/julia/local/share/julia"
 "/path/to/julia/share/julia"

That also means this PR only partially fixes #51439, as just setting JULIA_DEPOT_PATH to a path (omitting the empty entry) will not result in the bundled caches being used, but I don't see a way to fix that while remaining compatible with all of the current semantics of the environment var (without special-casing code loading to use bundled cache files, which I think will only come to bite us when we start bundling more things and expect the bundled depot to function like, well, an actual depot). That doesn't feel very satisfying, since we will still load source code from the bundled depot (i.e. regardless of the configured depot path), but maybe that will change in the future once stdlibs become upgradable.
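
The listings above can be reproduced with a small shell model of the expansion rules. This is only a sketch inferred from the outputs shown in this thread, not Julia's actual implementation: the function name expand_depot_path is hypothetical, the paths are the placeholders used above, and details such as de-duplication of named entries or the Windows separator are left out.

```shell
# Illustrative model only, NOT Julia's implementation. Placeholder paths as in the thread.
USER_DEPOT="/home/tim/.julia"
BUNDLED_DEPOTS="/path/to/julia/local/share/julia
/path/to/julia/share/julia"

# expand_depot_path VALUE: print one depot per line for a JULIA_DEPOT_PATH that is set.
expand_depot_path() {
    rest="$1"
    result=""
    named=0      # becomes 1 once a non-empty entry has been seen
    bundled=0    # becomes 1 once the bundled depots have been added
    [ -z "$rest" ] && return 0          # empty string -> no depots at all
    while :; do
        entry="${rest%%:*}"             # text before the first ':'
        if [ -n "$entry" ]; then
            result="${result}${entry}
"
            named=1
        elif [ "$bundled" -eq 0 ]; then
            # An empty entry expands to the bundled depots (added only once).
            result="${result}${BUNDLED_DEPOTS}
"
            bundled=1
        fi
        case "$rest" in
            *:*) rest="${rest#*:}" ;;   # drop the entry just handled
            *)   break ;;
        esac
    done
    # Only empty entries (e.g. ":"): the default user depot still comes first.
    [ "$named" -eq 0 ] && printf '%s\n' "$USER_DEPOT"
    printf '%s' "$result"
}

expand_depot_path "/foo:"   # prints /foo, then the two bundled depots
```

Calling it with /foo, the empty string, or a lone colon gives the other three listings from this comment.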

TL;DR: use JULIA_DEPOT_PATH=/foo: to use a custom depot and still support loading bundled cache files.
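
For throwaway sessions, that recommendation might be wrapped in a small helper like the one below. The function with_isolated_depot is hypothetical (it is not part of Julia or of this PR); it only assumes mktemp is available and that the depot can be deleted once the command has finished.

```shell
# Hypothetical helper (not part of Julia): run a command with a throwaway depot.
with_isolated_depot() {
    depot="$(mktemp -d)" || return 1
    # The trailing colon keeps the bundled depots (and their cache files) visible.
    JULIA_DEPOT_PATH="${depot}:" "$@"
    status=$?
    rm -rf "$depot"          # discard everything the session installed or compiled
    return "$status"
}
```

A call would then look like: with_isolated_depot julia -e 'display(DEPOT_PATH)'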

@JBlaschke

This looks good -- thanks @maleadt

@maleadt changed the title from "RFC: When using a custom depot, still look in bundled directories." to "RFC: When setting JULIA_DEPOT_PATH, exclude the default user depot" Sep 28, 2023
@maleadt removed the needs docs and needs news labels Sep 28, 2023
@maleadt changed the title from "RFC: When setting JULIA_DEPOT_PATH, exclude the default user depot" to "RFC: When setting JULIA_DEPOT_PATH to a path, omit the default user depot" Sep 28, 2023
@maleadt
Member Author

maleadt commented Sep 28, 2023

Now includes docs and news. I've updated the top post with a new explanation, so this is ready for review again.

@maleadt
Member Author

maleadt commented Oct 17, 2023

Could this get another round of review? @vchuravy @KristofferC
The PR has been simplified to the point where I don't think it should be controversial. It doesn't solve the 'problem' of JULIA_DEPOT_PATH=/foo not finding the bundled precompilation caches for stdlibs, but at least it makes it possible to recommend JULIA_DEPOT_PATH=/foo: (without that getting polluted by the homedir depot).

Member

@vchuravy vchuravy left a comment

LGTM, but I would like @StefanKarpinski to chime in since I feel he has the best mental model here.

Instead of expanding to the default (homedir-based) + bundled (bindir-based) depot,
only use the bundled depot, so the env var can be used to select a different depot
while still being able to load bundled cache files.
@DilumAluthge changed the title from "RFC: When setting JULIA_DEPOT_PATH to a path, omit the default user depot" to "RFC: When setting JULIA_DEPOT_PATH to /path:, omit the default user depot" Oct 19, 2023
@maleadt changed the title from "RFC: When setting JULIA_DEPOT_PATH to /path:, omit the default user depot" to "When setting JULIA_DEPOT_PATH to /path:, omit the default user depot" Oct 24, 2023
@maleadt
Member Author

maleadt commented Oct 26, 2023

@StefanKarpinski Can you review?

@fredrikekre
Member

So to get the previous behavior you do something like JULIA_DEPOT_PATH=/foo:~/.julia: then?

@maleadt
Member Author

maleadt commented Oct 26, 2023

So to get the previous behavior you do something like JULIA_DEPOT_PATH=/foo:~/.julia: then?

Yes, what used to be written as JULIA_DEPOT_PATH=/foo: now has to be written as JULIA_DEPOT_PATH=/foo:~/.julia:.
JULIA_DEPOT_PATH=/foo has always set the DEPOT_PATH to only /foo, and that's unchanged.
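
Spelled out for a shell session, restoring the old layering might look like the sketch below. /foo is the placeholder path from this thread, and using $HOME instead of ~ is a choice made here so that the value can be quoted safely.

```shell
# /foo is the placeholder path from the thread. $HOME is spelled out because
# the shell does not expand ~ inside double quotes.
export JULIA_DEPOT_PATH="/foo:${HOME}/.julia:"

# Then, for example: julia -e 'display(DEPOT_PATH)'
# to check that /foo, the user depot and the bundled depots are all listed.
```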

@vchuravy vchuravy added this to the 1.11 milestone Nov 13, 2023
@maleadt
Member Author

maleadt commented Dec 22, 2023

I keep on running into this (with PkgEval, creduce_julia, ... where I need isolated depots, which currently causes excessive compilation now that packages are being moved out of the system image). Given the approvals, let's try this out. I'll keep an eye on any fallout, but I don't think this should cause any issues.

Labels: minor change (Marginal behavior change acceptable for a minor release); packages (Package management and loading); speculative (Whether the change will be implemented is speculative)
Linked issue: Pkg/REPL excision causes precompilation when using custom DEPOT_PATH
6 participants