Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: Pkg3-style package loading #25333

Closed
wants to merge 11 commits into from
Closed

wip: Pkg3-style package loading #25333

wants to merge 11 commits into from

Conversation

StefanKarpinski
Copy link
Member

This works for julia --compiled-modules=no but fails in the presence of module precompilation, e.g. with the following output:

julia> import SHA
77394: find_package(Main, SHA)
77394: find_package_in_project(/Users/stefan/projects/Pkg3.jl/Project.toml, SHA)
77394: find_package_in_manifest(/Users/stefan/projects/Pkg3.jl/Manifest.toml, ea8e919c-243c-51af-8825-aaa63cd721ce, IOStream(<file /Users/stefan/projects/Pkg3.jl/Manifest.toml>))
77394: find_package(Main, SHA)
77394: find_package_in_project(/Users/stefan/projects/Pkg3.jl/Project.toml, SHA)
77394: find_package_in_manifest(/Users/stefan/projects/Pkg3.jl/Manifest.toml, ea8e919c-243c-51af-8825-aaa63cd721ce, IOStream(<file /Users/stefan/projects/Pkg3.jl/Manifest.toml>))
[ Info: Precompiling module SHA @ Base loading.jl:960
77395: find_package(SHA, Compat)
77395: find_package_in_manifest(/Users/stefan/projects/Pkg3.jl/Manifest.toml, ea8e919c-243c-51af-8825-aaa63cd721ce, Compat)
77395: find_package_in_manifest(/Users/stefan/projects/Pkg3.jl/Manifest.toml, Compat, IOStream(<file /Users/stefan/projects/Pkg3.jl/Manifest.toml>))
77395: find_package(SHA, Compat)
77395: find_package_in_manifest(/Users/stefan/projects/Pkg3.jl/Manifest.toml, ea8e919c-243c-51af-8825-aaa63cd721ce, Compat)
77395: find_package_in_manifest(/Users/stefan/projects/Pkg3.jl/Manifest.toml, Compat, IOStream(<file /Users/stefan/projects/Pkg3.jl/Manifest.toml>))
77394: find_package(Main, Compat)
77394: find_package_in_project(/Users/stefan/projects/Pkg3.jl/Project.toml, Compat)
┌ Warning: Module `SHA` declares `__precompile__(true)` but `require` failed
│ to create a usable precompiled cache file
│   exception = ErrorException("Required dependency Compat not found in current path.")
└ @ Base loading.jl:774
ERROR: Module SHA declares __precompile__(true) but require failed to create a usable precompiled cache file.
Stacktrace:
 [1] error at ./error.jl:33 [inlined]
 [2] _require(::Module, ::Symbol) at ./loading.jl:777
 [3] require(::Module, ::Symbol) at ./loading.jl:626

This is from the Pkg3 package since it's one of the only examples at this point of a project with Pkg3-style project and manifest files. @vtjnash: I think the ball is in your court here – I've tried to get this to work with module precompilation, but the logic of that code is beyond me. The fundamental problem seems to be that while Compat is loaded into SHA in the precompile child process and in the main process in the absence of module precompilation, when precompilation is on, the main process tries to load Compat into Main instead of SHA, which is incorrect and leads to a load failure.

The other major todo is to change the root module key to being a UUID instead of a name when a module is resolved by UUID, but that can be done in a separate PR since it's strictly internal and non-breaking (it adds the ability to have multiple root modules loaded with the same name).

end
end
SHA1(s::Union{String,SubString{String}}) = SHA1(hex2bytes(s))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just internal, we should not export this type or anything and should maybe replace it with something else, but for now it can live here and we can clean this stuff up later.

base/initdefs.jl Outdated
vers = "v$(VERSION.major).$(VERSION.minor)"
if haskey(ENV, "JULIA_LOAD_PATH")
prepend!(LOAD_PATH, split(ENV["JULIA_LOAD_PATH"], @static Sys.iswindows() ? ';' : ':'))
end
push!(LOAD_PATH, abspath(BINDIR, "..", "local", "share", "julia", "site", vers))
push!(LOAD_PATH, abspath(BINDIR, "..", "share", "julia", "site", vers))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently leaves the site paths in here and the second one is where stdlib lives, which we need. However, I think that we should probably have a Project.toml and Manifest.toml for Julia itself which includes these. The installed stdlib packages should probably be in a builtin/standard depot instead, but that's kind of a detail. We can always put relative paths into stdlib from the manifest file.

base/loading.jl Outdated
if info isa Tuple{String,String,UUID}
# horrible hack, but since we don't control module creation, ¯\_(ツ)_/¯
ccall(:jl_set_global, Cvoid, (Any, Any, Any), __toplevel__, ENVINFO, info[2:end])
end
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we constructed the module object to be loaded and then evaluated code into it, then we could explicitly set the ENVINFO name in there instead of doing nasty stuff like this. I think we should probably do that, but I didn't want to couple this change with a change like that, so I just hacked around it instead.

@@ -471,26 +804,23 @@ include_string(m::Module, txt::String, fname::String) =
include_string(m::Module, txt::AbstractString, fname::AbstractString="string") =
include_string(m, String(txt), String(fname))

function source_path(default::Union{AbstractString,Nothing}="")
function tls_recurse(key::Symbol, default=nothing)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was at some point trying passing ENVINFO via task local storage, which is why I refactored this, but I didn't end up using it. I'm leaving the refactoring in since it's good to separate the TLS recursion from source path lookup anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is deprecated (v0.7) anyways, I'm not sure it matters

base/loading.jl Outdated
const $ENVINFO = $envinfo
# same horrible hack as in _require
ccall(:jl_set_global, Cvoid, (Any, Any, Any),
Base.__toplevel__, $(Meta.quot(ENVINFO)), $envinfo)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows correct new-style module loading in the child precompile process. The failure is later, however, when the precompiled module needs to get loaded into the original parent process, which tries to load it into Main.


Get the root module enclosing `m`: this is the top-level module loaded by
`import` or `using` which ultimately create `m`. This root module is found by
iteratively taking a module's parent until self-parented module is found.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to leave this in the PR – it was something I wrote and then didn't end up using, but it's somewhat useful, so maybe we should leave it in.

@@ -1927,7 +1927,7 @@ static int size_isgreater(const void *a, const void *b)
return *(size_t*)b - *(size_t*)a;
}

static jl_value_t *read_verify_mod_list(ios_t *s, arraylist_t *dependent_worlds, jl_array_t *mod_list)
static jl_value_t *read_verify_mod_list(jl_module_t *from, ios_t *s, arraylist_t *dependent_worlds, jl_array_t *mod_list)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops: should this be called from or into? I'm not sure what this module is for. @JeffBezanson, @vtjnash?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is into since it should verify the module list with respect to the environment the module is loaded into.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't loading a module into an environment, it's just parsing a file and asserting that the header looks valid. @StefanKarpinski You should drop all changes to this file; my earlier PR to loading.jl moved everything into Julia that matters for Pkg3.

function init_load_path(BINDIR = Sys.BINDIR)
load_path = get(ENV, "JULIA_LOAD_PATH", "@|@v#.#.#|@v#.#|@v#|@default|@!v#.#")
Copy link
Member

@vchuravy vchuravy Dec 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other readers see JuliaLang/Pkg.jl#82 for context.
The string "@|@v#.#.#|@v#.#|@v#|@default|@!v#.#" turns into:

LOAD_PATH = Any[
    [ CurrentEnv(),
      NamedEnv(“v$(VERSION.major).$(VERSION.minor).$(VERSION.patch)”),
      NamedEnv(“v$(VERSION.major).$(VERSION.minor)”),
      NamedEnv(“v$(VERSION.major)”),
      NamedEnv(“default”),
      NamedEnv(“v$(VERSION.major).$(VERSION.minor)”, create=true) ]
]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to bikeshed this syntax, but this is what I ended up going with:

  • @ = current project environment found by looking in parent directories
  • @name = named environment if it already exists
  • @!name = named environment even if it doesn't already exist
  • # characters are replaced by the major, minor and patch version numbers in order

I decided not to introduce any escaping since you can always use .juliarc.jl to set the values programmatically. If you have an actual relative path name that starts with a @ you can always write it as ./@...

Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still trying to wrap my head around the implications of the environment. As far as I understand it is now possible to have the name of a package resolve to different packages/versions depending on the environment (and therefore) module the package is being loaded into. This leads me to think that we will need a cache-file per environment.

In general this looks good, even though it would benefit from comments. This is my third read and I am still not sure that I understand environments and their implications fully.

push!(empty!(DEPOT_PATH), map(expanduser, depots))
else
push!(DEPOT_PATH, joinpath(homedir(), ".julia"))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also add two paths under BINDIR similar to the two for LOAD_PATH

      push!(LOAD_PATH, abspath(BINDIR, "..", "local", "share", "julia", "site", vers))
      push!(LOAD_PATH, abspath(BINDIR, "..", "share", "julia", "site", vers))

base/loading.jl Outdated
end
elseif endswith(path, ".toml") # project file
info = find_package_in_project(path, name)
info != nothing && return info
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info !== nothing

base/loading.jl Outdated
deps_str = uuid = nothing
for line in eachline(io)
if !in_deps
if ismatch(r"^\s*\[\s*\[\s*\w+\s*\]\s*\]\s*$", line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These regexes are repeated for find_package_in_manifest(manifest_file, name, io).
I would suggest factor them out or refactoring the two methods.

base/loading.jl Outdated
for line in eachline(io)
# look for `$name = "$uuid"` entry
m = match(r"^(?:\s*(?:#.*)?|\s*(\w+)\s*=\s*\"(.*)\"\s*)$", line)
m == nothing && return nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m === nothing

Base.require(:Distributed)
Base.require(:Printf)
Base.require(:Future)
Base.require(Base, :Base64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be required into Base?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are, aren't they?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes (I'm assuming this does not bind the name Base.Base64, right?)

src/toplevel.c Outdated
@@ -186,6 +190,7 @@ jl_value_t *jl_eval_module_expr(jl_module_t *parent_module, jl_expr_t *ex)
newm->parent = parent_module;
b->value = (jl_value_t*)newm;
jl_gc_wb_binding(b, newm);
// copy parent environment into submodule
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems detached or is this a TODO?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that comment is a stray, left from a previous iteration where the above code was here.

@@ -1927,7 +1927,7 @@ static int size_isgreater(const void *a, const void *b)
return *(size_t*)b - *(size_t*)a;
}

static jl_value_t *read_verify_mod_list(ios_t *s, arraylist_t *dependent_worlds, jl_array_t *mod_list)
static jl_value_t *read_verify_mod_list(jl_module_t *from, ios_t *s, arraylist_t *dependent_worlds, jl_array_t *mod_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is into since it should verify the module list with respect to the environment the module is loaded into.

@StefanKarpinski
Copy link
Member Author

I am still trying to wrap my head around the implications of the environment. As far as I understand it is now possible to have the name of a package resolve to different packages/versions depending on the environment (and therefore) module the package is being loaded into. This leads me to think that we will need a cache-file per environment.

In essence, in an explicit environment the project file assigns identities to top-level package names via a mapping from names to package UUIDs in the [deps] section of the project file. When loading indirect dependencies via a manifest file, names are mapped to UUIDs via individual [deps] section for each package that is loaded. That means that import A can load one package at the top level and a different thing while loading a dependency. This is necessary to make sure that dependencies keep working no matter what other name clashes may occur in the ecosystem.

Caching absolutely should happen on a per-environment level. However, only the first environment in LOAD_PATH is guaranteed to be loaded intact – the following environments may end up getting dependencies at versions that appear in earlier environments in LOAD_PATH. So all of the caching is really specific to the first environment in that sense.

base/loading.jl Outdated
import Base.Random: UUID

const SlugInt = UInt32 # max p = 4
const chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested and confirmed this is String(collect('A':'Z')) * String(collect('a':'z')) * String(collect('0':'9'))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is 78 characters versus 64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that @vtjnash's way is more obviously free of errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already updated this to String(['A':'Z'; 'a':'z'; '0':'9']) locally, which is shorter and clearer than either.

modpath = find_package(string(modname))
modpath === nothing && return ErrorException("Required dependency $modname not found in current path.")
mod = _require_search_from_serialized(modname, String(modpath))
info = find_package(into, string(modname))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this (this thing that is being found) isn't being loaded into into, it's being loaded into mod

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but also mod and modname aren't Module names, they're UUIDs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except they're not actual UUIDs, they're the 64-bit integers you're calling "UUIDs". Also, why is modname called modname if it's not a module name? Can you throw me a bone here and tell me how to get the module associated with mod instead of just telling me what's wrong with the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The into argument to _require is what's passed in here and as far as I can tell, it results in the correct module being passed in here. I think you may not be understanding what I'm trying to accomplish here. I've tried to explain and we've not gotten anywhere; this all works correctly in the absence of module precompilation, so if you could please make a RP on top of this one that fixes all the precompilation stuff, that would be great.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, mod is the name of the module that needs to be created. into is the module into which the binding for mod is to be created when the process is done. So it seems to me that mod is not the module that should be passed as the into argument here. Why do you think it is?

@@ -471,26 +804,23 @@ include_string(m::Module, txt::String, fname::String) =
include_string(m::Module, txt::AbstractString, fname::AbstractString="string") =
include_string(m, String(txt), String(fname))

function source_path(default::Union{AbstractString,Nothing}="")
function tls_recurse(key::Symbol, default=nothing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is deprecated (v0.7) anyways, I'm not sure it matters

@StefanKarpinski
Copy link
Member Author

Unrelated LAPACK failure and a timeout. The previous run was all green; this is just a rebase.

@StefanKarpinski
Copy link
Member Author

I don't get how this passed any tests. It fails locally since loaded_modules is used but not defined anywhere.

@vtjnash
Copy link
Member

vtjnash commented Jan 3, 2018

None of them ran the tests (since it is not mergeable). Strange that they still post success.

also only store the UUID of each module, not the manifest path
If `path` is `nothing` the package wasn't found
If `uuid` is `nothing` the package has no UUID
@StefanKarpinski
Copy link
Member Author

Superseded by #25455.

@DilumAluthge DilumAluthge deleted the sk/reqfrom branch March 25, 2021 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants