Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module import order in the source code is not maintained #45888

Closed
thautwarm opened this issue Jul 1, 2022 · 6 comments
Closed

module import order in the source code is not maintained #45888

thautwarm opened this issue Jul 1, 2022 · 6 comments

Comments

@thautwarm
Copy link
Member

thautwarm commented Jul 1, 2022

The related discourse post: loading a precompiled module does not respect the module import order.

P.S: not asking resources for this. I'll make a PR if this gets confirmed to be an issue.

Consequences

This issue affects the initialization order of package dependencies. The initialization order is inconsistent with that seen from the source code.

Description

Suppose we are making a Julia package TestLoadOrder which references two dependencies A and B, and the main module is created as follow:

module TestLoadOrder
import A
import B
end

It works when loading the package without precompilation, e.g., loading the package for the first time.

However, the issue may occur when the package TestLoadOrder is loaded from precompiled cache. If we expects A does some initialization tasks to configure how to load B (EDIT: see Real-world concerns below), it might not work because the loading order is NOT consistent with what we see in the source code.

The issue comes from the results of Base.parse_cache_header.

julia/base/loading.jl

Lines 2014 to 2022 in bd7bd5e

modules, (includes, requires), required_modules, srctextpos, prefs, prefs_hash = parse_cache_header(io)
id = isempty(modules) ? nothing : first(modules).first
modules = Dict{PkgId, UInt64}(modules)
# Check if transitive dependencies can be fulfilled
ndeps = length(required_modules)
depmods = Vector{Any}(undef, ndeps)
for i in 1:ndeps
req_key, req_build_id = required_modules[i]

Modules are loaded according to the order in required_modules::Vector{Pair{Base.PkgId, UInt64}} returned from Base.parse_cache_header, but the correct order of user imports seems kept in requires::Vector{Pair{Base.PkgId, Base.PkgId}}. These orders can be different.

As a consequence, the current module loading implementation may cause inconsistency when loading a precompiled module.

Reproduction

Environment:

bash> JULIA_DEPOT_PATH="../fake-depot" julia -q
julia> VERSION
v"1.7.2"

Repro steps:

  1. generate a package TestLoadOrder, and another two packages DepA, DepB as TestLoadOrder's dependencies.

    #= TestLoadOrder.jl =#
    module TestLoadOrder
    import DepA
    import DepB
    end
    
    #= DepA.jl =#
    module DepA
    function __init__()
        println("load A")
    end
    end
    
    #= DepB.jl =#
    module DepB
    function __init__()
        println("load B")
    end
    end
  2. add DepA and DepB to TestLoadOrder/Project.toml's [deps] section, and develop these three packages locally.

  3. open julia REPL, import TestLoadOrder, the output:

    julia> import TestLoadOrder
    [ Info: Precompiling TestLoadOrder [afdb10db-0077-474f-80e2-4859b3571cf1]
    load A
    load B
    load B
    load A
  4. close julia REPL and open again, import TestLoadOrder, now the precompiled caches are used:

    julia> import TestLoadOrder
    load B
    load A

As can bee seen, the initialization order is inconsistent with that seen from the source code.

Real-world concerns

In the Julia ecosystem, there are many packages that read environment configurations before initialization.

For instance, PyCall and PythonCall, which give us pretty smooth interoperability with Python, require configurations to initialize a Python interpreter. These kind developers from PyCall and PythonCall have to consider much about the "non-core" parts, i.e., finding the correct configurations such as Python executable paths/package install locations/in-process library address.

Frankly speaking, these two great packages never provide a solid solution to such configuration, although it is not their fault at all. For instance, recently I need to make a julia package that requires PythonCall(or PyCall), and this package should automatically and dynamically select the Python executable specified in my separate environment. However, I didn't find out how to achieve this with PyCall, and too much configuration outside Julia are neeeded if I use PythonCall (configuration in Julia is invalid due to this issue!). I cannot create a Julia package to do so, due to this issue.

IMHO, these awesome developers should have a chance to ignore on the "non-core" parts, and deliver the configuration job to other dedicated packages. This is now do-able with a __precompile__(false) package, but if we want more like bundling a sysimage, we'd better fix this issue to support more consistent precompilation loading.

@thautwarm
Copy link
Member Author

If this is not an issue, I'd change it to feature request..

@vtjnash
Copy link
Member

vtjnash commented Jul 1, 2022

Definitely not a valid assumption, as these packages need to ensure for themselves that they are loading in the correct order (alternatively instead, delayed initialization till first use is generally better than having __init__ code), but this does happen to be changed by #45861

@thautwarm
Copy link
Member Author

Definitely not a valid assumption, as these packages need to ensure for themselves that they are loading in the correct order.

Not sure if I understand the correct order. I'd call the loading order without any precompiled cache is the "correct" one. If one writes a module that requires A and then B, even if A requires B, the order is actually B -> A, but it is still what I want.

Will it cause issues if I sort required_modules to maintain the package order in requires?

@JeffBezanson
Copy link
Member

We can't guarantee the order of initializing modules that don't have a dependency relationship. For example, a user could independently import B manually, then load your package later. In that case there is no way to ensure A is initialized before B.

@thautwarm
Copy link
Member Author

thautwarm commented Jul 1, 2022

We can't guarantee the order of initializing modules that don't have a dependency relationship. For example, a user could independently import B manually, then load your package later. In that case there is no way to ensure A is initialized before B.

I think this is expected. Maybe I didn't express my idea properly. My request is to make sure the shallow loading order obey user require order. However, the process of the deeper module loading is maintained by themselves, so calling B from A or from outside packages is fine.

If a Julia package provides configurations used by independent packages, it should be the user's duty to firstly call the former package.

@vtjnash vtjnash closed this as completed Jul 5, 2022
@j-fu
Copy link

j-fu commented Jul 22, 2022

May be this is also linked to this issue:
PainterQubits/Unitful.jl#545
There we see different behavior between compiled and non-compiled (--compile=min) code, and between Julia version 1.8 vs older ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants