Julia 1.11.0-1.11.3 hangs while testing/precompiling a basic Test project with julia_args = ["--threads=auto"] #56458

Open
stemann opened this issue Nov 5, 2024 · 18 comments

Comments

@stemann

stemann commented Nov 5, 2024

Scheduled testing of a very basic Test project with --threads=auto started hanging three weeks ago - since Oct. 8, when Julia 1.11 was released.

The scheduled testing is using the julia:1 container (which became synonymous with julia:1.11 on Oct. 8) and the hang occurs while precompiling the test-project (when running tests).

The scheduled testing is simply running Pkg.test with --threads=auto for a project with only Test as a dependency: https://gitlab.com/stemann/julia-gitlab-ci-templates/-/tree/master/examples/Sample

Without --threads=auto (using Pkg.test(; coverage = true)), the testing completes without issues (of course): https://gitlab.com/stemann/julia-gitlab-ci-templates/-/jobs/8234833097

"Stack trace" (Julia-style):

  • Pkg.test(; coverage = true, julia_args = ["--threads=auto"])
julia --project -e '
        @info """
          Testing...
          CI_JULIA_TEST_THREADS: $(ENV["CI_JULIA_TEST_THREADS"])
          CI_JULIA_TEST_REPORTS: $(ENV["CI_JULIA_TEST_REPORTS"])
          Sys.CPU_THREADS: $(Sys.CPU_THREADS)
        """
# ...
        if !using_threads || VERSION < v"1.5"
          if using_threads
            ENV["JULIA_NUM_THREADS"] = Sys.CPU_THREADS
          end
          if !using_test_reports
            Pkg.test(; coverage = true)
          else
            TestReports.test(; coverage = true)
          end
        else
          if !using_test_reports
            Pkg.test(; coverage = true, julia_args = ["--threads=auto"])
          else
            TestReports.test(; coverage = true, julia_args = ["--threads=auto"])
          end
        end
      '
@stemann
Author

stemann commented Nov 5, 2024

Vaguely related to #56345

@giordano
Contributor

giordano commented Nov 5, 2024

Vaguely related to #56345

Does it mean this is fixed on master and #56228 (but that issue was using Distributed, not threads)?

@stemann
Author

stemann commented Nov 5, 2024

It seems reproducible on an x86_64 macOS using Docker constrained to two CPU cores (similar to the GitLab CI SaaS agent):

$ docker run -it --rm -v $(pwd):/mnt -w /mnt julia:1 bash -c "while true; do date; sleep 60; done & julia --project -e '@show Sys.CPU_THREADS; using Pkg; Pkg.test(; coverage = true, julia_args = [\"--threads=auto\"])'"
Tue Nov  5 15:21:57 UTC 2024
Sys.CPU_THREADS = 2
  Installing known registries into `~/.julia`
       Added `General` registry to ~/.julia/registries
     Testing Sample
      Status `/tmp/jl_j26BPk/Project.toml`
  [a9065fac] Sample v0.1.0 `/mnt`
  [8dfed614] Test v1.11.0
      Status `/tmp/jl_j26BPk/Manifest.toml`
  [a9065fac] Sample v0.1.0 `/mnt`
  [2a0f44e3] Base64 v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization v1.11.0
  [8dfed614] Test v1.11.0
Precompiling project for configuration --code-coverage=@/mnt --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none --threads=auto...
Tue Nov  5 15:22:57 UTC 2024                         ]  0/1
Tue Nov  5 15:23:57 UTC 2024                         ]  0/1
Tue Nov  5 15:24:57 UTC 2024                         ]  0/1
  Progress [>                                        ]  0/1
  ◒ Sample

Also when constraining to 4 CPU cores:

$ docker run -it --rm -v $(pwd):/mnt -w /mnt julia:1 bash -c "while true; do date; sleep 60; done & julia --project -e '@show Sys.CPU_THREADS; using Pkg; Pkg.test(; coverage = true, julia_args = [\"--threads=auto\"])'"
Tue Nov  5 15:29:46 UTC 2024
Sys.CPU_THREADS = 4
  Installing known registries into `~/.julia`
       Added `General` registry to ~/.julia/registries
     Testing Sample
      Status `/tmp/jl_aowubp/Project.toml`
  [a9065fac] Sample v0.1.0 `/mnt`
  [8dfed614] Test v1.11.0
      Status `/tmp/jl_aowubp/Manifest.toml`
  [a9065fac] Sample v0.1.0 `/mnt`
  [2a0f44e3] Base64 v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization v1.11.0
  [8dfed614] Test v1.11.0
Precompiling project for configuration --code-coverage=@/mnt --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none --threads=auto...
Tue Nov  5 15:30:46 UTC 2024                         ]  0/1
Tue Nov  5 15:31:46 UTC 2024                         ]  0/1
Tue Nov  5 15:32:46 UTC 2024                         ]  0/1
Tue Nov  5 15:33:46 UTC 2024                         ]  0/1
Tue Nov  5 15:34:46 UTC 2024                         ]  0/1
  Progress [>                                        ]  0/1
  ◐ Sample

@stemann stemann changed the title Julia 1.11 hangs while testing/precompiling a basic Test project with --threads=auto Julia 1.11 hangs while testing/precompiling a basic Test project with --threads=auto when running on a machine with 2 CPU cores Nov 5, 2024
@stemann stemann changed the title Julia 1.11 hangs while testing/precompiling a basic Test project with --threads=auto when running on a machine with 2 CPU cores Julia 1.11 hangs while testing/precompiling a basic Test project with --threads=auto when running on a machine with 2 or 4 CPU cores Nov 5, 2024
@stemann
Author

stemann commented Nov 5, 2024

Problem seems to be resolved on master:

$ docker run -it --rm -v $(pwd):/mnt -w /mnt debian:bookworm bash -c "apt-get update; apt-get --yes install curl; curl -fsSL https://install.julialang.org | sh -s -- --yes --default-channel nightly; . /root/.bashrc; julia --version; while true; do date; sleep 60; done & julia --project -e '@show Sys.CPU_THREADS; using Pkg; Pkg.test(; coverage = true, julia_args = [\"--threads=auto\"])'"
# ...
julia version 1.12.0-DEV
Tue Nov  5 15:41:55 UTC 2024
Sys.CPU_THREADS = 4
  Installing known registries into `~/.julia`
       Added `General` registry to ~/.julia/registries
    Updating registry at `~/.julia/registries/General.toml`
    Updating `/mnt/Project.toml`
  [8dfed614] ~ Test ⇒ v1.11.0
    Updating `/mnt/Manifest.toml`
  [2a0f44e3] + Base64 v1.11.0
  [b77e0a4c] + InteractiveUtils v1.11.0
  [dc6e5ff7] + JuliaSyntaxHighlighting v1.12.0
  [56ddb016] + Logging v1.11.0
  [d6f4376e] + Markdown v1.11.0
  [9a3f8284] + Random v1.11.0
  [ea8e919c] + SHA v0.7.0
  [9e88b42a] + Serialization v1.11.0
  [f489334b] + StyledStrings v1.11.0
  [8dfed614] ~ Test ⇒ v1.11.0
     Testing Sample
      Status `/tmp/jl_ltFcmE/Project.toml`
  [a9065fac] Sample v0.1.0 `/mnt`
  [8dfed614] Test v1.11.0
      Status `/tmp/jl_ltFcmE/Manifest.toml`
  [a9065fac] Sample v0.1.0 `/mnt`
  [2a0f44e3] Base64 v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [dc6e5ff7] JuliaSyntaxHighlighting v1.12.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization v1.11.0
  [f489334b] StyledStrings v1.11.0
  [8dfed614] Test v1.11.0
Precompiling for configuration --code-coverage=@/mnt --color=yes --check-bounds=yes --warn-overwrite=yes --depwarn=yes --inline=yes --startup-file=no --track-allocation=none --threads=auto
Precompiling packages finished.
  1 dependency successfully precompiled in 1 seconds. 8 already precompiled.
     Testing Running tests...
Test Summary: | Pass  Total  Time
Sample        |    1      1  1.3s
     Testing Sample tests passed 

@stemann stemann changed the title Julia 1.11 hangs while testing/precompiling a basic Test project with --threads=auto when running on a machine with 2 or 4 CPU cores Julia 1.11.0-1.11.1 hangs while testing/precompiling a basic Test project with --threads=auto when running on a machine with 2 or 4 CPU cores Nov 5, 2024
@stemann stemann changed the title Julia 1.11.0-1.11.1 hangs while testing/precompiling a basic Test project with --threads=auto when running on a machine with 2 or 4 CPU cores Julia 1.11.0-1.11.1 hangs while testing/precompiling a basic Test project with julia_args="--threads=auto" Nov 5, 2024
@stemann stemann changed the title Julia 1.11.0-1.11.1 hangs while testing/precompiling a basic Test project with julia_args="--threads=auto" Julia 1.11.0-1.11.1 hangs while testing/precompiling a basic Test project with julia_args = ["--threads=auto"] Nov 5, 2024
@stemann
Author

stemann commented Nov 5, 2024

Alright - managed to narrow it down a bit more: The issue seems to not be related to the number of CPU cores available.

The simple invocation works:

julia --threads=auto --project -e "using Pkg; Pkg.test(; coverage = true)"

But the "start Julia without threads, and then ask Pkg.test to run a Julia session with threads" approach hangs (on Julia 1.11.0-1.11.1):

julia --project -e "using Pkg; Pkg.test(; coverage = true, julia_args = [\"--threads=auto\"])"
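
Until a fix lands, a CI job hitting this hang can at least fail fast instead of running until the runner's own limit. The following is a minimal sketch and not part of the original report: `run_bounded` is a hypothetical helper name, and it assumes GNU coreutils `timeout` is available (as it is in the Debian-based images used above).

```shell
#!/bin/bash
# Hypothetical helper: run a command under a time limit (in seconds) and print
# a one-word verdict: "ok", "failed", or "hang" (the time limit expired).
run_bounded() {
    local limit=$1
    shift
    local status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with code 124 when the time limit expires.
        echo "hang"
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "failed"
    fi
    return "$status"
}
```

A CI step could then call, for example, `run_bounded 600 julia --project -e 'using Pkg; Pkg.test(; coverage = true, julia_args = ["--threads=auto"])'`, where the 600-second limit is an arbitrary choice.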

@stemann
Author

stemann commented Nov 5, 2024

Using

rm Manifest.toml; docker run -it --rm -v $(pwd):/mnt -w /mnt debian:bookworm bash -c "apt-get update; apt-get --yes install curl; curl -fsSL https://install.julialang.org | sh -s -- --yes --default-channel pr55704; . /root/.bashrc; julia --version; julia -e 'using InteractiveUtils; @show versioninfo()'; while true; do date; sleep 60; done & julia --project -e '@show Sys.CPU_THREADS; using Pkg; Pkg.test(; coverage = true, julia_args = [\"--threads=auto\"])'"

@stemann
Author

stemann commented Nov 21, 2024

Pre-compile during test is still hanging on PR #56228 - tested just now after merge of #56228 (using juliaup channel pr56228).

@stemann
Author

stemann commented Jan 23, 2025

Still hanging on 1.11.3.

Full MWE:

git clone https://github.com/JuliaLang/Example.jl.git
cd Example.jl
rm -f Manifest.toml
docker run -it --rm -v $(pwd):/mnt -w /mnt debian:bookworm bash -c "apt-get update; apt-get --yes install curl; curl -fsSL https://install.julialang.org | sh -s -- --yes --default-channel 1.11.3; . /root/.bashrc; julia --version; julia -e 'using InteractiveUtils; @show versioninfo()'; while true; do date; sleep 60; done & julia --project -e '@show Sys.CPU_THREADS; using Pkg; Pkg.test(; coverage = true, julia_args = [\"--threads=auto\"])'"

@stemann stemann changed the title Julia 1.11.0-1.11.1 hangs while testing/precompiling a basic Test project with julia_args = ["--threads=auto"] Julia 1.11.0-1.11.3 hangs while testing/precompiling a basic Test project with julia_args = ["--threads=auto"] Jan 23, 2025
@stemann
Author

stemann commented Jan 23, 2025

Fixed in nightly (thanks for the tip, @giordano!)

@giordano
Contributor

You said it above

@stemann
Author

stemann commented Jan 23, 2025

Ha ha - true - and forgot about it :-)

@stemann
Author

stemann commented Jan 23, 2025

Is there a good way to figure out which PR/commit fixed the issue?

@giordano
Contributor

Git bisection

@stemann
Author

stemann commented Jan 24, 2025

Seems like the issue was introduced in the release-1.11 branch between v1.11.0-alpha1 and v1.11.0-alpha2. According to a git bisection, the issue was introduced with commit 2d89fee (bump Pkg).

git bisect start
git bisect bad v1.11.0-alpha2
git bisect good v1.11.0-alpha1
git bisect run ./test.sh

where test.sh (with Example.jl in usr/share/Example.jl):

#!/bin/bash

set -e -x
trap 'exit 125' ERR # Exit with code 125 (equivalent to git bisect skip) if script fails ahead of actual test

make clean
make -j 2

rm -rf ~/.julia/compiled usr/share/Example.jl/Manifest.toml
git log -1 --oneline
./julia --eval '
    using InteractiveUtils
    using Pkg
    versioninfo()
'

set +e
trap - ERR

date -Iseconds
timeout 60s ./julia --project=usr/share/Example.jl --eval '
    @show VERSION
    @show Sys.CPU_THREADS
    using Pkg
    Pkg.test(; julia_args = ["--threads=auto"])
'
RESULT=$? # timeout will exit with code 124 if timeout expires
echo RESULT=$RESULT
date -Iseconds
exit $RESULT
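
The script above relies on how `git bisect run` interprets exit codes: 0 marks the commit good, 125 skips it, any other code from 1 to 127 marks it bad (so the 124 from an expired `timeout` counts as bad), and anything else aborts the bisection. A small sketch of that mapping, where `bisect_verdict` is a hypothetical name used only for illustration:

```shell
#!/bin/bash
# Map a script's exit code to the verdict `git bisect run` would record.
bisect_verdict() {
    local code=$1
    if [ "$code" -eq 0 ]; then
        echo "good"
    elif [ "$code" -eq 125 ]; then
        echo "skip"
    elif [ "$code" -ge 1 ] && [ "$code" -le 127 ]; then
        echo "bad"
    else
        # Codes outside 0..127 make git abort the bisection.
        echo "abort"
    fi
}

# Show the verdict for a few representative exit codes.
for code in 0 1 124 125 137; do
    echo "exit $code -> $(bisect_verdict "$code")"
done
```

This is why the `trap 'exit 125' ERR` guard matters: a failed build is skipped rather than misread as a reproduction of the hang.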

@giordano
Contributor

This is the corresponding diff in Pkg: JuliaLang/Pkg.jl@76070d2...e7d740a. Would you be able to bisect that one?

@stemann
Author

stemann commented Jan 25, 2025

I'll try to look into bisecting the Pkg changes too.

The following Git bisection of master (julia) looking for the first commit with the issue points to commit c379db7 (Bump the Pkg stdlib from 76070d295 to 1f16df404).

git bisect start
# good: [aecd8fd379a53afa780bc8a8404728b6aa22d6bc] Propagate inbounds in isassigned with CartesianIndex indices (#53305)
git bisect good aecd8fd379a53afa780bc8a8404728b6aa22d6bc
# bad: [c4ab0d46a9c3a470794e5a441176bfdf8ff36e52] Increase build precompilation (#53682)
git bisect bad c4ab0d46a9c3a470794e5a441176bfdf8ff36e52
...
# good: [b18a62d624d4be2f0b30f6d7245a01bdc7c13d09] Fix formatting & typo in methods.md (#53486)
git bisect good b18a62d624d4be2f0b30f6d7245a01bdc7c13d09
# first bad commit: [c379db77135e55b71707704f58d901e78924804d] 🤖 [master] Bump the Pkg stdlib from 76070d295 to 1f16df404 (#53495)

This narrows it down to just three commits: JuliaLang/Pkg.jl@76070d2...1f16df4

Which seems to indicate it could be due to JuliaLang/Pkg.jl#3792 (do not start a new process for precompiling the test env), which is also supported by #53572

@stemann
Author

stemann commented Jan 25, 2025

Git bisection of Pkg.jl master in an attempt to find the commit that fixed the issue (swapping the meaning of "good" and "bad") is a bit problematic - using the current julia nightly seems to not demonstrate the issue.

Using julia 1.11 (v1.11.3) is a bit inconclusive as much of Pkg.jl master seems to no longer be usable with 1.11. The last commit to demonstrate the issue is 9c6356fa.

git bisect start
# bad: [938e9b24eebf1bb19613e8941dc8700e68249578] app support in Pkg (#3772)
git bisect bad 938e9b24eebf1bb19613e8941dc8700e68249578
# good: [e7e8ce38359330441b1340046add367761035f69] do not start a new process for precompiling the test env (#3792)
git bisect good e7e8ce38359330441b1340046add367761035f69
# skip: [938e9b24eebf1bb19613e8941dc8700e68249578] app support in Pkg (#3772)
git bisect skip 938e9b24eebf1bb19613e8941dc8700e68249578

Having Pkg.jl and Example.jl checked out side-by-side, with the working directory in Pkg.jl, and using the following test.sh script for running git bisect run ./test.sh 1.11:

#!/bin/bash

JULIA_VERSION=$1
JULIAUP_VERSION=${JULIA_VERSION/1.12/nightly}

set -e -x
trap 'cd ../Example.jl; git restore Project.toml; cd ../Pkg.jl; git restore Project.toml src/API.jl; exit 125' ERR # Exit with code 125 (equivalent to git bisect skip) if script fails ahead of actual test

sed -i '' -e 's/^uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"$/uuid = "54cfe95a-1eb2-52ea-b672-e2afdf69b78f"/' Project.toml
sed -i '' -e 's/^    julia_args = Cmd(julia_args)$/    @show julia_args = Cmd(julia_args)/' src/API.jl
git log -1 --oneline
git status -vv

rm -rf ~/.julia/compiled/v$JULIA_VERSION/{Example,Pkg}
cd ../Example.jl
rm -f Manifest.toml
git log -1 --oneline
git status -vv
timeout 240s julia +$JULIAUP_VERSION --project --threads=auto --eval '
    using Pkg
    Pkg.develop(; path = normpath(pwd(), "..", "Pkg.jl"))
    Pkg.resolve()
    Pkg.precompile()
'
timeout 240s julia +$JULIAUP_VERSION --project --eval '
    using Pkg
    Pkg.status()
    Pkg.test()
'
rm -rf ~/.julia/compiled/v$JULIA_VERSION/Example

set +e
trap - ERR

date -Iseconds
timeout 30s julia +$JULIAUP_VERSION --project --eval '
    using Pkg
    Pkg.status()
    Pkg.test(; julia_args = ["--threads=auto"])
'
EXIT_CODE=$?
date -Iseconds
git restore Project.toml
cd -

git restore Project.toml src/API.jl

if [ $EXIT_CODE -ne 0 ]; then
    exit 0
else
    exit 1
fi
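
The inversion at the end of the script above (a hang counts as "good", a passing test run as "bad") is what lets `git bisect run` search for the commit that fixed the issue rather than the one that introduced it. As a rough sketch, the same idea can be factored into a reusable wrapper; `invert_status` is a hypothetical name, and passing the skip code 125 through unchanged is an addition not present in the original script.

```shell
#!/bin/bash
# Hypothetical helper: run a command and swap success and failure, so that a
# commit where the problem still occurs is reported as "good" (exit 0).
invert_status() {
    local status=0
    "$@" || status=$?
    if [ "$status" -eq 125 ]; then
        return 125   # keep "skip" as-is for git bisect run
    elif [ "$status" -ne 0 ]; then
        return 0     # command failed or timed out: issue still present
    else
        return 1     # command succeeded: issue is fixed here
    fi
}
```

For example, `invert_status timeout 30 julia --project --eval 'using Pkg; Pkg.test(; julia_args = ["--threads=auto"])'` would return 0 while the hang still reproduces.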

@stemann
Author

stemann commented Jan 25, 2025

Git bisection of the Pkg.jl release-1.11 branch seems to also not be conclusive - based on the script above (edit: from: #56458 (comment)) with a small change to revert the inversion of the exit code:

exit $EXIT_CODE
# if [ $EXIT_CODE -ne 0 ]; then
#     exit 0
# else
#     exit 1
# fi

git bisect start
# bad: [2eb8ae5b8f421fc77295f9cf382132d39e14d16f] [release-1.11] 1.11 backports (#4133)
git bisect bad 2eb8ae5b8f421fc77295f9cf382132d39e14d16f
# good: [76070d295fc4a1f27f852e05400bbc956962e084] Prevent repl crash on invalid command (#3800)
git bisect good 76070d295fc4a1f27f852e05400bbc956962e084

stemann added a commit to stemann/Pkg.jl that referenced this issue Jan 25, 2025
Fixes JuliaLang/julia#56458

Precompilation with threads is not supported: JuliaLang/julia#53572 (comment)

Ensures that tests run with threads, both when calling `julia --project --threads=auto --eval 'using Pkg; Pkg.test()'`, and when calling `julia --project --eval 'using Pkg; Pkg.test(; julia_args = ["--threads=auto"])'`.