Reproducible segmentation fault during build on linux/arm64 #228

Open
MilesCranmer opened this issue Nov 15, 2022 · 7 comments

@MilesCranmer

MilesCranmer commented Nov 15, 2022

I'm trying to build Docker images for PySR (which is built on PyJulia), and the arm64 jobs consistently fail with a segmentation fault while building Conda.jl. The amd64 jobs are fine.

Here's the traceback:
#15 69.87     Building Conda ─→ `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/6e47d11ea2776bc5627421d59cdcc1296c058071/build.log`
#15 84.11 ERROR: LoadError: Error building `Conda`: 
#15 94.97 
#15 94.97 signal (11): Segmentation fault
#15 94.97 in expression starting at /root/.julia/packages/Conda/x2UxR/deps/build.jl:106
#15 94.97 top-level scope at /root/.julia/packages/Conda/x2UxR/deps/build.jl:106
#15 94.97 jl_toplevel_eval_flex at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/toplevel.c:897
#15 94.97 jl_toplevel_eval_flex at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/toplevel.c:850
#15 94.97 ijl_toplevel_eval_in at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/toplevel.c:965
#15 94.97 eval at ./boot.jl:368 [inlined]
#15 94.97 include_string at ./loading.jl:1428
#15 94.97 _jl_invoke at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
#15 94.97 ijl_apply_generic at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/gf.c:2549
#15 94.97 _include at ./loading.jl:1488
#15 94.97 include at ./client.jl:476
#15 94.97 unknown function (ip: 0x55170ff553)
#15 94.97 _jl_invoke at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
#15 94.97 ijl_apply_generic at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/gf.c:2549
#15 94.97 jl_apply at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
#15 94.97 do_call at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/interpreter.c:126
#15 94.97 eval_value at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/interpreter.c:215
#15 94.97 eval_stmt_value at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/interpreter.c:166 [inlined]
#15 94.97 eval_body at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/interpreter.c:612
#15 94.97 jl_interpret_toplevel_thunk at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/interpreter.c:750
#15 94.97 jl_toplevel_eval_flex at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/toplevel.c:906
#15 94.97 jl_toplevel_eval_flex at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/toplevel.c:850
#15 94.97 ijl_toplevel_eval_in at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/toplevel.c:965
#15 94.97 eval at ./boot.jl:368 [inlined]
#15 94.97 exec_options at ./client.jl:276
#15 94.97 _start at ./client.jl:522
#15 94.97 jfptr__start_49479 at /opt/julia/lib/julia/sys.so (unknown line)
#15 94.97 _jl_invoke at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/gf.c:2367 [inlined]
#15 94.97 ijl_apply_generic at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/gf.c:2549
#15 94.97 jl_apply at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/julia.h:1839 [inlined]
#15 94.97 true_main at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/jlapi.c:575
#15 94.97 jl_repl_entrypoint at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/src/jlapi.c:719
#15 94.97 main at /cache/build/default-armageddon-0/julialang/julia-release-1-dot-8/cli/loader_exe.c:59
#15 94.97 __libc_start_main at /lib/aarch64-linux-gnu/libc.so.6 (unknown line)
#15 94.97 _start at /opt/julia/bin/julia (unknown line)
#15 94.97 _start at /opt/julia/bin/julia (unknown line)
#15 94.97 Allocations: 873483 (Pool: 872903; Big: 580); GC: 1
#15 94.99 Stacktrace:
#15 94.99   [1] pkgerror(msg::String)
#15 95.36     @ Pkg.Types /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Types.jl:67
#15 95.49   [2] (::Pkg.Operations.var"#66#73"{Bool, Pkg.Types.Context, String, Pkg.Types.PackageSpec, String})()
#15 95.67     @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1060
#15 95.67   [3] withenv(::Pkg.Operations.var"#66#73"{Bool, Pkg.Types.Context, String, Pkg.Types.PackageSpec, String}, ::Pair{String, String}, ::Vararg{Pair{String}})
#15 96.24     @ Base ./env.jl:172
#15 96.25   [4] (::Pkg.Operations.var"#107#112"{String, Bool, Bool, Bool, Pkg.Operations.var"#66#73"{Bool, Pkg.Types.Context, String, Pkg.Types.PackageSpec, String}, Pkg.Types.PackageSpec})()
#15 96.25     @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1619
#15 96.25   [5] with_temp_env(fn::Pkg.Operations.var"#107#112"{String, Bool, Bool, Bool, Pkg.Operations.var"#66#73"{Bool, Pkg.Types.Context, String, Pkg.Types.PackageSpec, String}, Pkg.Types.PackageSpec}, temp_env::String)
#15 96.25     @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1493
#15 96.25   [6] (::Pkg.Operations.var"#105#110"{Dict{String, Any}, Bool, Bool, Bool, Pkg.Operations.var"#66#73"{Bool, Pkg.Types.Context, String, Pkg.Types.PackageSpec, String}, Pkg.Types.Context, Pkg.Types.PackageSpec, String, Pkg.Types.Project, String})(tmp::String)
#15 96.25     @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1582
#15 96.25   [7] mktempdir(fn::Pkg.Operations.var"#105#110"{Dict{String, Any}, Bool, Bool, Bool, Pkg.Operations.var"#66#73"{Bool, Pkg.Types.Context, String, Pkg.Types.PackageSpec, String}, Pkg.Types.Context, Pkg.Types.PackageSpec, String, Pkg.Types.Project, String}, parent::String; prefix::String)
#15 96.26     @ Base.Filesystem ./file.jl:764
#15 96.26   [8] mktempdir(fn::Function, parent::String) (repeats 2 times)
#15 96.26     @ Base.Filesystem ./file.jl:760
#15 96.26   [9] sandbox(fn::Function, ctx::Pkg.Types.Context, target::Pkg.Types.PackageSpec, target_path::String, sandbox_path::String, sandbox_project_override::Pkg.Types.Project; preferences::Dict{String, Any}, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
#15 96.27     @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1540
#15 96.27  [10] build_versions(ctx::Pkg.Types.Context, uuids::Set{Base.UUID}; verbose::Bool)
#15 96.27     @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1041
#15 96.27  [11] build_versions
#15 96.27     @ /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:956 [inlined]
#15 96.27  [12] add(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}, new_git::Set{Base.UUID}; preserve::Pkg.Types.PreserveLevel, platform::Base.BinaryPlatforms.Platform)
#15 96.28     @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1286
#15 96.29  [13] add(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; preserve::Pkg.Types.PreserveLevel, platform::Base.BinaryPlatforms.Platform, kwargs::Base.Pairs{Symbol, Base.PipeEndpoint, Tuple{Symbol}, NamedTuple{(:io,), Tuple{Base.PipeEndpoint}}})
#15 96.58     @ Pkg.API /opt/julia/share/julia/stdlib/v1.8/Pkg/src/API.jl:275
#15 96.59  [14] add(pkgs::Vector{Pkg.Types.PackageSpec}; io::Base.PipeEndpoint, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
#15 96.74     @ Pkg.API /opt/julia/share/julia/stdlib/v1.8/Pkg/src/API.jl:156
#15 96.74  [15] add(pkgs::Vector{Pkg.Types.PackageSpec})
#15 96.75     @ Pkg.API /opt/julia/share/julia/stdlib/v1.8/Pkg/src/API.jl:145
#15 96.75  [16] #add#27
#15 96.75     @ /opt/julia/share/julia/stdlib/v1.8/Pkg/src/API.jl:144 [inlined]
#15 96.75  [17] add
#15 96.75     @ /opt/julia/share/julia/stdlib/v1.8/Pkg/src/API.jl:144 [inlined]
#15 96.75  [18] #add#26
#15 96.75     @ /opt/julia/share/julia/stdlib/v1.8/Pkg/src/API.jl:143 [inlined]
#15 96.75  [19] add(pkg::String)
#15 96.75     @ Pkg.API /opt/julia/share/julia/stdlib/v1.8/Pkg/src/API.jl:143
#15 96.75  [20] top-level scope
#15 96.75     @ /usr/local/lib/python3.10/site-packages/julia/install.jl:118
#15 96.75 in expression starting at /usr/local/lib/python3.10/site-packages/julia/install.jl:73
#15 96.81 Traceback (most recent call last):
#15 96.81   File "<string>", line 1, in <module>
#15 96.81   File "/pysr/pysr/julia_helpers.py", line 79, in install
#15 96.82     julia.install(quiet=quiet)
#15 96.82   File "/usr/local/lib/python3.10/site-packages/julia/tools.py", line 118, in install
#15 96.82     raise PyCallInstallError("Installing", output)
#15 96.82 julia.tools.PyCallInstallError: Installing PyCall failed.
#15 96.82 
#15 96.82 ** Important information from Julia may be printed before Python's Traceback **
#15 96.82 
#15 96.82 Some useful information may also be stored in the build log file
#15 96.82 `~/.julia/packages/PyCall/*/deps/build.log`.

Here's the job result, the Dockerfile, and the action file. The same error occurs every time I run the job.

  • Julia version: 1.8.2
  • Python version: 3.10.8
  • OS: ubuntu-latest
  • Base docker image: python:latest (platform=linux/arm64)

The line in build.jl where it segfaults:

for depsfile in ("deps.jl", condadeps)

Any idea what this is? @mkitti would you happen to know?

@MilesCranmer

This is the C code where it crashes:

        size_t world = jl_atomic_load_acquire(&jl_world_counter);
        ct->world_age = world;
        if (!has_defs && jl_get_module_infer(m) != 0) {
            (void)jl_type_infer(mfunc, world, 0);
        }
        result = jl_invoke(/*func*/NULL, /*args*/NULL, /*nargs*/0, mfunc); // crashes
        ct->world_age = last_age;

https://github.com/JuliaLang/julia/blob/36034abf26062acad4af9dcec7c4fc53b260dbb4/src/toplevel.c#L897

@MilesCranmer

The last PR to change the line where it segfaults was JuliaLang/julia#31984. @vtjnash @JeffBezanson, any advice on how I could debug this? Or is this line unrelated?

@vtjnash

vtjnash commented Nov 16, 2022

We are trying to call into the JIT there, so perhaps LLVM is computing the jump address incorrectly? The stacktrace is not precise enough to show which value it crashed on. LLVM is planning some fixes for this on AArch64 in JITLink in the upcoming release, though.

@MilesCranmer

Thanks. Should I raise an issue on the main Julia repo or LLVM?

Here's a minimal Dockerfile that gives the same error:

FROM julia:1.8.2
RUN julia -e 'using Pkg; Pkg.add("Conda"); Pkg.build("Conda")'

Another interesting clue: I can build this just fine on my ARM-based laptop (M1). The error only comes up when I build for the arm64 architecture from an amd64 system (i.e., through Docker with QEMU emulation). Does that offer any insight?

To reproduce this, you could build it locally on an x86_64 system using:

docker build --platform=linux/arm64 -t test .

Alternatively, you can use GitHub Actions. First, create a Dockerfile containing the above in the repository root. Then, create a workflow file (e.g., in .github/workflows/):

name: Docker test
on:
  push:
    branches:
      - "**"
jobs:
  docker:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        arch: [linux/amd64, linux/arm64]
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Build and push
        uses: docker/build-push-action@v3
        with:
          context: .
          platforms: ${{ matrix.arch }}
          push: false

(This could be combined with https://github.com/csexton/debugger-action to interact with it after failure.)
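For the local route, a small driver script can run the same build once per target platform and report which one fails. The Python below is a hypothetical sketch, not part of PySR or the original report; the names `tag_for`, `build_command`, `run_builds`, and `main` are illustrative. It assumes the docker CLI is on PATH with BuildKit enabled, and that QEMU binfmt handlers are registered on the x86_64 host so that linux/arm64 builds can run under emulation.

```python
"""Hypothetical helper: build the minimal Dockerfile once per platform and
report which builds fail. Assumes the docker CLI is on PATH, BuildKit is
enabled, and QEMU binfmt handlers are registered for foreign architectures."""
import shutil
import subprocess
import sys

PLATFORMS = ["linux/amd64", "linux/arm64"]


def tag_for(platform: str, base: str = "test") -> str:
    # Image names cannot contain "/" here, so "linux/arm64" -> "test-linux-arm64".
    return f"{base}-{platform.replace('/', '-')}"


def build_command(platform: str, context: str = ".") -> list[str]:
    # Same invocation as in the thread, parameterized by target platform.
    return ["docker", "build", f"--platform={platform}", "-t", tag_for(platform), context]


def run_builds(platforms=PLATFORMS, context: str = ".") -> dict[str, int]:
    """Run one build per platform and return each platform's exit code."""
    results = {}
    for platform in platforms:
        cmd = build_command(platform, context)
        print("+", " ".join(cmd), flush=True)
        results[platform] = subprocess.run(cmd).returncode
    return results


def main() -> int:
    if shutil.which("docker") is None:
        print("docker CLI not found on PATH", file=sys.stderr)
        return 2
    codes = run_builds()
    for platform, code in codes.items():
        status = "ok" if code == 0 else f"failed (exit {code})"
        print(f"{platform}: {status}")
    # Non-zero overall status if any platform failed to build.
    return 0 if all(code == 0 for code in codes.values()) else 1
```

Calling `main()` from the directory containing the minimal Dockerfile builds both images in sequence; if the problem reproduces, linux/arm64 should be the one reported as failed. Building one platform per invocation keeps each failure isolated instead of aborting a single multi-platform build as a whole.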

@vtjnash

vtjnash commented Nov 16, 2022

The equivalent issue on M1 (arm64-darwin) was fixed in the previous release of LLVM, so that would make sense. You would likely need to get LLVM master working with Julia master before reporting it.

@schlichtanders

I see the same segfault when simply precompiling the TimeZones package, with the same setup: a multi-architecture build from an amd64 host to an arm64 target using QEMU emulation.

@vtjnash can you point to further issues that could help solve this?
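A one-package reproduction Dockerfile, following the pattern of the minimal Conda Dockerfile earlier in the thread, might help narrow the TimeZones case down in the same way. The Python helper below is a hypothetical sketch that renders such a file; `render_dockerfile` and `write_dockerfile` are illustrative names, the julia:1.8.2 default is an assumption carried over from the Conda example (the TimeZones report does not state a Julia version), and `Pkg.precompile()` is used to trigger the precompilation step described above.

```python
"""Hypothetical helper: render a one-package reproduction Dockerfile,
following the same pattern as the minimal Conda Dockerfile in this thread."""
from pathlib import Path


def render_dockerfile(package: str, julia_version: str = "1.8.2") -> str:
    # Pkg.add installs the package; Pkg.precompile() then precompiles the
    # environment, the step reported to segfault for TimeZones under QEMU.
    julia_cmd = f'using Pkg; Pkg.add("{package}"); Pkg.precompile()'
    return f"FROM julia:{julia_version}\nRUN julia -e '{julia_cmd}'\n"


def write_dockerfile(package: str, directory: str = ".") -> Path:
    # Write the rendered Dockerfile into `directory` and return its path.
    path = Path(directory) / "Dockerfile"
    path.write_text(render_dockerfile(package))
    return path
```

For example, `write_dockerfile("TimeZones")` writes a Dockerfile to the current directory, which can then be built with `docker build --platform=linux/arm64 -t test .` on an x86_64 host.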

@vtjnash

vtjnash commented Apr 24, 2023

JuliaLang/julia#45859
