Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More sporadic 1.11 hangs #412

Closed
christiangnrd opened this issue Aug 29, 2024 · 4 comments · Fixed by JuliaInterop/ObjectiveC.jl#43
Closed

More sporadic 1.11 hangs #412

christiangnrd opened this issue Aug 29, 2024 · 4 comments · Fixed by JuliaInterop/ObjectiveC.jl#43
Assignees

Comments

@christiangnrd
Copy link
Contributor

Now that 1.11 compilations are fixed, we're back to sporadic hanging of CI.

See example 1, example 2, example 3.

Bonus local output
(Metal) pkg> test
     Testing Metal
      Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_aiC8B7/Project.toml`
  [79e6a3ab] Adapt v4.0.4
  [ab4f0b2a] BFloat16s v0.5.0
  [6e4b80f9] BenchmarkTools v1.5.0
  [0c68f7d7] GPUArrays v10.3.1
  [63c18a36] KernelAbstractions v0.9.25
  [929cbde3] LLVM v9.0.0
  [dde4c033] Metal v1.3.0 `~/.julia/dev/Metal`
  [e86c9b32] ObjectiveC v3.0.0
  [276daf66] SpecialFunctions v2.4.0
  [90137ffa] StaticArrays v1.9.7
  [10745b16] Statistics v1.11.1
  [ade2ca70] Dates v1.11.0
  [8ba89e20] Distributed v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [37e2e46d] LinearAlgebra v1.11.0
  [de0858da] Printf v1.11.0
  [3fa0cd96] REPL v1.11.0
  [9a3f8284] Random v1.11.0
  [2f01184e] SparseArrays v1.11.0
  [8dfed614] Test v1.11.0
      Status `/private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_aiC8B7/Manifest.toml`
  [79e6a3ab] Adapt v4.0.4
  [a9b6321e] Atomix v0.1.0
  [ab4f0b2a] BFloat16s v0.5.0
  [6e4b80f9] BenchmarkTools v1.5.0
  [fa961155] CEnum v0.5.0
  [523fee87] CodecBzip2 v0.8.4
  [ffbed154] DocStringExtensions v0.9.3
  [e2ba6199] ExprTools v0.1.10
  [0c68f7d7] GPUArrays v10.3.1
  [46192b85] GPUArraysCore v0.1.6
  [61eb1bfa] GPUCompiler v0.27.4
  [92d709cd] IrrationalConstants v0.2.2
  [692b3bcd] JLLWrappers v1.5.0
  [682c06a0] JSON v0.21.4
  [63c18a36] KernelAbstractions v0.9.25
  [929cbde3] LLVM v9.0.0
  [2ab3a3ac] LogExpFunctions v0.3.28
  [1914dd2f] MacroTools v0.5.13
  [dde4c033] Metal v1.3.0 `~/.julia/dev/Metal`
  [d8793406] ObjectFile v0.4.2
  [e86c9b32] ObjectiveC v3.0.0
  [69de0a69] Parsers v2.8.1
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [6c6a2e73] Scratch v1.2.1
  [276daf66] SpecialFunctions v2.4.0
  [90137ffa] StaticArrays v1.9.7
  [1e83bf80] StaticArraysCore v1.4.3
  [10745b16] Statistics v1.11.1
  [53d494c1] StructIO v0.3.1
  [a759f4b9] TimerOutputs v0.5.24
  [3bb67fe8] TranscodingStreams v0.11.2
  [013be700] UnsafeAtomics v0.2.1
  [d80eeb9a] UnsafeAtomicsLLVM v0.2.1
  [6e34b625] Bzip2_jll v1.0.8+1
  [2e619515] Expat_jll v2.6.2+0
  [f52de702] LLVMDowngrader_jll v0.3.0+1
  [dad2f222] LLVMExtra_jll v0.0.33+0
  [7106de7a] LibMPDec_jll v2.5.1+0
⌅ [e9f186c6] Libffi_jll v3.2.2+1
  [458c3c95] OpenSSL_jll v3.0.14+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [93d3a430] Python_jll v3.10.14+0
  [76ed43ae] SQLite_jll v3.45.3+0
  [ffd25f8a] XZ_jll v5.4.6+0
  [0dad84c5] ArgTools v1.1.2
  [56f22d72] Artifacts v1.11.0
  [2a0f44e3] Base64 v1.11.0
  [ade2ca70] Dates v1.11.0
  [8ba89e20] Distributed v1.11.0
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [4af54fe1] LazyArtifacts v1.11.0
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2 v1.11.0
  [8f399da3] Libdl v1.11.0
  [37e2e46d] LinearAlgebra v1.11.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [a63ad114] Mmap v1.11.0
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.11.0
  [de0858da] Printf v1.11.0
  [9abbd945] Profile v1.11.0
  [3fa0cd96] REPL v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization v1.11.0
  [6462fe0b] Sockets v1.11.0
  [2f01184e] SparseArrays v1.11.0
  [f489334b] StyledStrings v1.11.0
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test v1.11.0
  [cf7118a7] UUIDs v1.11.0
  [4ec0a83e] Unicode v1.11.0
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.6.0+0
  [e37daf67] LibGit2_jll v1.7.2+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.6+0
  [14a3606d] MozillaCACerts_jll v2023.12.12
  [4536629a] OpenBLAS_jll v0.3.27+1
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.7.0+0
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.11.0+0
  [8e850ede] nghttp2_jll v1.59.0+0
  [3f19e933] p7zip_jll v17.4.0+2
        Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading.
     Testing Running tests...
┌ Info: System information:
│ macOS 14.6.1, Darwin 23.6.0
│ 
│ Toolchain:
│ - Julia: 1.11.0-rc3
│ - LLVM: 16.0.6
│ 
│ Julia packages: 
│ - Metal.jl: 1.3.0
│ - LLVMDowngrader_jll: 0.3.0+1
│ 
│ 1 device:
└ - Apple M2 Max (64.000 KiB allocated)
[ Info: Running 8 tests in parallel. If this is too many, specify the `--jobs` argument to the tests, or set the JULIA_CPU_THREADS environment variable.
                                                  |          | ---------------- CPU ---------------- |
Test                                     (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB) |
metallib                                      (8) |     0.62 |   0.01 |  2.0 |     214.88 |   547.77 |
pool                                          (9) |     1.14 |   0.03 |  2.5 |     325.48 |   595.09 |
      From worker 8:	Starting recording with the Blank template and GPU, Time Profiler, Metal Application, Metal GPU Counters, Metal Resource Events, os_signpost Instruments. Attaching to: julia (23790). 
      From worker 8:	Ctrl-C to stop the recording
      From worker 8:	Stopping recording...
metal                                         (7) |     4.59 |   0.17 |  3.8 |     930.59 |   813.38 |
      From worker 7:	┌ Warning: Skipping script tests
      From worker 7:	└ @ Main ~/.julia/dev/Metal/test/scripts.jl:9
scripts                                       (7) |     0.55 |   0.00 |  0.0 |      78.42 |   822.00 |
      From worker 10:	2024-08-29 10:43:58.916 julia[23792:284856] Metal GPU Frame Capture Enabled
      From worker 8:	Recording completed. Saving output file...
      From worker 8:	Output file saved as: julia_1.trace
      From worker 8:	[ Info: System trace saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_pNVkk4/julia_1.trace; open the resulting trace in Instruments
profiling                                     (8) |     9.36 |   0.01 |  0.1 |     114.46 |   570.70 |
      From worker 10:	[ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_UfApxE/julia_1.gputrace; open the resulting trace in Xcode
      From worker 10:	[ Info: GPU frame capture saved to /private/var/folders/4g/lnkpkf3s4rxd_wbl8vwnqs4r0000gn/T/jl_UfApxE/julia_2.gputrace; open the resulting trace in Xcode
capturing                                    (10) |     9.87 |   0.18 |  1.8 |    1080.20 |   719.06 |
execution                                     (5) |    18.02 |   0.28 |  1.6 |    2198.88 |   766.20 |
mps/matrix                                   (11) |     0.58 |   0.00 |  0.0 |      82.76 |   556.66 |
mps/size                                     (11) |     0.04 |   0.00 |  0.0 |       1.41 |   564.39 |
examples                                      (4) |    23.70 |   0.57 |  2.4 |    2662.96 |  2038.67 |
mps/vector                                   (11) |     0.13 |   0.00 |  0.0 |      12.66 |   565.38 |
random                                        (9) |    24.50 |   0.40 |  1.6 |    3500.97 |   978.94 |
gpuarrays/indexing scalar                     (4) |     6.65 |   0.08 |  1.2 |    1091.20 |  2089.61 |
kernelabstractions                            (6) |    31.11 |   0.65 |  2.1 |    4947.82 |  1138.53 |
mps/linalg                                    (5) |    16.84 |   0.35 |  2.1 |    3096.05 |   976.09 |
mps/copy                                      (8) |    33.03 |   0.75 |  2.3 |    6544.86 |   989.70 |
array                                         (2) |    45.54 |   0.71 |  1.6 |    6176.39 |  1094.27 |
gpuarrays/interface                           (2) |     0.94 |   0.01 |  1.4 |     236.47 |  1142.72 |
gpuarrays/indexing find                       (5) |    15.72 |   0.42 |  2.7 |    5267.93 |  1181.33 |
gpuarrays/math/power                          (4) |    20.75 |   0.48 |  2.3 |    4750.50 |  2305.39 |
gpuarrays/reductions/any all count            (2) |     8.32 |   0.12 |  1.5 |    1691.97 |  1204.00 |
gpuarrays/uniformscaling                      (4) |     4.67 |   0.02 |  0.5 |     502.92 |  2437.33 |
gpuarrays/linalg/mul!/vector-matrix           (6) |    28.20 |   0.41 |  1.5 |    5208.05 |  1341.78 |
gpuarrays/math/intrinsics                     (6) |     0.91 |   0.00 |  0.0 |     107.54 |  1347.39 |
gpuarrays/indexing multidimensional           (8) |    25.80 |   0.45 |  1.7 |    5203.15 |  1279.70 |
gpuarrays/reductions/mapreducedim!_large      (2) |    27.58 |   0.79 |  2.9 |    6800.30 |  1778.45 |
gpuarrays/linalg/norm                         (6) |    28.24 |   0.54 |  1.9 |    6401.18 |  1584.08 |
gpuarrays/reductions/reducedim!              (11) |    67.84 |   1.30 |  1.9 |   12473.56 |  1249.77 |
gpuarrays/vectors                            (11) |     0.18 |   0.00 |  0.0 |      27.23 |  1252.45 |
gpuarrays/linalg/mul!/matrix-matrix           (4) |    38.05 |   0.45 |  1.2 |    5872.34 |  2530.66 |
gpuarrays/statistics                          (8) |    26.77 |   0.40 |  1.5 |    4574.18 |  1491.39 |
gpuarrays/linalg                              (9) |    73.68 |   1.55 |  2.1 |   14299.43 |  1578.47 |
gpuarrays/random                             (11) |     8.23 |   0.08 |  1.0 |    1146.98 |  1366.86 |
gpuarrays/constructors                        (6) |    16.62 |   0.15 |  0.9 |    1804.54 |  1701.91 |
gpuarrays/base                                (4) |    17.28 |   0.44 |  2.6 |    3418.14 |  2573.62 |
gpuarrays/reductions/== isequal               (8) |    25.87 |   0.53 |  2.1 |    5815.18 |  1610.95 |
gpuarrays/reductions/mapreducedim!           (11) |    26.83 |   0.43 |  1.6 |    5249.56 |  1554.47 |
gpuarrays/reductions/minimum maximum extrema  (5) |   105.34 |   2.37 |  2.2 |   22907.86 |  1966.19 |
gpuarrays/reductions/reduce                   (6) |    56.45 |   1.32 |  2.3 |   13019.02 |  2042.95 |
gpuarrays/reductions/mapreduce                (2) |    81.31 |   1.81 |  2.2 |   17933.78 |  2046.83 |
gpuarrays/reductions/sum prod                 (4) |    77.72 |   1.54 |  2.0 |   16463.69 |  3235.77 |
gpuarrays/broadcasting                        (9) |   108.09 |   1.93 |  1.8 |   19730.51 |  2196.94 |
┌ Warning: Test timed out: device/intrinsics
└ @ Main ~/.julia/dev/Metal/test/runtests.jl:247
      From worker 7:	
      From worker 7:	======================================================================================
      From worker 7:	Information request received. A stacktrace will print followed by a 1.0 second profile
      From worker 7:	======================================================================================
      From worker 7:	
      From worker 7:	signal (29): Information request: 29
      From worker 7:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 7:	unknown function (ip: 0x0)
      From worker 7:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 7:	unknown function (ip: 0x0)
      From worker 7:	
      From worker 7:	==============================================================
      From worker 7:	Profile collected. A report will print at the next yield point
      From worker 7:	==============================================================
      From worker 7:	
┌ Warning: Couldn't kill worker 7, killing process 23789 forcefully
└ @ Main ~/.julia/dev/Metal/test/runtests.jl:256
      From worker 7:	
      From worker 7:	[23789] signal 15: Terminated: 15
      From worker 7:	in expression starting at /Users/christian/.julia/dev/Metal/test/device/intrinsics.jl:4
      From worker 7:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 7:	unknown function (ip: 0x0)
      From worker 7:	__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 7:	unknown function (ip: 0x0)
      From worker 7:	Allocations: 35316542 (Pool: 35315514; Big: 1028); GC: 24
device/intrinsics                             (7) |         failed at 2024-08-29T10:52:12.607
Testing finished in 8 minutes, 23 seconds, 33 milliseconds
device/intrinsics: Error During Test at none:1
  Got exception outside of a @test
  ProcessExitedException(7)

Test Summary:                                  | Pass  Error  Broken  Total  Time
  Overall                                      | 9617      1       9   9627      
    metallib                                   |   25                    25      
    pool                                       |    5                     5      
    metal                                      |  131                   131      
    scripts                                    |                          0      
    profiling                                  |    1                     1      
    capturing                                  |   24                    24      
    execution                                  |   35                    35      
    mps/matrix                                 |   75                    75      
    mps/size                                   |    9                     9      
    examples                                   |    4                     4      
    mps/vector                                 |   34                    34      
    random                                     |  817              1    818      
    gpuarrays/indexing scalar                  |  399                   399      
    kernelabstractions                         | 2179              8   2187      
    mps/linalg                                 |   45                    45      
    mps/copy                                   |  144                   144      
    array                                      |  374                   374      
    gpuarrays/interface                        |    7                     7      
    gpuarrays/indexing find                    |   45                    45      
    gpuarrays/math/power                       |   60                    60      
    gpuarrays/reductions/any all count         |  101                   101      
    gpuarrays/uniformscaling                   |   56                    56      
    gpuarrays/linalg/mul!/vector-matrix        |  140                   140      
    gpuarrays/math/intrinsics                  |   10                    10      
    gpuarrays/indexing multidimensional        |   89                    89      
    gpuarrays/reductions/mapreducedim!_large   |   40                    40      
    gpuarrays/linalg/norm                      |  264                   264      
    gpuarrays/reductions/reducedim!            |  160                   160      
    gpuarrays/vectors                          |   10                    10      
    gpuarrays/linalg/mul!/matrix-matrix        |  360                   360      
    gpuarrays/statistics                       |   52                    52      
    gpuarrays/linalg                           |  397                   397      
    gpuarrays/random                           |   52                    52      
    gpuarrays/constructors                     |  832                   832      
    gpuarrays/base                             |   93                    93      
    gpuarrays/reductions/== isequal            |  248                   248      
    gpuarrays/reductions/mapreducedim!         |  260                   260      
    gpuarrays/reductions/minimum maximum extrema |  555                   555      
    gpuarrays/reductions/reduce                |  220                   220      
    gpuarrays/reductions/mapreduce             |  330                   330      
    gpuarrays/reductions/sum prod              |  636                   636      
    gpuarrays/broadcasting                     |  299                   299      
    device/intrinsics                          |           1              1      
    FAILURE

Error in testset device/intrinsics:
Error During Test at none:1
  Got exception outside of a @test
  ProcessExitedException(7)
ERROR: LoadError: Test run finished with errors
in expression starting at /Users/christian/.julia/dev/Metal/test/runtests.jl:410
ERROR: Package Metal errored during testing

Maybe related to #329?

@maleadt
Copy link
Member

maleadt commented Aug 29, 2024

https://buildkite.com/julialang/metal-dot-jl/builds/1187#01919a97-f080-4e37-8f60-7422426ea8ef is curious, as the metal test suite doesn't launch any kernels. So maybe this isn't due to bad codegen?

@christiangnrd
Copy link
Contributor Author

I believe the offending commit is JuliaLang/julia@5f36833 from JuliaLang/julia#50144.

I was able to reliably reproduce the hang by running

for _ in 1:100
    include("test/metal.jl")
end

in two different terminals simultaneously. (Both 1.11)

It always hangs at the second wait_completed

wait_completed(cmdbuf) == MTL.MTLCommandBufferStatusCompleted
of the testset.

I don't know if the offending Julia commit is the cause of the bug or if it just uncovered it, but hopefully this is a good starting point for someone with a better understanding of this part of the codebase to investigate this further.

@christiangnrd
Copy link
Contributor Author

This might also help? Collected a sample of the hung process from activity monitor.

hang.txt

@maleadt
Copy link
Member

maleadt commented Sep 19, 2024

MWE:

using Metal, .MTL

dev = first(devices())
cmdq = MTLCommandQueue(dev)

cmdbuf = MTLCommandBuffer(cmdq)
scheduled = Ref(false)
completed = Ref(false)
on_scheduled(cmdbuf) do buf
    GC.gc(true)
    scheduled[] = true
end
on_completed(cmdbuf) do buf
    GC.gc(true)
    completed[] = true
end
@assert scheduled[] == false
@assert completed[] == false
@assert cmdbuf.status == MTL.MTLCommandBufferStatusNotEnqueued
enqueue!(cmdbuf)
@assert cmdbuf.status == MTL.MTLCommandBufferStatusEnqueued
commit!(cmdbuf)
print("start... ")
wait_completed(cmdbuf)
println("stop")

Hangs on 1.10 and 1.12 on the first iteration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants