Segmentation fault on hipStreamDestroy #449

Closed
vchuravy opened this issue Jul 24, 2023 · 4 comments

@vchuravy
Member

[240343] signal (7): Bus error
in expression starting at none:0
__sched_yield at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7f12eeaa3e65)
unknown function (ip: 0x7f12eeaa3f9c)
unknown function (ip: 0x7f12eeaaa6f1)
unknown function (ip: 0x7f12eea104a1)
hipStreamDestroy at /opt/rocm/hip/lib/libamdhip64.so (unknown line)
hipStreamDestroy at /home/vchuravy/.julia/packages/AMDGPU/Dl7nN/src/hip/libhip.jl:96
#12 at /home/vchuravy/.julia/packages/AMDGPU/Dl7nN/src/hip/stream.jl:24
unknown function (ip: 0x7f14f6ffb642)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
run_finalizer at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:417
jl_gc_run_finalizers_in_list at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:507
run_finalizers at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gc.c:553
ijl_atexit_hook at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/init.c:299
jl_repl_entrypoint at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/jlapi.c:718
main at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
unknown function (ip: 0x7f155407384f)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 73091525 (Pool: 73047131; Big: 44394); GC: 76
fish: Job 1, 'julia --project=repr_amd repr_a…' terminated by signal SIGBUS (Misaligned address error)

Using the example in #444 (comment)

@vchuravy
Member Author

(repr_amd) pkg> st -m
Status `~/repr_amd/Manifest.toml`
  [21141c5a] AMDGPU v0.5.1
  [621f4979] AbstractFFTs v1.4.0
  [79e6a3ab] Adapt v3.6.2
  [a9b6321e] Atomix v0.1.0
  [fa961155] CEnum v0.4.2
  [ffbed154] DocStringExtensions v0.9.3
  [7da242da] Enzyme v0.11.6
  [f151be2c] EnzymeCore v0.5.1
  [e2ba6199] ExprTools v0.1.9
  [0c68f7d7] GPUArrays v8.8.1
  [46192b85] GPUArraysCore v0.1.5
  [61eb1bfa] GPUCompiler v0.21.4
  [92d709cd] IrrationalConstants v0.2.2
  [692b3bcd] JLLWrappers v1.4.1
  [63c18a36] KernelAbstractions v0.9.8
  [929cbde3] LLVM v6.1.0
  [2ab3a3ac] LogExpFunctions v0.3.24
  [1914dd2f] MacroTools v0.5.10
  [d8793406] ObjectFile v0.4.0
  [aea7be01] PrecompileTools v1.1.2
  [21216c6a] Preferences v1.4.0
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [6c6a2e73] Scratch v1.2.0
  [276daf66] SpecialFunctions v2.3.0
  [90137ffa] StaticArrays v1.6.2
  [1e83bf80] StaticArraysCore v1.4.2
  [53d494c1] StructIO v0.3.0
  [a759f4b9] TimerOutputs v0.5.23
  [a526e669] TimespanLogging v0.1.0
  [013be700] UnsafeAtomics v0.2.1
  [d80eeb9a] UnsafeAtomicsLLVM v0.1.3
  [7cc45869] Enzyme_jll v0.0.78+0
  [dad2f222] LLVMExtra_jll v0.0.23+0
⌃ [86de99a1] LLVM_jll v14.0.6+4
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.3
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.9.0
  [de0858da] Printf
  [9abbd945] Profile
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [6462fe0b] Sockets
  [2f01184e] SparseArrays
  [10745b16] Statistics v1.9.0
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.0.2+0
  [d55e3150] LLD_jll v14.0.6+3
  [deac9b47] LibCURL_jll v7.84.0+0
  [29816b5a] LibSSH2_jll v1.10.2+0
  [c8ffd9c3] MbedTLS_jll v2.28.2+0
  [14a3606d] MozillaCACerts_jll v2022.10.11
  [4536629a] OpenBLAS_jll v0.3.21+4
  [05823500] OpenLibm_jll v0.8.1+0
  [bea87d4a] SuiteSparse_jll v5.10.1+6
  [83775a58] Zlib_jll v1.2.13+0
  [8f36deef] libLLVM_jll v14.0.6+3
  [8e850b90] libblastrampoline_jll v5.8.0+0
  [8e850ede] nghttp2_jll v1.48.0+0
  [3f19e933] p7zip_jll v17.4.0+0
Info Packages marked with ⌃ have new versions available and may be upgradable.

@vchuravy
Member Author

ROCm 5.4

From #444 (comment):

"Do you get a warning about uninitialized global variables? I've seen this error because of them."

No, I don't get that warning.

@matinraayai
Contributor

@vchuravy, can you run this with JULIA_AMDGPU_DISABLE_ARTIFACTS=true and AMD_LOGLEVEL=5 set as environment variables?
I'm not able to reproduce this on my end; I only get the illegal error from #444.

@pxl-th
Member

pxl-th commented Aug 5, 2023

I think this happened before the HSA_OVERRIDE_GFX_VERSION=10.3.0 environment variable was set.
If this is still reproducible, feel free to reopen.

pxl-th closed this as completed on Aug 5, 2023.