Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure during undefined variable error reporting #57057

Closed
maleadt opened this issue Jan 15, 2025 · 3 comments · Fixed by #49933
Closed

Assertion failure during undefined variable error reporting #57057

maleadt opened this issue Jan 15, 2025 · 3 comments · Fixed by #49933
Labels
regression Regression in behavior compared to a previous version
Milestone

Comments

@maleadt
Copy link
Member

maleadt commented Jan 15, 2025

As observed on PkgEval during CUDSS testing: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2025-01/14/CUDSS.primary.log

This is a package that depends on CUDA, which isn't available on PkgEval, so the libcuda global variable as used by the Clang.jl-generated ccalls generates an undefined variable exception. However, that now triggers an assertion:

julia: /source/src/gc-stock.c:721: jl_gc_small_alloc_inner: Assertion `__extension__ ({ __auto_type __atomic_load_ptr = (&ptls->gc_state); __typeof__ (*__atomic_load_ptr) __atomic_load_tmp; __atomic_load (__atomic_load_ptr, &__atomic_load_tmp, (memory_order_relaxed)); __atomic_load_tmp; }) == 0' failed.

[193] signal 6 (-6): Aborted
in expression starting at /home/pkgeval/.julia/packages/CUDSS/L1Cat/test/runtests.jl:15
unknown function (ip: 0x78a467d5cebc) at /lib/x86_64-linux-gnu/libc.so.6
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x78a467cf8394) at /lib/x86_64-linux-gnu/libc.so.6
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
jl_gc_small_alloc_inner at /source/src/gc-stock.c:721
jl_gc_small_alloc_noinline at /source/src/gc-stock.c:783 [inlined]
jl_gc_alloc_ at /source/src/gc-stock.c:797
ijl_new_struct at /source/src/datatype.c:1558
ijl_undefined_var_error at /source/src/rtutils.c:152
macro expansion at /home/pkgeval/.julia/packages/CUDA/L1qZp/lib/utils/call.jl:214 [inlined]
macro expansion at /home/pkgeval/.julia/packages/CUDSS/L1Cat/src/libcudss.jl:229 [inlined]
#cudssGetProperty##0 at /home/pkgeval/.julia/packages/CUDA/L1qZp/lib/utils/call.jl:35 [inlined]

Introduced somewhere in 4250be8...71bfbb3, maybe as part of the binding changes (cc @Keno)?

@maleadt maleadt added the regression Regression in behavior compared to a previous version label Jan 15, 2025
@maleadt maleadt added this to the 1.12 milestone Jan 15, 2025
@vtjnash
Copy link
Member

vtjnash commented Jan 17, 2025

This is a CUDA bug, since @ccall(jl_gc_safe_enter()::Int8) is UB to exist inside a julia program (not a legal function to call ever)

https://github.com/JuliaGPU/CUDA.jl/blob/d07a24572814efef6691005bd33ae7fd5f978f49/lib/utils/call.jl#L214

@vtjnash vtjnash closed this as completed Jan 17, 2025
@vtjnash vtjnash closed this as not planned Won't fix, can't repro, duplicate, stale Jan 17, 2025
@vtjnash vtjnash reopened this Jan 17, 2025
@maleadt
Copy link
Member Author

maleadt commented Jan 17, 2025

@ccall(jl_gc_safe_enter()::Int8) is UB to exist inside a julia program (not a legal function to call ever)

Well, we need it or we deadlock. Could you provide an alternative then, given that we don't have #49933?

cc @vchuravy, x-ref JuliaGPU/CUDA.jl#2262

@vtjnash
Copy link
Member

vtjnash commented Jan 17, 2025

There is no way to do this that isn't UB, other than to finish implementing #49933

giordano pushed a commit that referenced this issue Feb 15, 2025
This has been bouncing around as a idea for a while.
One of the challenges around time-to-safepoint has been Julia code
that is calling libraries. 

Since foreign code will not include safepoints we see increased latency
when one thread is running a foreign-call and another wants to trigger
GC.

The open design question here is:
- Do we expose this as an option the user must "opt-in", e.g. by using a
  keyword arg to `@ccall` or a specific calling-convetion.
- Or do we turn this on for all ccall, except for Julia runtime calls.

There is relativly little code outside the Julia runtime that needs to
be "GC unsafe",
exception are programs that directly use the Julia C-API. Incidentially
`jl_adopt_thread`
and `@cfunction`/`@ccallable` do the right thing and transition to "GC
unsafe", regardless
of what state the thread currently is in.

I still need to figure out how to reliably detect Julia runtime calls,
but I think we can
switch all other calls to "GC safe". We should also consider
optimizations that mark large
regions of code without Julia runtime interactions as "GC safe" in
particular numeric
for-loops.

Closes #57057

---------

Co-authored-by: Gabriel Baraldi <[email protected]>
KristofferC pushed a commit that referenced this issue Feb 15, 2025
This has been bouncing around as a idea for a while.
One of the challenges around time-to-safepoint has been Julia code
that is calling libraries.

Since foreign code will not include safepoints we see increased latency
when one thread is running a foreign-call and another wants to trigger
GC.

The open design question here is:
- Do we expose this as an option the user must "opt-in", e.g. by using a
  keyword arg to `@ccall` or a specific calling-convetion.
- Or do we turn this on for all ccall, except for Julia runtime calls.

There is relativly little code outside the Julia runtime that needs to
be "GC unsafe",
exception are programs that directly use the Julia C-API. Incidentially
`jl_adopt_thread`
and `@cfunction`/`@ccallable` do the right thing and transition to "GC
unsafe", regardless
of what state the thread currently is in.

I still need to figure out how to reliably detect Julia runtime calls,
but I think we can
switch all other calls to "GC safe". We should also consider
optimizations that mark large
regions of code without Julia runtime interactions as "GC safe" in
particular numeric
for-loops.

Closes #57057

---------

Co-authored-by: Gabriel Baraldi <[email protected]>
(cherry picked from commit 85458a0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
regression Regression in behavior compared to a previous version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants