Julia 1.8 CVODE_BDF causes segfaults #367

Closed
bolognam opened this issue Aug 22, 2022 · 23 comments

@bolognam
Contributor

bolognam commented Aug 22, 2022

I'm consistently getting segfaults in both Windows 10 and Ubuntu 20.04 on Julia 1.8.0 when running my ODE solves compared to Julia 1.7.2. I've tried the latest in Sundials (v4.10.0) as well as the version I was using in Julia 1.7.2 (v4.9.4).

Here's the stack trace in Windows (Sundials v4.9.4):

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0xa2d095f9 -- N_VLinearSum at C:\Users\Michael\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_cvodes.dll (unknown line)
in expression starting at C:\Users\Michael\MyPackage\mycode.jl:30
N_VLinearSum at C:\Users\Michael\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_cvodes.dll (unknown line)
cvNlsResidual at C:\Users\Michael\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_cvodes.dll (unknown line)
SUNNonlinSolSolve_Newton at C:\Users\Michael\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_sunnonlinsolnewton.dll (unknown line)
cvStep at C:\Users\Michael\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_cvodes.dll (unknown line)
CVode at C:\Users\Michael\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_cvodes.dll (unknown line)
CVode at C:\Users\Michael\.julia\packages\Sundials\k9hc3\lib\libsundials_api.jl:1523 [inlined]
CVode at C:\Users\Michael\.julia\packages\Sundials\k9hc3\lib\libsundials_api.jl:1528
solver_step at C:\Users\Michael\.julia\packages\Sundials\k9hc3\src\common_interface\solve.jl:1478
#solve!#115 at C:\Users\Michael\.julia\packages\Sundials\k9hc3\src\common_interface\solve.jl:1573
solve! at C:\Users\Michael\.julia\packages\Sundials\k9hc3\src\common_interface\solve.jl:1558
unknown function (ip: 0000000063e290e1)
top-level scope at .\timing.jl:263
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:897
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
ijl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:965
eval at .\boot.jl:368 [inlined]
include_string at .\loading.jl:1428
_include at .\loading.jl:1488
include at .\client.jl:476
unknown function (ip: 000000005b23e1e6)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
do_call at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:126
eval_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:215
eval_stmt_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:166 [inlined]
eval_body at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:594
jl_interpret_toplevel_thunk at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:750
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:906
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
ijl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:965
eval at .\boot.jl:368 [inlined]
eval_user_input at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:151
repl_backend_loop at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:247
start_repl_backend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:232
#run_repl#47 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:369
run_repl at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:355
jfptr_run_repl_69828.clone_1 at C:\Users\Michael\AppData\Local\Programs\Julia-1.8.0\lib\julia\sys.dll (unknown line)
#966 at .\client.jl:419
jfptr_YY.966_58133.clone_1 at C:\Users\Michael\AppData\Local\Programs\Julia-1.8.0\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:729 [inlined]
invokelatest at .\essentials.jl:726 [inlined]
run_main_repl at .\client.jl:404
exec_options at .\client.jl:318
_start at .\client.jl:522
jfptr__start_57488.clone_1 at C:\Users\Michael\AppData\Local\Programs\Julia-1.8.0\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:575
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:719
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:59
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 347048866 (Pool: 346978877; Big: 69989); GC: 141

And in Linux (Sundials v4.10.0):

signal (11): Segmentation fault
in expression starting at /home/runner/work/MyPackage/mycode.jl:30
unknown function (ip: 0x77a5580)
Allocations: 540922622 (Pool: 540699010; Big: 223612); GC: 230
ERROR: LoadError: Package MyPackage errored during testing (received signal: 11)
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types /opt/hostedtoolcache/julia/1.8.0/x64/share/julia/stdlib/v1.8/Pkg/src/Types.jl:67
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
   @ Pkg.Operations /opt/hostedtoolcache/julia/1.8.0/x64/share/julia/stdlib/v1.8/Pkg/src/Operations.jl:1813
 [3] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Vector{String}, test_args::Cmd, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::Base.Pairs{Symbol, IOContext{Base.PipeEndpoint}, Tuple{Symbol}, NamedTuple{(:io,), Tuple{IOContext{Base.PipeEndpoint}}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.8.0/x64/share/julia/stdlib/v1.8/Pkg/src/API.jl:431
 [4] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{Base.PipeEndpoint}, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:coverage, :julia_args, :force_latest_compatible_version), Tuple{Bool, Vector{String}, Bool}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.8.0/x64/share/julia/stdlib/v1.8/Pkg/src/API.jl:156
 [5] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:coverage, :julia_args, :force_latest_compatible_version), Tuple{Bool, Vector{String}, Bool}}})
   @ Pkg.API /opt/hostedtoolcache/julia/1.8.0/x64/share/julia/stdlib/v1.8/Pkg/src/API.jl:171
 [6] top-level scope
   @ ~/work/_actions/julia-actions/julia-runtest/v1/test_harness.jl:15
 [7] include(fname::String)
   @ Base.MainInclude ./client.jl:476
 [8] top-level scope
   @ none:1
in expression starting at /home/runner/work/_actions/julia-actions/julia-runtest/v1/test_harness.jl:7

This is closed source software, but if an NDA is signed then I can provide an rr trace.

@ChrisRackauckas
Member

The preconditioners test is also doing weird things on v1.8 (#366) and we haven't figured out why. That might be a lead, but it's hard to say in advance.

@bolognam
Contributor Author

This code doesn't use preconditioners; the only options I've tried (all of which segfault) are max_order=1 or no options at all.

I've got a less stiff problem working with CVODE_BDF(), maybe that helps?

@bolognam
Contributor Author

Looks like by building Sundials from src (with debug build) and hacking Sundials.jl to use the compiled libraries (instead of Sundials_jll) we can avoid the segfaults. This was built from Sundials 5.2.0 (see tag in git). Unfortunately it seems like the debug build causes the memory space to be different than that of the release build.

@ChrisRackauckas
Member

@giordano or @staticfloat could we get some help retriggering the build?

@giordano
Contributor

You want a debug build of Sundials (presumably v5.2.0)? As the official build or something to be used only for testing?

@ChrisRackauckas
Member

I don't think it needs to be the debug build? But I'm also confused why there's now a memory issue with the old builds.

@bolognam
Contributor Author

So the current jll that's used with Sundials.jl is 5.2.1+0, which is presumably very close to the C++ Sundials v5.2.0 code that I compiled from source. The release build of the C++ Sundials code causes the segfaults in Julia 1.8, but when I switch to the debug build of the same code I don't get segfaults in Julia 1.8.

Remember, Julia 1.7 works just fine with the Sundials jll, but for Julia 1.8 I get segfaults.

@giordano
Contributor

Well, in Julia v1.7 and v1.8 you should get the same binary. I don't see how rebuilding would help anything here. Maybe the problem is in how the routines from the libraries are called.

@bolognam
Contributor Author

My guess is that a function pointer got garbage collected (i.e. the unknown function message in Linux) and that's how we're segfaulting. I understand that we should get the same Sundials binary between Julia 1.7 and 1.8, so I'm guessing that the way Julia is handling memory has changed.

Using a debug build of C++ Sundials would change how memory is managed overall in the computer, which is why re-building Sundials with a debug build removes the segfault issue I experienced.

@ChrisRackauckas
Member

It would be helpful for someone to track down what exactly is getting garbage collected too early here.

@bolognam
Contributor Author

If someone can guide me through the process that would help me. Would I build julia from source with assertions to debug this?

@sjdaines
Contributor

sjdaines commented Sep 11, 2022

I just retested our application (that uses Kinsol), putting GC.enable(false) and GC.enable(true) around the call to Sundials, and this appears to remove the segfault (an example failure was in PALEOtoolkit/PALEOmodel.jl#25). So this is at least consistent with the suggestion above that the problem is connected to something getting garbage collected in Julia 1.8 (NB: not a definitive test as the failures are not fully reproducible, so this could just be a statistical fluke).
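
A minimal sketch of that kind of test, assuming the call into Sundials can be wrapped in a zero-argument function. `with_gc_disabled` is a hypothetical helper name, not part of Sundials.jl. It relies on `GC.enable` returning the previous GC state, and restores that state even if the wrapped call throws:

```julia
# Hypothetical helper: run `f()` with the garbage collector switched off,
# then restore whatever GC state was active before.
function with_gc_disabled(f)
    was_enabled = GC.enable(false)  # returns the previous state
    try
        return f()
    finally
        GC.enable(was_enabled)
    end
end

# Stand-in for a call such as `solve(prob, CVODE_BDF())`.
result = with_gc_disabled(() -> sum(1:10))
```

This only hides a lifetime bug, and memory grows for the duration of the call, so it is a diagnostic rather than a fix.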

Some speculation on a possible cause: looking at the Sundials.jl code, it seems to me that there might be a problem here with the use of convert by the NVector wrapper, which returns an N_Vector for use by the Sundials C API. As I understand it, this can't be safely used by ccall in this way, as there is no guarantee about the lifetime of the nv.ref_nv::Ref{N_Vector} object:

Base.convert(::Type{N_Vector}, nv::NVector) = nv.ref_nv[]

The NVector and hence the nv.ref_nv::Ref{N_Vector} may well be temporary, in which case can the garbage collector delete the ref_nv and call the finalizer (which calls N_VDestroy_Serial(ref_nv[]) to free the C memory), before the ccall uses the N_Vector C pointer?

The most obvious case (which explicitly creates temporary NVector objects) is:

""" `N_Vector(v::Vector{T})`
Converts Julia `Vector` to `N_Vector`.
Implicitly creates `NVector` object that manages automatic
destruction of `N_Vector` object when no longer in use.
"""
Base.convert(::Type{N_Vector}, v::Vector{realtype}) = N_Vector(NVector(v))
Base.convert(::Type{N_Vector}, v::Vector{T}) where {T <: Real} = N_Vector(NVector(v))

but in fact this pattern is used widely by autogenerated code, e.g.

function CVode(cvode_mem, tout::realtype, yout::N_Vector, tret, itask::Cint)
    ccall((:CVode, libsundials_cvodes), Cint,
          (CVODEMemPtr, realtype, N_Vector, Ptr{realtype}, Cint), cvode_mem, tout, yout,
          tret, itask)
end
function CVode(cvode_mem, tout, yout, tret, itask)
    __yout = convert(NVector, yout)
    CVode(cvode_mem, tout, convert(N_Vector, __yout), tret, convert(Cint, itask))
end

where the problem in this case could be that __yout.ref_nv is garbage collected after convert(N_Vector, __yout) but before the ccall ?
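
For reference, the generic Julia tool for this kind of hazard is `GC.@preserve`, which keeps an object rooted while a raw pointer derived from it is in use. A minimal sketch on a plain `Vector` (the function name `sum_bytes` is made up for illustration and is unrelated to the Sundials.jl wrappers):

```julia
# Sum the bytes of `v` through a raw pointer.
# GC.@preserve guarantees `v` is not collected while `p` is being used.
function sum_bytes(v::Vector{UInt8})
    GC.@preserve v begin
        p = pointer(v)
        s = 0
        for i in 1:length(v)
            s += unsafe_load(p, i)
        end
        s
    end
end
```

Without the `GC.@preserve`, nothing ties the lifetime of `v` to the use of `p`, which is the same shape of problem as extracting an N_Vector pointer from a temporary wrapper.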

I admit I can't understand the documentation for unsafe_convert, https://docs.julialang.org/en/v1/base/c/#Base.unsafe_convert, but looking at the implementation of ccall
https://github.com/JuliaLang/julia/blob/afb6c60d69a38e8a2442a0c7e87c47b8880ad294/base/c.jl#L663-L671
it looks like this is the problem that a combination of cconvert and unsafe_convert is intended to solve ? (see also https://discourse.julialang.org/t/how-to-keep-a-reference-for-c-structure-to-avoid-gc/9310/5).
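
A minimal sketch of that pattern, under the reading that `cconvert` returns a Julia object which `ccall` keeps rooted for the duration of the call, and `unsafe_convert` then extracts the raw pointer from that object. `CBuffer`, `fill_bytes!` and `read_bytes` are hypothetical names, and libc's `memset` stands in for the Sundials C API:

```julia
# Hypothetical wrapper owning a C allocation; it plays the role of NVector.
mutable struct CBuffer
    ptr::Ptr{UInt8}
    len::Int
    function CBuffer(len::Integer)
        buf = new(convert(Ptr{UInt8}, Libc.malloc(len)), len)
        finalizer(b -> Libc.free(b.ptr), buf)  # free the C memory when collected
        return buf
    end
end

# Step 1: cconvert returns the object that ccall must keep alive (the wrapper).
Base.cconvert(::Type{Ptr{UInt8}}, b::CBuffer) = b
# Step 2: unsafe_convert extracts the raw pointer from that preserved object.
Base.unsafe_convert(::Type{Ptr{UInt8}}, b::CBuffer) = b.ptr

# The wrapper itself is passed to ccall; no raw pointer escapes beforehand.
function fill_bytes!(b::CBuffer, value::UInt8)
    ccall(:memset, Ptr{Cvoid}, (Ptr{UInt8}, Cint, Csize_t), b, value, b.len)
    return b
end

# Read the buffer back, keeping the wrapper alive while its pointer is used.
read_bytes(b::CBuffer) = GC.@preserve b [unsafe_load(b.ptr, i) for i in 1:b.len]
```

The design point is that there is deliberately no `convert` method returning the bare pointer, so callers cannot end up holding a pointer whose owner may already have been finalized.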

If this is the problem, then given that there is a comment contradicting this in generate.jl:

Sundials.jl/gen/generate.jl

Lines 134 to 138 in c7469aa

if arg_type_expr == :N_Vector
# first convert argument to NVector() and store in local var,
# this guarantees that the wrapper and associated Sundials object (e.g. N_Vector)
# is not removed by GC
return (arg_name_expr,

and if this comment is in fact incorrect (in which case I imagine there could be other similar problems), any fix will need to touch (or at least regenerate) a lot of code. So perhaps the first priority is to understand the intended usage of ccall, unsafe_convert, and cconvert and review the design here?

@ChrisRackauckas
Member

Note that the required vectors are stored in the Julia types for the function and integrator:

https://github.com/SciML/Sundials.jl/blob/master/src/common_interface/integrator_types.jl#L63
https://github.com/SciML/Sundials.jl/blob/master/src/common_interface/function_types.jl#L11-L12
https://github.com/SciML/Sundials.jl/blob/master/src/common_interface/integrator_utils.jl#L140

So in theory the GC would know those finalizers are not hit until after the integrator's reference is gone. At least, that's how it was working before v1.8.

@sjdaines
Contributor

sjdaines commented Sep 14, 2022

The explicitly stored NVectors should of course be fine (as well as the explicitly stored memory handles).

But this doesn't look like it covers everything: the root cause here I think is that there exists a Base.convert(::Type{N_Vector}, nv::NVector) = nv.ref_nv[] at all.
This means that the API calls in libsundials_api.jl and nvector_wrapper.jl allow the use of an NVector without managing its lifetime through the following ccall (so this could go wrong if the supplied NVector isn't explicitly stored or returned), and (worse) provide an API that explicitly creates temporary NVectors from Vectors.

Some examples:

u0nv = NVector(u0)
_u0 = copy(u0)
utmp = NVector(_u0)
function arkodemem(; fe = C_NULL, fi = C_NULL, t0 = t0, u0 = convert(N_Vector, u0nv))

Here u0nv is not subsequently used, so I imagine it could be garbage collected at any time.

or (perhaps in the vicinity of one of the reported issues with CVODE):

LS = SUNLinSol_SPGMR(u0, alg.prec_side, alg.krylov_dim)

where u0 here is a Julia Vector, so this will hit the convert path that creates a temporary NVector.

It seems to me the design here is not robust: it is incorrect in providing a Base.convert(::Type{N_Vector}, nv::NVector) and relying on the lifetime of local variables (see code comments in generate.jl L135-7 linked above), instead of using cconvert / unsafe_convert.

@sjdaines
Contributor

In fact looking at this again, I'm not sure there isn't another problem? According to the Julia manual finalizer is only defined for a mutable struct, so it's not clear what happens when it is used with the ref_nv::Ref member of an immutable struct ?

struct NVector <: DenseVector{realtype}
    ref_nv::Ref{N_Vector} # reference to N_Vector
    v::Vector{realtype} # array that is referenced by N_Vector
    function NVector(v::Vector{realtype})
        # note that N_VMake_Serial() creates N_Vector doesn't own the data,
        # so calling N_VDestroy_Serial() would not deallocate v
        nv = new(Ref{N_Vector}(N_VMake_Serial(length(v), v)), v)
        finalizer(release_handle, nv.ref_nv)
        return nv
    end
    function NVector(nv::N_Vector)
        # wrap N_Vector into NVector and get non-owning access to `nv` data
        # via `v`, but don't register finalizer for `nv`
        return new(Ref{N_Vector}(nv), asarray(nv))
    end
end
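
On the mutability question, a quick check (a sketch, with a made-up `Immutable` type) suggests the existing code is at least legal: `Ref{T}(x)` constructs a `Base.RefValue`, which is mutable, so a finalizer can be attached to it, whereas attaching one to an immutable struct instance throws. The finalizer is then tied to the reachability of the `Ref` itself:

```julia
struct Immutable
    x::Int
end

r = Ref{Int}(1)
ref_is_mutable = ismutable(r)   # Ref{Int}(1) is a Base.RefValue{Int}
finalizer(_ -> nothing, r)      # accepted: the Ref is a mutable object

# Attaching a finalizer to an immutable value is rejected with an error.
immutable_rejected = try
    finalizer(_ -> nothing, Immutable(1))
    false
catch
    true
end
```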

@sjdaines
Contributor

Also, what requirements does the Sundials C API place on the lifetime of C N_Vector arguments to C function calls ? Can they be deleted once the C API call returns (which is what I was assuming above), or is the caller required to manage them ? (seems a bit unlikely, this would provide another whole set of failure modes ...)

@markowkes

@bolognam Have you found a solution or workaround to this? I'm working on a project and can consistently reproduce a segfault using julia --track-allocation=user test/Case1.jl. I'm also occasionally getting the segfault with a GitHub workflow that tests the project on various OS and Julia versions. For example, this test failed on macOS-latest with Julia v1.8. Simply rerunning the test worked without error.

The beginning of the error message is

signal (11): Segmentation fault: 11
in expression starting at /Users/runner/work/Biofilm.jl/Biofilm.jl/test/Case1.jl:56
N_VGetArrayPointer_Serial at /Users/runner/.julia/artifacts/4a28b80622d82d21479ac290cab082046c3da0db/lib/libsundials_nvecserial.5.2.0.dylib (unknown line)
N_VGetArrayPointer_Serial at /Users/runner/.julia/packages/Sundials/3c9Un/lib/libsundials_api.jl:5926 [inlined]
cvodefunjac at /Users/runner/.julia/packages/Sundials/3c9Un/src/common_interface/function_types.jl:29
unknown function (ip: 0x118dda5b7)
cvLsDenseDQJac at /Users/runner/.julia/artifacts/4a28b80622d82d21479ac290cab082046c3da0db/lib/libsundials_cvodes.5.2.0.dylib (unknown line)
cvLsLinSys at /Users/runner/.julia/artifacts/4a28b80622d82d21479ac290cab082046c3da0db/lib/libsundials_cvodes.5.2.0.dylib (unknown line)
cvLsSetup at /Users/runner/.julia/artifacts/4a28b80622d82d21479ac290cab082046c3da0db/lib/libsundials_cvodes.5.2.0.dylib (unknown line)
cvNlsLSetup at /Users/runner/.julia/artifacts/4a28b80622d82d21479ac290cab082046c3da0db/lib/libsundials_cvodes.5.2.0.dylib (unknown line)
SUNNonlinSolSolve_Newton at /Users/runner/.julia/artifacts/4a28b80622d82d21479ac290cab082046c3da0db/lib/libsundials_sunnonlinsolnewton.2.2.0.dylib (unknown line)
CVode at /Users/runner/.julia/artifacts/4a28b80622d82d21479ac290cab082046c3da0db/lib/libsundials_cvodes.5.2.0.dylib (unknown line)
...

Turning off garbage collection avoids the issue, but isn't feasible for all runs.

@bolognam
Contributor Author

bolognam commented Oct 6, 2022

Sorry @markowkes I haven't found a solution. I think @sjdaines is onto a solution but nobody (as far as I'm aware) has been working on making a fix in Sundials.

@sjdaines
Contributor

sjdaines commented Jan 10, 2023

With Julia 1.9.0-beta2 and our application code, CVODE segfaults in more places (and almost every time...)
(edit: this case at least is fixed by #380 above, just adding the now-reproducible failure with 1.9.0-beta2 in case it helps others)

Example stack trace, looks very similar to the initial report above, showing this is consistent with a problem due to N_Vector:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x653429e7 -- N_VLinearSum_Serial at C:\Users\sd336\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_nvecserial.dll (unknown line)
in expression starting at C:\Users\sd336\PALEO\PALEOdev.jl\PALEOexamples\src\ocean\blacksea\PALEO_examples_blacksea_P_O2_SO4.jl:24
N_VLinearSum_Serial at C:\Users\sd336\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_nvecserial.dll (unknown line)
cvNlsResidual at C:\Users\sd336\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_cvodes.dll (unknown line)
SUNNonlinSolSolve_Newton at C:\Users\sd336\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_sunnonlinsolnewton.dll (unknown line)
cvStep at C:\Users\sd336\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_cvodes.dll (unknown line)
CVode at C:\Users\sd336\.julia\artifacts\4ccc575631c856942ae91cbb8294de9b0a746c9d\bin\libsundials_cvodes.dll (unknown line)
CVode at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\lib\libsundials_api.jl:1841 [inlined]
CVode at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\lib\libsundials_api.jl:1848
solver_step at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:1406
#solve!#115 at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:1488
kwcall at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:1472
unknown function (ip: 00000297da13c080)
#__solve#24 at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:17
kwcall at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:3 [inlined]
kwcall at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:3 [inlined]
kwcall at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:3 [inlined]
kwcall at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:3 [inlined]
kwcall at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:3 [inlined]
#solve_call#21 at C:\Users\sd336\.julia\packages\DiffEqBase\9H9Sj\src\solve.jl:494 [inlined]
kwcall at C:\Users\sd336\.julia\packages\DiffEqBase\9H9Sj\src\solve.jl:464 [inlined]
#solve_up#27 at C:\Users\sd336\.julia\packages\DiffEqBase\9H9Sj\src\solve.jl:856
kwcall at C:\Users\sd336\.julia\packages\DiffEqBase\9H9Sj\src\solve.jl:829 [inlined]
#solve#26 at C:\Users\sd336\.julia\packages\DiffEqBase\9H9Sj\src\solve.jl:823
kwcall at C:\Users\sd336\.julia\packages\DiffEqBase\9H9Sj\src\solve.jl:813
unknown function (ip: 00000297da137d52)
macro expansion at .\timing.jl:273 [inlined]
#integrate#3 at C:\Users\sd336\PALEO\PALEOmodel.jl\src\ODE.jl:197
unknown function (ip: 00000297cedfc7c2)
kwcall at C:\Users\sd336\PALEO\PALEOmodel.jl\src\ODE.jl:162
#integrateForwardDiff#4 at C:\Users\sd336\PALEO\PALEOmodel.jl\src\ODE.jl:217 [inlined]
kwcall at C:\Users\sd336\PALEO\PALEOmodel.jl\src\ODE.jl:209
unknown function (ip: 00000297cedf694f)
jl_apply at C:/workdir/src\julia.h:1874 [inlined]
do_call at C:/workdir/src\interpreter.c:126
eval_value at C:/workdir/src\interpreter.c:226
eval_stmt_value at C:/workdir/src\interpreter.c:177 [inlined]
eval_body at C:/workdir/src\interpreter.c:624
jl_interpret_toplevel_thunk at C:/workdir/src\interpreter.c:762
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:912
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:856
ijl_toplevel_eval at C:/workdir/src\toplevel.c:921 [inlined]
ijl_toplevel_eval_in at C:/workdir/src\toplevel.c:971
eval at .\boot.jl:370 [inlined]
include_string at .\loading.jl:1754
_include at .\loading.jl:1814
include at .\client.jl:478 [inlined]
top-level scope at .\timing.jl:273 [inlined]
top-level scope at .\REPL[3]:0
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:903
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:856
ijl_toplevel_eval at C:/workdir/src\toplevel.c:921 [inlined]
ijl_toplevel_eval_in at C:/workdir/src\toplevel.c:971
eval at .\boot.jl:370 [inlined]
eval_user_input at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:152
repl_backend_loop at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:248
#start_repl_backend#46 at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:233
kwcall at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:230 [inlined]
#run_repl#59 at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:376
run_repl at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:362
jfptr_run_repl_61782.clone_1 at C:\Users\sd336\AppData\Local\Programs\Julia-1.9.0-beta2\lib\julia\sys.dll (unknown line)
#1018 at .\client.jl:421
jfptr_YY.1018_30785.clone_1 at C:\Users\sd336\AppData\Local\Programs\Julia-1.9.0-beta2\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:1874 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:774

where the first few functions in stack trace are:

CVode at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\lib\libsundials_api.jl:1841 [inlined]

function CVode(cvode_mem, tout::realtype, yout::N_Vector, tret, itask::Cint)
    ccall((:CVode, libsundials_cvodes), Cint,
          (CVODEMemPtr, realtype, N_Vector, Ptr{realtype}, Cint), cvode_mem, tout, yout,
          tret, itask)
end

CVode at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\lib\libsundials_api.jl:1848

function CVode(cvode_mem, tout, yout, tret, itask)
    __yout = convert(NVector, yout)
    CVode(cvode_mem, tout, convert(N_Vector, __yout), tret, convert(Cint, itask))
end

solver_step at C:\Users\sd336\.julia\packages\Sundials\Y6XpH\src\common_interface\solve.jl:1406

function solver_step(integrator::CVODEIntegrator, tstop)
    integrator.flag = CVode(integrator.mem, tstop, integrator.u, integrator.tout,
                            CV_ONE_STEP)
    if integrator.opts.progress
        Logging.@logmsg(-1,
                        integrator.opts.progress_name,
                        _id=:Sundials,
                        message=integrator.opts.progress_message(integrator.dt,
                                                                 integrator.u,
                                                                 integrator.p,
                                                                 integrator.t),
                        progress=integrator.t / integrator.sol.prob.tspan[2])
    end
end

@ChrisRackauckas
Member

Can you try the PR branch #380 ?

@ChrisRackauckas
Member

oh it's you haha. Did the PR branch not fix your case?

@sjdaines
Contributor

sjdaines commented Jan 10, 2023

Yup this was my fix for the case above ! (I just got the order of issue update and submit PR wrong)

@bolognam @markowkes this might not be the only problem, but #380 fixes the issue I see here

ChrisRackauckas pushed a commit that referenced this issue Jan 13, 2023
Issue #367

This fixes a problem seen with Julia >= 1.8, where a temporary NVector was
created and the N_Vector pointer it contains passed to ccall, which is unsafe as
the temporary NVector may be garbage collected.

The fix is to remove convert(N_Vector, x) and replace it with the combination of
cconvert and unsafe_convert. An NVector can now be supplied as an
argument to ccall with memory management then handled correctly (and it is
now impossible to explicitly create an N_Vector).

Changes:
 - change to cconvert / unsafe_convert is in src/nvector_wrapper.jl .
 - A global edit (~400 changes) to the autogenerated code in
lib/libsundials_api.jl.
 - small number of changes elsewhere, replacing N_Vector with NVector

NB: The correct fix would be to update the code generator in gen/generate.jl,
and regenerate the wrappers.

NB: As the comments in src/nvector_wrapper.jl show this was an intentional
and (now) incorrect use of temporary variables to avoid garbage collection,
it is possible there are other similar problems (pointers from other types of
temporary variables passed to ccall).
@ChrisRackauckas
Member

Thanks for digging into this! Good to see this finally all cleaned up for newer versions of Julia.
