ReverseDiff tape generation over newer Zygote versions segfaults #586
Comments
@simeonschaub it looks like that Cassette pass caused a segfault. Could you look into it? @adolfocorreia could you post |
I installed Julia using the standard "Generic Linux on x86" build from julialang.org. Were you able to reproduce the issue? Here is my version info:
|
On it. |
For me with version 1.41 I get the |
Do |
I think I have narrowed down the problem to this line: https://github.com/SciML/DiffEqSensitivity.jl/blob/1786e7e8f2c7a37a53c80b8ce675997729293ed1/src/concrete_solve.jl#L32. So I don't think it has anything to do with the Cassette pass, but I suspect it's an Enzyme.jl bug. |
Even though the stack trace points to hasbranching.jl? |
Hmm, now I am not so sure anymore. Trying it again, it now just gets stuck instead of throwing any error, which happened before as well. Let me look into it more tomorrow. |
Any news so far? I tried running the code again with the newest packages (DiffEqFlux 1.41.0, DifferentialEquations 6.17.2, Flux 0.12.4 and DiffEqSensitivity 6.55.2), but I'm still getting the segmentation fault. |
For the workaround, just pass the sensealg to avoid the |
I'm still getting segmentation faults or exceptions when passing sensealg in |
It turns out there are two issues here. We can fix the DiffEqSensitivity.jl side, but then we still hit a different Zygote bug. To be really sure, I commented out all of the
function compute_z(x::Float32, f, f_z; saveat=[])
    function node_system!(dz, z, _, _)
        z1, z2 = z
        dz[1] = f(z1)
        dz[2] = f_z(z1) * z2
    end
    z0 = [x, 1.0f0]
    tspan = (0.0f0, T)
    prob = ODEProblem(node_system!, z0, tspan; saveat=saveat)
    return solve(prob,sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP(false)))
end
and sure enough, I got a segfault from Zygote. So I think this goes a little deeper and has something to do with Zygote itself, not the DiffEqSensitivity/DiffEqFlux update. But it's rather specific, so I can document it.
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x27fb137 -- jl_datatype_isinlinealloc at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:253 [inlined]
jl_datatype_isinlinealloc at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:250 [inlined]
union_isinlinable at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:277 [inlined]
union_isinlinable at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:266
in expression starting at C:\Users\accou\OneDrive\Computer\Desktop\test.jl:74
jl_datatype_isinlinealloc at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:253 [inlined]
jl_datatype_isinlinealloc at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:250 [inlined]
union_isinlinable at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:277 [inlined]
union_isinlinable at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:266
jl_islayout_inline at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:300 [inlined]
jl_compute_field_offsets at /cygdrive/c/buildbot/worker/package_win64/build/src\datatype.c:446
inst_datatype_inner at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:1520
inst_type_w_ at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:1760
inst_type_w_ at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:1752
inst_datatype_inner at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:1483
inst_datatype_env at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:906
inst_datatype_env at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:910
inst_datatype_env at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:910
inst_datatype_env at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:910
inst_datatype_env at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:910
inst_datatype_env at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:910
jl_apply_type at /cygdrive/c/buildbot/worker/package_win64/build/src\jltypes.c:933
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:713
apply_type_tfunc at .\compiler\tfuncs.jl:1315
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:713
builtin_tfunction at .\compiler\tfuncs.jl:1566
abstract_call_builtin at .\compiler\abstractinterpretation.jl:1014
abstract_call_known at .\compiler\abstractinterpretation.jl:1185
abstract_call at .\compiler\abstractinterpretation.jl:1316
abstract_call at .\compiler\abstractinterpretation.jl:1301
abstract_eval_statement at .\compiler\abstractinterpretation.jl:1455
typeinf_local at .\compiler\abstractinterpretation.jl:1842
typeinf_nocycle at .\compiler\abstractinterpretation.jl:1932
_typeinf at .\compiler\typeinfer.jl:226
typeinf at .\compiler\typeinfer.jl:209
typeinf_edge at .\compiler\typeinfer.jl:822 [inlined]
abstract_call_method at .\compiler\abstractinterpretation.jl:473
abstract_call_gf_by_type at .\compiler\abstractinterpretation.jl:160
abstract_call_known at .\compiler\abstractinterpretation.jl:1262
abstract_call at .\compiler\abstractinterpretation.jl:1316
abstract_call at .\compiler\abstractinterpretation.jl:1301
abstract_eval_statement at .\compiler\abstractinterpretation.jl:1455
typeinf_local at .\compiler\abstractinterpretation.jl:1842
typeinf_nocycle at .\compiler\abstractinterpretation.jl:1932
_typeinf at .\compiler\typeinfer.jl:226
typeinf at .\compiler\typeinfer.jl:209
typeinf_edge at .\compiler\typeinfer.jl:822 [inlined]
abstract_call_method at .\compiler\abstractinterpretation.jl:473
abstract_call_gf_by_type at .\compiler\abstractinterpretation.jl:160
abstract_call_known at .\compiler\abstractinterpretation.jl:1262
abstract_call at .\compiler\abstractinterpretation.jl:1316
abstract_call at .\compiler\abstractinterpretation.jl:1301
abstract_eval_statement at .\compiler\abstractinterpretation.jl:1455
typeinf_local at .\compiler\abstractinterpretation.jl:1842
typeinf_nocycle at .\compiler\abstractinterpretation.jl:1932
_typeinf at .\compiler\typeinfer.jl:226
typeinf at .\compiler\typeinfer.jl:209
typeinf_edge at .\compiler\typeinfer.jl:822 [inlined]
abstract_call_method at .\compiler\abstractinterpretation.jl:473
abstract_call_gf_by_type at .\compiler\abstractinterpretation.jl:160
abstract_call at .\compiler\abstractinterpretation.jl:1314
abstract_call at .\compiler\abstractinterpretation.jl:1301
abstract_eval_statement at .\compiler\abstractinterpretation.jl:1455
typeinf_local at .\compiler\abstractinterpretation.jl:1842
typeinf_nocycle at .\compiler\abstractinterpretation.jl:1932
_typeinf at .\compiler\typeinfer.jl:226
typeinf at .\compiler\typeinfer.jl:209
typeinf_edge at .\compiler\typeinfer.jl:822 [inlined]
abstract_call_method at .\compiler\abstractinterpretation.jl:473
abstract_call_gf_by_type at .\compiler\abstractinterpretation.jl:160
abstract_call_known at .\compiler\abstractinterpretation.jl:1262
abstract_call at .\compiler\abstractinterpretation.jl:1316
abstract_call at .\compiler\abstractinterpretation.jl:1301
abstract_eval_statement at .\compiler\abstractinterpretation.jl:1455
typeinf_local at .\compiler\abstractinterpretation.jl:1842
typeinf_nocycle at .\compiler\abstractinterpretation.jl:1932
_typeinf at .\compiler\typeinfer.jl:226
typeinf at .\compiler\typeinfer.jl:209
typeinf_edge at .\compiler\typeinfer.jl:822 [inlined]
abstract_call_method at .\compiler\abstractinterpretation.jl:473
abstract_call_gf_by_type at .\compiler\abstractinterpretation.jl:160
abstract_call_known at .\compiler\abstractinterpretation.jl:1262
abstract_call at .\compiler\abstractinterpretation.jl:1316
abstract_call at .\compiler\abstractinterpretation.jl:1301
abstract_eval_statement at .\compiler\abstractinterpretation.jl:1455
typeinf_local at .\compiler\abstractinterpretation.jl:1842
typeinf_nocycle at .\compiler\abstractinterpretation.jl:1932
_typeinf at .\compiler\typeinfer.jl:226
typeinf at .\compiler\typeinfer.jl:209
typeinf_edge at .\compiler\typeinfer.jl:822 [inlined]
abstract_call_method at .\compiler\abstractinterpretation.jl:473
abstract_call_gf_by_type at .\compiler\abstractinterpretation.jl:160
abstract_call at .\compiler\abstractinterpretation.jl:1314
abstract_call at .\compiler\abstractinterpretation.jl:1301
abstract_eval_statement at .\compiler\abstractinterpretation.jl:1455
typeinf_local at .\compiler\abstractinterpretation.jl:1842
typeinf_nocycle at .\compiler\abstractinterpretation.jl:1932
_typeinf at .\compiler\typeinfer.jl:226
typeinf at .\compiler\typeinfer.jl:209
typeinf_edge at .\compiler\typeinfer.jl:822 [inlined]
abstract_call_method at .\compiler\abstractinterpretation.jl:473
abstract_call_gf_by_type at .\compiler\abstractinterpretation.jl:160
abstract_call_known at .\compiler\abstractinterpretation.jl:1262
abstract_call at .\compiler\abstractinterpretation.jl:1316
abstract_call at .\compiler\abstractinterpretation.jl:1301
abstract_eval_statement at .\compiler\abstractinterpretation.jl:1455
typeinf_local at .\compiler\abstractinterpretation.jl:1842
typeinf_nocycle at .\compiler\abstractinterpretation.jl:1932
_typeinf at .\compiler\typeinfer.jl:226
typeinf at .\compiler\typeinfer.jl:209
typeinf_ext at .\compiler\typeinfer.jl:908
typeinf_ext_toplevel at .\compiler\typeinfer.jl:941
typeinf_ext_toplevel at .\compiler\typeinfer.jl:937
jfptr_typeinf_ext_toplevel_11279.clone_1 at C:\Users\accou\AppData\Local\Programs\Julia-1.7.0-beta3\lib\julia\sys.dll (unknown line)
_jl_invoke at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2245 [inlined]
jl_apply_generic at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2427 [inlined]
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
jl_type_infer at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:295
jl_generate_fptr at /cygdrive/c/buildbot/worker/package_win64/build/src\jitlayers.cpp:338
jl_compile_method_internal at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:1978
jl_compile_method_internal at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:1932 [inlined]
_jl_invoke at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2237 [inlined]
jl_apply_generic at /cygdrive/c/buildbot/worker/package_win64/build/src\gf.c:2427
#adjointdiffcache#101 at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\adjoint_common.jl:144
unknown function (ip: 00000000016ea3ba)
adjointdiffcache##kw at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\adjoint_common.jl:27
unknown function (ip: 00000000016e891d)
#ODEInterpolatingAdjointSensitivityFunction#156 at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\interpolating_adjoint.jl:73
unknown function (ip: 00000000016c35c7)
ODEInterpolatingAdjointSensitivityFunction at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\interpolating_adjoint.jl:23
unknown function (ip: 000000000146dee9)
#ODEAdjointProblem#161 at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\interpolating_adjoint.jl:252
ODEAdjointProblem##kw at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\interpolating_adjoint.jl:231 [inlined]
#_adjoint_sensitivities#71 at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\sensitivity_interface.jl:17
unknown function (ip: 000000000146dc59)
_adjoint_sensitivities##kw at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\sensitivity_interface.jl:16
_adjoint_sensitivities##kw at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\sensitivity_interface.jl:16
unknown function (ip: 0000000001469606)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:713
#adjoint_sensitivities#70 at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\sensitivity_interface.jl:6
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:713
adjoint_sensitivities##kw at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\sensitivity_interface.jl:6
adjoint_sensitivity_backpass at C:\Users\accou\.julia\dev\DiffEqSensitivity\src\concrete_solve.jl:227
ZBack at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\chainrules.jl:91 [inlined]
#215 at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\lib\lib.jl:203
#1792#back at C:\Users\accou\.julia\packages\ZygoteRules\OjfTt\src\adjoint.jl:59
unknown function (ip: 00000000014619d6)
Pullback at C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:73 [inlined]
Pullback at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface2.jl:0
#215 at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\lib\lib.jl:203 [inlined]
#1792#back at C:\Users\accou\.julia\packages\ZygoteRules\OjfTt\src\adjoint.jl:59
unknown function (ip: 0000000001461346)
Pullback at C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:68 [inlined]
Pullback at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface2.jl:0
Pullback at C:\Users\accou\OneDrive\Computer\Desktop\test.jl:40 [inlined]
Pullback at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface2.jl:0
unknown function (ip: 000000000145f856)
Pullback at C:\Users\accou\OneDrive\Computer\Desktop\test.jl:31 [inlined]
Pullback at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface2.jl:0
unknown function (ip: 000000000145dfb6)
Pullback at C:\Users\accou\OneDrive\Computer\Desktop\test.jl:50 [inlined]
Pullback at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface2.jl:0
Pullback at C:\Users\accou\OneDrive\Computer\Desktop\test.jl:62 [inlined]
Pullback at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface2.jl:0
Pullback at C:\Users\accou\OneDrive\Computer\Desktop\test.jl:69 [inlined]
Pullback at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface2.jl:0
#215 at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\lib\lib.jl:203
#1792#back at C:\Users\accou\.julia\packages\ZygoteRules\OjfTt\src\adjoint.jl:59 [inlined]
Pullback at C:\Users\accou\.julia\packages\Flux\Zz9RI\src\optimise\train.jl:105 [inlined]
Pullback at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface2.jl:0
#96 at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface.jl:348
unknown function (ip: 0000000001458e9d)
gradient at C:\Users\accou\.julia\packages\Zygote\TaBlo\src\compiler\interface.jl:76
unknown function (ip: 00000000680344c7)
macro expansion at C:\Users\accou\.julia\packages\Flux\Zz9RI\src\optimise\train.jl:104 [inlined]
macro expansion at C:\Users\accou\.julia\packages\Juno\n6wyj\src\progress.jl:119 [inlined]
#train!#36 at C:\Users\accou\.julia\packages\Flux\Zz9RI\src\optimise\train.jl:102
train!##kw at C:\Users\accou\.julia\packages\Flux\Zz9RI\src\optimise\train.jl:100
unknown function (ip: 000000006803275f)
run_optimization_flux at C:\Users\accou\OneDrive\Computer\Desktop\test.jl:71
unknown function (ip: 0000000068028663)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
do_call at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:125
eval_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:214
eval_stmt_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:165 [inlined]
eval_body at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:579
jl_interpret_toplevel_thunk at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:727
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:885
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:894 [inlined]
jl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:944
eval at .\boot.jl:373 [inlined]
include_string at .\loading.jl:1196
include_string at C:\Users\accou\.julia\packages\Atom\fDork\src\utils.jl:286 [inlined]
#218 at C:\Users\accou\.julia\packages\Atom\fDork\src\eval.jl:121
withpath at C:\Users\accou\.julia\packages\CodeTools\VsjEq\src\utils.jl:30
unknown function (ip: 0000000067ff40c4)
withpath at C:\Users\accou\.julia\packages\Atom\fDork\src\eval.jl:9
#217 at C:\Users\accou\.julia\packages\Atom\fDork\src\eval.jl:119
unknown function (ip: 0000000061f38303)
with_logstate at .\logging.jl:511
with_logger at .\logging.jl:623 [inlined]
#216 at C:\Users\accou\.julia\packages\Atom\fDork\src\eval.jl:118 [inlined]
hideprompt at C:\Users\accou\.julia\packages\Atom\fDork\src\repl.jl:127
macro expansion at C:\Users\accou\.julia\packages\Atom\fDork\src\eval.jl:117 [inlined]
macro expansion at C:\Users\accou\.julia\packages\Media\ItEPc\src\dynamic.jl:24 [inlined]
eval at C:\Users\accou\.julia\packages\Atom\fDork\src\eval.jl:114
unknown function (ip: 0000000067ff01b3)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:757
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:713
#invokelatest#2 at .\essentials.jl:716
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:713
invokelatest at .\essentials.jl:714
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
do_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:713
macro expansion at C:\Users\accou\.julia\packages\Atom\fDork\src\eval.jl:41 [inlined]
#200 at .\task.jl:411
unknown function (ip: 0000000061e76ea3)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1787 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:878
Allocations: 440380235 (Pool: 440293069; Big: 87166); GC: 187 |
It's the interaction of the two different AD systems, so the issue is the Zygote.gradient call inside of the |
Which Julia version is that? Looks like it might be a bug in Base. |
v1.7-beta3 |
Looks suspiciously like JuliaLang/julia#41503. Although this particular one might be fixed by JuliaLang/julia#41516, which should be part of the next beta. |
Oh, interesting. Did you test your PR on v1.6 and show that it works? |
I thought I did, but apparently I had some other parts commented out as well. Let me try again. |
Ok, I now get the segfault again, so looks like that didn't fix it. |
So does this need to wait for beta-4? |
This got fixed |
When running the attached code ("github.jl"), I get a "LoadError: UndefVarError: ForwardDiff not defined" when importing DiffEqFlux v1.41. I tried pinning DiffEqFlux to v1.40.1, which resolves the import error, but then I get a segfault-like error in the Julia process ("Internal error: encountered unexpected error in runtime").
I've attached the Project and Manifest files for both scenarios so that you can reproduce the issue. I've also attached a file with the stack trace. I'm running Pop!_OS 20.04 (a derivative of Ubuntu 20.04 LTS).
This issue might be related to SciML/SciMLSensitivity.jl#433.
DiffEqFlux.v1.40.1.tar.gz
DiffEqFlux.v1.41.0.tar.gz
stacktrace.txt
github.jl.txt