-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forward-over-reverse HVP ignores rows for matrix input, only with Julia 1.11 #2071
Comments
is there a simpler code which triggers this (both removing the hvp as well as reducing the complexity of f) |
The incorrect result only appears in the HVP and Hessian parts of the DI test suite, not in the first order, so I think the function g(x)
s = zero(eltype(x))
for i in eachindex(x), j in eachindex(x)
s += x[i]^3 * x[j]^4
end
return s
end Interestingly though, my original function throws an error on this other laptop of mine (an old macbook pro), whereas it ran without issue on my linux computer: julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (x86_64-apple-darwin22.4.0)
CPU: 4 × Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
julia> Pkg.status()
Status `/private/var/folders/cx/7tl61h5d5tvbhk6xjtnyb85m0000gn/T/jl_hS9xKD/Project.toml`
[7da242da] Enzyme v0.13.14 julia> pv = Enzyme.hvp(g, xv, dxv) # correct result
ERROR: Current scope:
; Function Attrs: mustprogress noinline willreturn memory(readwrite, argmem: none, inaccessiblemem: none)
define private fastcc void @preprocess_diffejulia_mapreduce_impl_63711({} addrspace(10)* noalias nocapture nonnull readonly align 8 dereferenceable(32) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@double, [-1,8]:Pointer, [-1,8,0]:Integer, [-1,8,1]:Integer, [-1,8,2]:Integer, [-1,8,3]:Integer, [-1,8,4]:Integer, [-1,8,5]:Integer, [-1,8,6]:Integer, [-1,8,7]:Integer, [-1,8,8]:Pointer, [-1,8,8,-1]:Float@double, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer}" "enzymejl_parmtype"="4935644608" "enzymejl_parmtype_ref"="2" %"'", i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="4979721520" "enzymejl_parmtype_ref"="0" %0, double %differeturn) unnamed_addr #15 !dbg !2938 {
top:
%1 = call {}*** @julia.get_pgcstack() #25
%ptls_field16 = getelementptr inbounds {}**, {}*** %1, i64 2
%2 = bitcast {}*** %ptls_field16 to i64***
%ptls_load1718 = load i64**, i64*** %2, align 8, !tbaa !28, !alias.scope !2939, !noalias !2942
%3 = getelementptr inbounds i64*, i64** %ptls_load1718, i64 2
%safepoint = load i64*, i64** %3, align 8, !tbaa !32, !alias.scope !2944, !noalias !2947
fence syncscope("singlethread") seq_cst
call void @julia.safepoint(i64* %safepoint) #26, !dbg !2949
fence syncscope("singlethread") seq_cst
%.not = icmp eq i64 %0, 1, !dbg !2950
br i1 %.not, label %invertL17, label %L22, !dbg !2951
L83.epil.preheader: ; preds = %L83.lr.ph
%_unwrap48 = add nuw i64 %spec.select, 9223372036854775806
%unroll_iter_unwrap49 = and i64 %_unwrap48, 9223372036854775800
%unroll_iter_unwrap49.op = or i64 %unroll_iter_unwrap49, 2
%_unwrap50 = select i1 %12, i64 2, i64 %unroll_iter_unwrap49.op
%storemerge1 = add nsw i64 %xtraiter, -1
%_unwrap51 = add nuw nsw i64 %_unwrap50, %storemerge1, !dbg !2952
%"'ipg39_unwrap" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap51, !dbg !2952
%"'ipc40_unwrap" = bitcast {} addrspace(10)* addrspace(13)* %"'ipg39_unwrap" to double addrspace(13)*, !dbg !2952
%4 = load double, double addrspace(13)* %"'ipc40_unwrap", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%5 = fadd fast double %4, %differeturn, !dbg !2952
store double %5, double addrspace(13)* %"'ipc40_unwrap", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%6 = icmp eq i64 %storemerge1, 0
br i1 %6, label %invertL79.common.ret.loopexit_crit_edge.unr-lcssa, label %invertL83.epil.1
L22: ; preds = %top
%7 = add i64 %0, -1, !dbg !2966
%.not19 = icmp slt i64 %7, 1024, !dbg !2968
br i1 %.not19, label %L39, label %invertL128, !dbg !2967
L39: ; preds = %L22
%"'ipc85" = bitcast {} addrspace(10)* %"'" to { i8*, {} addrspace(10)* } addrspace(10)*, !dbg !2969
%"'ipc86" = addrspacecast { i8*, {} addrspace(10)* } addrspace(10)* %"'ipc85" to { i8*, {} addrspace(10)* } addrspace(11)*, !dbg !2969
%"'ipc91" = bitcast {} addrspace(10)* %"'" to {} addrspace(10)** addrspace(10)*, !dbg !2969
%"'ipc92" = addrspacecast {} addrspace(10)** addrspace(10)* %"'ipc91" to {} addrspace(10)** addrspace(11)*, !dbg !2969
%"'ipl93" = load {} addrspace(10)**, {} addrspace(10)** addrspace(11)* %"'ipc92", align 8, !dbg !2969, !tbaa !82, !alias.scope !2971, !noalias !2974, !invariant.group !2976, !enzyme_type !84, !enzymejl_byref_BITS_VALUE !0, !enzymejl_source_type_Ptr\7BFloat64\7D !0, !enzyme_nocache !0
%"'ipg87" = getelementptr inbounds { i8*, {} addrspace(10)* }, { i8*, {} addrspace(10)* } addrspace(11)* %"'ipc86", i64 0, i32 1, !dbg !2969
%"'ipl88" = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %"'ipg87", align 8, !dbg !2969, !tbaa !82, !alias.scope !2971, !noalias !2974, !invariant.group !2977, !enzyme_type !87, !enzymejl_source_type_Memory\7BFloat64\7D !0, !enzymejl_byref_MUT_REF !0
%8 = call "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" {} addrspace(10)* addrspace(13)* @julia.gc_loaded({} addrspace(10)* noundef %"'ipl88", {} addrspace(10)** noundef %"'ipl93") #25, !dbg !2969
%spec.select = call i64 @llvm.smax.i64(i64 %0, i64 noundef 2) #25, !dbg !2978
%9 = add nsw i64 %spec.select, -3, !dbg !2982
%10 = icmp ugt i64 %9, 9223372036854775806, !dbg !2986
br i1 %10, label %invertL39, label %L83.lr.ph, !dbg !2987
L83.lr.ph: ; preds = %L39
%11 = add nuw nsw i64 %spec.select, 6, !dbg !2988
%xtraiter = and i64 %11, 7, !dbg !2989
%12 = icmp ult i64 %9, 7, !dbg !2989
%lcmp.mod.not = icmp eq i64 %xtraiter, 0, !dbg !2989
br i1 %lcmp.mod.not, label %invertL79.common.ret.loopexit_crit_edge.unr-lcssa, label %L83.epil.preheader, !dbg !2989
inverttop: ; preds = %invertL128, %invertL39, %invertL17
fence syncscope("singlethread") seq_cst
ret void
invertL17: ; preds = %top
%"'ipc8" = bitcast {} addrspace(10)* %"'" to { i8*, {} addrspace(10)* } addrspace(10)*, !dbg !2990
%"'ipc9" = addrspacecast { i8*, {} addrspace(10)* } addrspace(10)* %"'ipc8" to { i8*, {} addrspace(10)* } addrspace(11)*, !dbg !2990
%"'ipc13" = bitcast {} addrspace(10)* %"'" to {} addrspace(10)** addrspace(10)*, !dbg !2990
%"'ipc14" = addrspacecast {} addrspace(10)** addrspace(10)* %"'ipc13" to {} addrspace(10)** addrspace(11)*, !dbg !2990
%"'ipl15" = load {} addrspace(10)**, {} addrspace(10)** addrspace(11)* %"'ipc14", align 8, !dbg !2990, !tbaa !82, !alias.scope !2971, !noalias !2974, !enzyme_type !84, !enzymejl_byref_BITS_VALUE !0, !enzymejl_source_type_Ptr\7BFloat64\7D !0, !enzyme_nocache !0
%"'ipg10" = getelementptr inbounds { i8*, {} addrspace(10)* }, { i8*, {} addrspace(10)* } addrspace(11)* %"'ipc9", i64 0, i32 1, !dbg !2990
%"'ipl" = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %"'ipg10", align 8, !dbg !2990, !tbaa !82, !alias.scope !2971, !noalias !2974, !dereferenceable_or_null !690, !enzyme_type !87, !enzymejl_source_type_Memory\7BFloat64\7D !0, !enzymejl_byref_MUT_REF !0
%13 = call "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" {} addrspace(10)* addrspace(13)* @julia.gc_loaded({} addrspace(10)* noundef %"'ipl", {} addrspace(10)** noundef %"'ipl15") #25, !dbg !2990
%"'ipc_unwrap" = bitcast {} addrspace(10)* addrspace(13)* %13 to double addrspace(13)*, !dbg !2990
%14 = load double, double addrspace(13)* %"'ipc_unwrap", align 8, !dbg !2990, !tbaa !90, !alias.scope !2992, !noalias !2995, !enzyme_type !786, !enzymejl_byref_BITS_VALUE !0, !enzymejl_source_type_Float64 !0
%15 = fadd fast double %14, %differeturn, !dbg !2990
store double %15, double addrspace(13)* %"'ipc_unwrap", align 8, !dbg !2990, !tbaa !90, !alias.scope !2992, !noalias !2997
br label %inverttop
mergeinvertL83_L79.common.ret.loopexit_crit_edge.unr-lcssa.loopexit: ; preds = %invertL79.common.ret.loopexit_crit_edge.unr-lcssa
%_unwrap24 = add nuw i64 %spec.select, 9223372036854775806
%unroll_iter_unwrap = and i64 %_unwrap24, 9223372036854775800
%_unwrap25 = add nsw i64 %unroll_iter_unwrap, -8
%_unwrap26 = lshr exact i64 %_unwrap25, 3
%16 = add nuw nsw i64 %_unwrap26, 1
%min.iters.check = icmp ult i64 %_unwrap25, 120
br i1 %min.iters.check, label %scalar.ph, label %vector.scevcheck
vector.scevcheck: ; preds = %mergeinvertL83_L79.common.ret.loopexit_crit_edge.unr-lcssa.loopexit
%17 = add nsw i64 %unroll_iter_unwrap, -6
%scevgep = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %17
%mul.result.neg = mul nsw i64 %_unwrap25, -8
%mul.overflow = icmp ugt i64 %_unwrap25, 2305843009213693944
%scevgep50 = bitcast {} addrspace(10)* addrspace(13)* %scevgep to i8 addrspace(13)*
%18 = getelementptr i8, i8 addrspace(13)* %scevgep50, i64 %mul.result.neg
%19 = icmp ugt i8 addrspace(13)* %18, %scevgep50
%20 = add nsw i64 %unroll_iter_unwrap, -5
%scevgep51 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %20
%scevgep5155 = bitcast {} addrspace(10)* addrspace(13)* %scevgep51 to i8 addrspace(13)*
%21 = getelementptr i8, i8 addrspace(13)* %scevgep5155, i64 %mul.result.neg
%22 = icmp ugt i8 addrspace(13)* %21, %scevgep5155
%23 = add nsw i64 %unroll_iter_unwrap, -4
%scevgep56 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %23
%scevgep5660 = bitcast {} addrspace(10)* addrspace(13)* %scevgep56 to i8 addrspace(13)*
%24 = getelementptr i8, i8 addrspace(13)* %scevgep5660, i64 %mul.result.neg
%25 = icmp ugt i8 addrspace(13)* %24, %scevgep5660
%26 = or i1 %mul.overflow, %25
%27 = add nsw i64 %unroll_iter_unwrap, -3
%scevgep61 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %27
%scevgep6165 = bitcast {} addrspace(10)* addrspace(13)* %scevgep61 to i8 addrspace(13)*
%28 = getelementptr i8, i8 addrspace(13)* %scevgep6165, i64 %mul.result.neg
%29 = icmp ugt i8 addrspace(13)* %28, %scevgep6165
%30 = add nsw i64 %unroll_iter_unwrap, -2
%scevgep66 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %30
%scevgep6670 = bitcast {} addrspace(10)* addrspace(13)* %scevgep66 to i8 addrspace(13)*
%31 = getelementptr i8, i8 addrspace(13)* %scevgep6670, i64 %mul.result.neg
%32 = icmp ugt i8 addrspace(13)* %31, %scevgep6670
%33 = add nsw i64 %unroll_iter_unwrap, -1
%scevgep71 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %33
%scevgep7175 = bitcast {} addrspace(10)* addrspace(13)* %scevgep71 to i8 addrspace(13)*
%34 = getelementptr i8, i8 addrspace(13)* %scevgep7175, i64 %mul.result.neg
%35 = icmp ugt i8 addrspace(13)* %34, %scevgep7175
%scevgep76 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %unroll_iter_unwrap
%scevgep7680 = bitcast {} addrspace(10)* addrspace(13)* %scevgep76 to i8 addrspace(13)*
%36 = getelementptr i8, i8 addrspace(13)* %scevgep7680, i64 %mul.result.neg
%37 = icmp ugt i8 addrspace(13)* %36, %scevgep7680
%38 = or i64 %unroll_iter_unwrap, 1
%scevgep81 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %38
%scevgep8185 = bitcast {} addrspace(10)* addrspace(13)* %scevgep81 to i8 addrspace(13)*
%39 = getelementptr i8, i8 addrspace(13)* %scevgep8185, i64 %mul.result.neg
%40 = icmp ugt i8 addrspace(13)* %39, %scevgep8185
%41 = or i1 %22, %19
%42 = or i1 %41, %26
%43 = or i1 %29, %42
%44 = or i1 %32, %43
%45 = or i1 %35, %44
%46 = or i1 %37, %45
%47 = or i1 %40, %46
br i1 %47, label %scalar.ph, label %vector.ph
vector.ph: ; preds = %vector.scevcheck
%n.vec = and i64 %16, -8
%48 = insertelement <4 x double> <double poison, double 0.000000e+00, double 0.000000e+00, double 0.000000e+00>, double %160, i64 0
%.splatinsert = insertelement <4 x i64> poison, i64 %_unwrap26, i64 0
%.splat = shufflevector <4 x i64> %.splatinsert, <4 x i64> poison, <4 x i32> zeroinitializer
%induction = add nsw <4 x i64> %.splat, <i64 0, i64 -1, i64 -2, i64 -3>
%broadcast.splatinsert = insertelement <4 x double> poison, double %159, i64 0
%broadcast.splat = shufflevector <4 x double> %broadcast.splatinsert, <4 x double> poison, <4 x i32> zeroinitializer
br label %vector.body
vector.body: ; preds = %vector.body, %vector.ph
%iv = phi i64 [ %iv.next, %vector.body ], [ 0, %vector.ph ]
%vec.phi = phi <4 x double> [ %48, %vector.ph ], [ %138, %vector.body ]
%vec.phi86 = phi <4 x double> [ zeroinitializer, %vector.ph ], [ %139, %vector.body ]
%vec.ind = phi <4 x i64> [ %induction, %vector.ph ], [ %vec.ind.next, %vector.body ]
%49 = shl nuw i64 %iv, 3
%iv.next = add nuw nsw i64 %iv, 1
%step.add = add <4 x i64> %vec.ind, <i64 -4, i64 -4, i64 -4, i64 -4>
%50 = shl nuw <4 x i64> %vec.ind, <i64 3, i64 3, i64 3, i64 3>, !dbg !2952
%51 = shl nuw <4 x i64> %step.add, <i64 3, i64 3, i64 3, i64 3>, !dbg !2952
%52 = add <4 x i64> %50, <i64 9, i64 9, i64 9, i64 9>, !dbg !2952
%53 = add <4 x i64> %51, <i64 9, i64 9, i64 9, i64 9>, !dbg !2952
%54 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %52, !dbg !2952
%55 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %53, !dbg !2952
%56 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %54 to <4 x double addrspace(13)*>, !dbg !2952
%57 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %55 to <4 x double addrspace(13)*>, !dbg !2952
%wide.masked.gather = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %56, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%wide.masked.gather88 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %57, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%58 = fadd fast <4 x double> %wide.masked.gather, %broadcast.splat, !dbg !2952
%59 = fadd fast <4 x double> %wide.masked.gather88, %broadcast.splat, !dbg !2952
%60 = add <4 x i64> %50, <i64 8, i64 8, i64 8, i64 8>, !dbg !2952
%61 = add <4 x i64> %51, <i64 8, i64 8, i64 8, i64 8>, !dbg !2952
%62 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %60, !dbg !2952
%63 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %61, !dbg !2952
%64 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %62 to <4 x double addrspace(13)*>, !dbg !2952
%65 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %63 to <4 x double addrspace(13)*>, !dbg !2952
%wide.masked.gather91 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %64, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%wide.masked.gather92 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %65, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%66 = fadd fast <4 x double> %wide.masked.gather91, %broadcast.splat, !dbg !2952
%67 = fadd fast <4 x double> %wide.masked.gather92, %broadcast.splat, !dbg !2952
%68 = or <4 x i64> %50, <i64 7, i64 7, i64 7, i64 7>, !dbg !2952
%69 = or <4 x i64> %51, <i64 7, i64 7, i64 7, i64 7>, !dbg !2952
%70 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %68, !dbg !2952
%71 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %69, !dbg !2952
%72 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %70 to <4 x double addrspace(13)*>, !dbg !2952
%73 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %71 to <4 x double addrspace(13)*>, !dbg !2952
%wide.masked.gather93 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %72, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%wide.masked.gather94 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %73, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%74 = fadd fast <4 x double> %wide.masked.gather93, %broadcast.splat, !dbg !2952
%75 = fadd fast <4 x double> %wide.masked.gather94, %broadcast.splat, !dbg !2952
%76 = or <4 x i64> %50, <i64 6, i64 6, i64 6, i64 6>, !dbg !2952
%77 = or <4 x i64> %51, <i64 6, i64 6, i64 6, i64 6>, !dbg !2952
%78 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %76, !dbg !2952
%79 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %77, !dbg !2952
%80 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %78 to <4 x double addrspace(13)*>, !dbg !2952
%81 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %79 to <4 x double addrspace(13)*>, !dbg !2952
%wide.masked.gather95 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %80, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%wide.masked.gather96 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %81, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%82 = fadd fast <4 x double> %wide.masked.gather95, %broadcast.splat, !dbg !2952
%83 = fadd fast <4 x double> %wide.masked.gather96, %broadcast.splat, !dbg !2952
%84 = or <4 x i64> %50, <i64 5, i64 5, i64 5, i64 5>, !dbg !2952
%85 = or <4 x i64> %51, <i64 5, i64 5, i64 5, i64 5>, !dbg !2952
%86 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %84, !dbg !2952
%87 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %85, !dbg !2952
%88 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %86 to <4 x double addrspace(13)*>, !dbg !2952
%89 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %87 to <4 x double addrspace(13)*>, !dbg !2952
%wide.masked.gather97 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %88, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%wide.masked.gather98 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %89, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%90 = fadd fast <4 x double> %wide.masked.gather97, %broadcast.splat, !dbg !2952
%91 = fadd fast <4 x double> %wide.masked.gather98, %broadcast.splat, !dbg !2952
%92 = or <4 x i64> %50, <i64 4, i64 4, i64 4, i64 4>, !dbg !2952
%93 = or <4 x i64> %51, <i64 4, i64 4, i64 4, i64 4>, !dbg !2952
%94 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %92, !dbg !2952
%95 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %93, !dbg !2952
%96 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %94 to <4 x double addrspace(13)*>, !dbg !2952
%97 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %95 to <4 x double addrspace(13)*>, !dbg !2952
%wide.masked.gather99 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %96, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%wide.masked.gather100 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %97, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%98 = fadd fast <4 x double> %wide.masked.gather99, %broadcast.splat, !dbg !2952
%99 = fadd fast <4 x double> %wide.masked.gather100, %broadcast.splat, !dbg !2952
%100 = or <4 x i64> %50, <i64 3, i64 3, i64 3, i64 3>, !dbg !2952
%101 = or <4 x i64> %51, <i64 3, i64 3, i64 3, i64 3>, !dbg !2952
%102 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %100, !dbg !2952
%103 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %101, !dbg !2952
%104 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %102 to <4 x double addrspace(13)*>, !dbg !2952
%105 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %103 to <4 x double addrspace(13)*>, !dbg !2952
%wide.masked.gather101 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %104, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%wide.masked.gather102 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %105, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%106 = fadd fast <4 x double> %wide.masked.gather101, %broadcast.splat, !dbg !2952
%107 = fadd fast <4 x double> %wide.masked.gather102, %broadcast.splat, !dbg !2952
%108 = or <4 x i64> %50, <i64 2, i64 2, i64 2, i64 2>, !dbg !2952
%109 = or <4 x i64> %51, <i64 2, i64 2, i64 2, i64 2>, !dbg !2952
%110 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %108, !dbg !2952
%111 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, <4 x i64> %109, !dbg !2952
%112 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %110 to <4 x double addrspace(13)*>, !dbg !2952
%113 = bitcast <4 x {} addrspace(10)* addrspace(13)*> %111 to <4 x double addrspace(13)*>, !dbg !2952
%wide.masked.gather103 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %112, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%wide.masked.gather104 = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %113, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%114 = fadd fast <4 x double> %wide.masked.gather103, %broadcast.splat, !dbg !2952
%115 = fadd fast <4 x double> %wide.masked.gather104, %broadcast.splat, !dbg !2952
%116 = extractelement <4 x double addrspace(13)*> %112, i64 0, !dbg !2952
%117 = getelementptr double, double addrspace(13)* %116, i64 -24, !dbg !2952
%118 = bitcast double addrspace(13)* %117 to <32 x double> addrspace(13)*, !dbg !2952
%119 = extractelement <4 x double addrspace(13)*> %113, i64 0, !dbg !2952
%120 = getelementptr double, double addrspace(13)* %119, i64 -24, !dbg !2952
%121 = bitcast double addrspace(13)* %120 to <32 x double> addrspace(13)*, !dbg !2952
%reverse = shufflevector <4 x double> %114, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse105 = shufflevector <4 x double> %106, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse106 = shufflevector <4 x double> %98, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse107 = shufflevector <4 x double> %90, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse108 = shufflevector <4 x double> %82, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse109 = shufflevector <4 x double> %74, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse110 = shufflevector <4 x double> %66, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse111 = shufflevector <4 x double> %58, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%122 = shufflevector <4 x double> %reverse, <4 x double> %reverse105, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, !dbg !2952
%123 = shufflevector <4 x double> %reverse106, <4 x double> %reverse107, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, !dbg !2952
%124 = shufflevector <4 x double> %reverse108, <4 x double> %reverse109, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, !dbg !2952
%125 = shufflevector <4 x double> %reverse110, <4 x double> %reverse111, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, !dbg !2952
%126 = shufflevector <8 x double> %122, <8 x double> %123, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>, !dbg !2952
%127 = shufflevector <8 x double> %124, <8 x double> %125, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>, !dbg !2952
%interleaved.vec = shufflevector <16 x double> %126, <16 x double> %127, <32 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>, !dbg !2952
store <32 x double> %interleaved.vec, <32 x double> addrspace(13)* %118, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%reverse112 = shufflevector <4 x double> %115, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse113 = shufflevector <4 x double> %107, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse114 = shufflevector <4 x double> %99, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse115 = shufflevector <4 x double> %91, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse116 = shufflevector <4 x double> %83, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse117 = shufflevector <4 x double> %75, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse118 = shufflevector <4 x double> %67, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%reverse119 = shufflevector <4 x double> %59, <4 x double> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>, !dbg !2952
%128 = shufflevector <4 x double> %reverse112, <4 x double> %reverse113, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, !dbg !2952
%129 = shufflevector <4 x double> %reverse114, <4 x double> %reverse115, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, !dbg !2952
%130 = shufflevector <4 x double> %reverse116, <4 x double> %reverse117, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, !dbg !2952
%131 = shufflevector <4 x double> %reverse118, <4 x double> %reverse119, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, !dbg !2952
%132 = shufflevector <8 x double> %128, <8 x double> %129, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>, !dbg !2952
%133 = shufflevector <8 x double> %130, <8 x double> %131, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>, !dbg !2952
%interleaved.vec120 = shufflevector <16 x double> %132, <16 x double> %133, <32 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>, !dbg !2952
store <32 x double> %interleaved.vec120, <32 x double> addrspace(13)* %121, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%134 = icmp eq <4 x i64> %vec.ind, zeroinitializer
%135 = icmp eq <4 x i64> %step.add, zeroinitializer
%136 = select <4 x i1> %134, <4 x double> %broadcast.splat, <4 x double> zeroinitializer
%137 = select <4 x i1> %135, <4 x double> %broadcast.splat, <4 x double> zeroinitializer
%138 = fadd fast <4 x double> %136, %vec.phi
%139 = fadd fast <4 x double> %137, %vec.phi86
%index.next = add nuw i64 %49, 8
%vec.ind.next = add <4 x i64> %vec.ind, <i64 -8, i64 -8, i64 -8, i64 -8>
%140 = icmp eq i64 %index.next, %n.vec
br i1 %140, label %middle.block, label %vector.body, !llvm.loop !2998
middle.block: ; preds = %vector.body
%bin.rdx = fadd fast <4 x double> %139, %138
%141 = call fast double @llvm.vector.reduce.fadd.v4f64(double noundef -0.000000e+00, <4 x double> %bin.rdx) #25
%cmp.n = icmp eq i64 %16, %n.vec
br i1 %cmp.n, label %invertL39, label %middle.block.scalar.ph_crit_edge
middle.block.scalar.ph_crit_edge: ; preds = %middle.block
%ind.end = sub nsw i64 %_unwrap26, %n.vec
%.pre = add nsw i64 %ind.end, 1
br label %scalar.ph
scalar.ph: ; preds = %middle.block.scalar.ph_crit_edge, %vector.scevcheck, %mergeinvertL83_L79.common.ret.loopexit_crit_edge.unr-lcssa.loopexit
%.pre-phi = phi i64 [ %.pre, %middle.block.scalar.ph_crit_edge ], [ %16, %vector.scevcheck ], [ %16, %mergeinvertL83_L79.common.ret.loopexit_crit_edge.unr-lcssa.loopexit ]
%bc.resume.val = phi i64 [ %ind.end, %middle.block.scalar.ph_crit_edge ], [ %_unwrap26, %vector.scevcheck ], [ %_unwrap26, %mergeinvertL83_L79.common.ret.loopexit_crit_edge.unr-lcssa.loopexit ]
%bc.merge.rdx = phi double [ %141, %middle.block.scalar.ph_crit_edge ], [ 0.000000e+00, %vector.scevcheck ], [ 0.000000e+00, %mergeinvertL83_L79.common.ret.loopexit_crit_edge.unr-lcssa.loopexit ]
%142 = insertelement <4 x double> poison, double %159, i64 0, !dbg !2952
%143 = shufflevector <4 x double> %142, <4 x double> poison, <4 x i32> zeroinitializer, !dbg !2952
%xtraiter5 = and i64 %.pre-phi, 3
%lcmp.mod.not6 = icmp eq i64 %xtraiter5, 0
br i1 %lcmp.mod.not6, label %invertL83.prol.loopexit, label %invertL83.prol.preheader
invertL83.prol.preheader: ; preds = %scalar.ph
br label %invertL83.prol
invertL83.prol: ; preds = %invertL83.prol.preheader, %invertL83.prol
%iv1 = phi i64 [ 0, %invertL83.prol.preheader ], [ %iv.next2, %invertL83.prol ]
%"'de29.2.prol" = phi double [ %154, %invertL83.prol ], [ %bc.merge.rdx, %invertL83.prol.preheader ]
%144 = mul nsw i64 %iv1, -1, !dbg !2952
%iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !2952
%145 = add nsw i64 %bc.resume.val, %144, !dbg !2952
%_unwrap105.prol = shl nuw i64 %145, 3, !dbg !2952
%_unwrap136.prol = or i64 %_unwrap105.prol, 6, !dbg !2952
%"'ipg133_unwrap.prol" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap136.prol, !dbg !2952
%146 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg133_unwrap.prol" to <4 x double> addrspace(13)*, !dbg !2952
%147 = load <4 x double>, <4 x double> addrspace(13)* %146, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%148 = fadd fast <4 x double> %147, %143, !dbg !2952
store <4 x double> %148, <4 x double> addrspace(13)* %146, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%_unwrap175.prol = or i64 %_unwrap105.prol, 2, !dbg !2952
%"'ipg172_unwrap.prol" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap175.prol, !dbg !2952
%149 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg172_unwrap.prol" to <4 x double> addrspace(13)*, !dbg !2952
%150 = load <4 x double>, <4 x double> addrspace(13)* %149, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%151 = fadd fast <4 x double> %150, %143, !dbg !2952
store <4 x double> %151, <4 x double> addrspace(13)* %149, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%152 = icmp eq i64 %145, 0
%153 = select fast i1 %152, double %159, double 0.000000e+00
%154 = fadd fast double %153, %"'de29.2.prol"
%155 = add nsw i64 %145, -1
%prol.iter.cmp.not = icmp eq i64 %iv.next2, %xtraiter5
br i1 %prol.iter.cmp.not, label %invertL83.prol.loopexit.loopexit, label %invertL83.prol, !llvm.loop !2999
invertL83.prol.loopexit.loopexit: ; preds = %invertL83.prol
br label %invertL83.prol.loopexit
invertL83.prol.loopexit: ; preds = %invertL83.prol.loopexit.loopexit, %scalar.ph
%.lcssa.unr = phi double [ undef, %scalar.ph ], [ %154, %invertL83.prol.loopexit.loopexit ]
%"'de29.2.unr" = phi double [ %bc.merge.rdx, %scalar.ph ], [ %154, %invertL83.prol.loopexit.loopexit ]
%storemerge.unr = phi i64 [ %bc.resume.val, %scalar.ph ], [ %155, %invertL83.prol.loopexit.loopexit ]
%156 = icmp ult i64 %bc.resume.val, 3
br i1 %156, label %invertL39, label %invertL83.preheader
invertL83.preheader: ; preds = %invertL83.prol.loopexit
br label %invertL83
invertL79.common.ret.loopexit_crit_edge.unr-lcssa: ; preds = %invertL83.epil.6, %invertL83.epil.5, %invertL83.epil.4, %invertL83.epil.3, %invertL83.epil.2, %invertL83.epil.1, %L83.lr.ph, %L83.epil.preheader
%157 = phi double [ %differeturn, %L83.lr.ph ], [ 0.000000e+00, %invertL83.epil.6 ], [ 0.000000e+00, %invertL83.epil.5 ], [ 0.000000e+00, %invertL83.epil.4 ], [ 0.000000e+00, %invertL83.epil.3 ], [ 0.000000e+00, %invertL83.epil.2 ], [ 0.000000e+00, %invertL83.epil.1 ], [ 0.000000e+00, %L83.epil.preheader ]
%"value_phi223.unr'de.0" = phi double [ 0.000000e+00, %L83.lr.ph ], [ %181, %invertL83.epil.6 ], [ %178, %invertL83.epil.5 ], [ %173, %invertL83.epil.4 ], [ %differeturn, %invertL83.epil.3 ], [ %differeturn, %invertL83.epil.2 ], [ %differeturn, %invertL83.epil.1 ], [ %differeturn, %L83.epil.preheader ]
%158 = fadd fast double %"value_phi223.unr'de.0", %157
%159 = select fast i1 %12, double 0.000000e+00, double %158
%160 = select fast i1 %12, double %"value_phi223.unr'de.0", double 0.000000e+00
br i1 %12, label %invertL39, label %mergeinvertL83_L79.common.ret.loopexit_crit_edge.unr-lcssa.loopexit
invertL83.epil.1: ; preds = %L83.epil.preheader
%storemerge1.1 = add nsw i64 %xtraiter, -2
%_unwrap51.1 = add nuw nsw i64 %_unwrap50, %storemerge1.1, !dbg !2952
%"'ipg39_unwrap.1" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap51.1, !dbg !2952
%"'ipc40_unwrap.1" = bitcast {} addrspace(10)* addrspace(13)* %"'ipg39_unwrap.1" to double addrspace(13)*, !dbg !2952
%161 = load double, double addrspace(13)* %"'ipc40_unwrap.1", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%162 = fadd fast double %161, %differeturn, !dbg !2952
store double %162, double addrspace(13)* %"'ipc40_unwrap.1", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%163 = icmp eq i64 %storemerge1.1, 0
br i1 %163, label %invertL79.common.ret.loopexit_crit_edge.unr-lcssa, label %invertL83.epil.2
invertL83.epil.2: ; preds = %invertL83.epil.1
%storemerge1.2 = add nsw i64 %xtraiter, -3
%_unwrap51.2 = add nuw nsw i64 %_unwrap50, %storemerge1.2, !dbg !2952
%"'ipg39_unwrap.2" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap51.2, !dbg !2952
%"'ipc40_unwrap.2" = bitcast {} addrspace(10)* addrspace(13)* %"'ipg39_unwrap.2" to double addrspace(13)*, !dbg !2952
%164 = load double, double addrspace(13)* %"'ipc40_unwrap.2", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%165 = fadd fast double %164, %differeturn, !dbg !2952
store double %165, double addrspace(13)* %"'ipc40_unwrap.2", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%166 = icmp eq i64 %storemerge1.2, 0
br i1 %166, label %invertL79.common.ret.loopexit_crit_edge.unr-lcssa, label %invertL83.epil.3
invertL83.epil.3: ; preds = %invertL83.epil.2
%storemerge1.3 = add nsw i64 %xtraiter, -4
%_unwrap51.3 = add nuw nsw i64 %_unwrap50, %storemerge1.3, !dbg !2952
%"'ipg39_unwrap.3" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap51.3, !dbg !2952
%"'ipc40_unwrap.3" = bitcast {} addrspace(10)* addrspace(13)* %"'ipg39_unwrap.3" to double addrspace(13)*, !dbg !2952
%167 = load double, double addrspace(13)* %"'ipc40_unwrap.3", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%168 = fadd fast double %167, %differeturn, !dbg !2952
store double %168, double addrspace(13)* %"'ipc40_unwrap.3", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%169 = icmp eq i64 %storemerge1.3, 0
br i1 %169, label %invertL79.common.ret.loopexit_crit_edge.unr-lcssa, label %invertL83.epil.4
invertL83.epil.4: ; preds = %invertL83.epil.3
%storemerge1.4 = add nsw i64 %xtraiter, -5
%_unwrap51.4 = add nuw nsw i64 %_unwrap50, %storemerge1.4, !dbg !2952
%"'ipg39_unwrap.4" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap51.4, !dbg !2952
%"'ipc40_unwrap.4" = bitcast {} addrspace(10)* addrspace(13)* %"'ipg39_unwrap.4" to double addrspace(13)*, !dbg !2952
%170 = load double, double addrspace(13)* %"'ipc40_unwrap.4", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%171 = fadd fast double %170, %differeturn, !dbg !2952
store double %171, double addrspace(13)* %"'ipc40_unwrap.4", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%172 = icmp eq i64 %storemerge1.4, 0
%173 = select fast i1 %172, double %differeturn, double 0.000000e+00
br i1 %172, label %invertL79.common.ret.loopexit_crit_edge.unr-lcssa, label %invertL83.epil.5
invertL83.epil.5: ; preds = %invertL83.epil.4
%storemerge1.5 = add nsw i64 %xtraiter, -6
%_unwrap51.5 = add nuw nsw i64 %_unwrap50, %storemerge1.5, !dbg !2952
%"'ipg39_unwrap.5" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap51.5, !dbg !2952
%"'ipc40_unwrap.5" = bitcast {} addrspace(10)* addrspace(13)* %"'ipg39_unwrap.5" to double addrspace(13)*, !dbg !2952
%174 = load double, double addrspace(13)* %"'ipc40_unwrap.5", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%175 = fadd fast double %174, %differeturn, !dbg !2952
store double %175, double addrspace(13)* %"'ipc40_unwrap.5", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%176 = icmp eq i64 %storemerge1.5, 0
%177 = select fast i1 %176, double %differeturn, double 0.000000e+00
%178 = fadd fast double %173, %177
br i1 %176, label %invertL79.common.ret.loopexit_crit_edge.unr-lcssa, label %invertL83.epil.6
invertL83.epil.6: ; preds = %invertL83.epil.5
%storemerge1.6 = add nsw i64 %xtraiter, -7
%_unwrap51.6 = add nuw nsw i64 %storemerge1.6, %_unwrap50, !dbg !2952
%"'ipg39_unwrap.6" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap51.6, !dbg !2952
%"'ipc40_unwrap.6" = bitcast {} addrspace(10)* addrspace(13)* %"'ipg39_unwrap.6" to double addrspace(13)*, !dbg !2952
%179 = load double, double addrspace(13)* %"'ipc40_unwrap.6", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%180 = fadd fast double %179, %differeturn, !dbg !2952
store double %180, double addrspace(13)* %"'ipc40_unwrap.6", align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%181 = fadd fast double %178, %differeturn
br label %invertL79.common.ret.loopexit_crit_edge.unr-lcssa
invertL39.loopexit: ; preds = %invertL83
br label %invertL39, !dbg !2969
invertL39: ; preds = %invertL39.loopexit, %invertL79.common.ret.loopexit_crit_edge.unr-lcssa, %invertL83.prol.loopexit, %middle.block, %L39
%"'de29.0" = phi double [ %"value_phi223.unr'de.0", %invertL79.common.ret.loopexit_crit_edge.unr-lcssa ], [ %differeturn, %L39 ], [ %141, %middle.block ], [ %.lcssa.unr, %invertL83.prol.loopexit ], [ %227, %invertL39.loopexit ]
%182 = bitcast {} addrspace(10)* addrspace(13)* %8 to <2 x double> addrspace(13)*, !dbg !2969
%183 = load <2 x double>, <2 x double> addrspace(13)* %182, align 8, !dbg !2969, !tbaa !90, !alias.scope !2956, !noalias !2959
%184 = insertelement <2 x double> poison, double %"'de29.0", i64 0, !dbg !3000
%185 = shufflevector <2 x double> %184, <2 x double> poison, <2 x i32> zeroinitializer, !dbg !3000
%186 = fadd fast <2 x double> %185, %183, !dbg !2969
store <2 x double> %186, <2 x double> addrspace(13)* %182, align 8, !dbg !2969, !tbaa !90, !alias.scope !2956, !noalias !2961
br label %inverttop
invertL83: ; preds = %invertL83.preheader, %invertL83
%iv3 = phi i64 [ 0, %invertL83.preheader ], [ %iv.next4, %invertL83 ]
%"'de29.2" = phi double [ %227, %invertL83 ], [ %"'de29.2.unr", %invertL83.preheader ]
%187 = mul i64 %iv3, -4, !dbg !2952
%iv.next4 = add nuw nsw i64 %iv3, 1, !dbg !2952
%188 = add nsw i64 %storemerge.unr, %187, !dbg !2952
%_unwrap105 = shl nuw i64 %188, 3, !dbg !2952
%_unwrap136 = or i64 %_unwrap105, 6, !dbg !2952
%"'ipg133_unwrap" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap136, !dbg !2952
%189 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg133_unwrap" to <4 x double> addrspace(13)*, !dbg !2952
%190 = load <4 x double>, <4 x double> addrspace(13)* %189, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%191 = fadd fast <4 x double> %190, %143, !dbg !2952
store <4 x double> %191, <4 x double> addrspace(13)* %189, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%_unwrap175 = or i64 %_unwrap105, 2, !dbg !2952
%"'ipg172_unwrap" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap175, !dbg !2952
%192 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg172_unwrap" to <4 x double> addrspace(13)*, !dbg !2952
%193 = load <4 x double>, <4 x double> addrspace(13)* %192, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%194 = fadd fast <4 x double> %193, %143, !dbg !2952
store <4 x double> %194, <4 x double> addrspace(13)* %192, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%195 = icmp eq i64 %188, 0
%196 = select fast i1 %195, double %159, double 0.000000e+00
%197 = fadd fast double %196, %"'de29.2"
%198 = add nsw i64 %188, -1
%_unwrap105.1 = shl nuw i64 %198, 3, !dbg !2952
%_unwrap136.1 = or i64 %_unwrap105.1, 6, !dbg !2952
%"'ipg133_unwrap.1" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap136.1, !dbg !2952
%199 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg133_unwrap.1" to <4 x double> addrspace(13)*, !dbg !2952
%200 = load <4 x double>, <4 x double> addrspace(13)* %199, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%201 = fadd fast <4 x double> %200, %143, !dbg !2952
store <4 x double> %201, <4 x double> addrspace(13)* %199, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%_unwrap175.1 = or i64 %_unwrap105.1, 2, !dbg !2952
%"'ipg172_unwrap.1" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap175.1, !dbg !2952
%202 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg172_unwrap.1" to <4 x double> addrspace(13)*, !dbg !2952
%203 = load <4 x double>, <4 x double> addrspace(13)* %202, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%204 = fadd fast <4 x double> %203, %143, !dbg !2952
store <4 x double> %204, <4 x double> addrspace(13)* %202, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%205 = icmp eq i64 %198, 0
%206 = select fast i1 %205, double %159, double 0.000000e+00
%207 = fadd fast double %197, %206
%208 = add nsw i64 %188, -2
%_unwrap105.2 = shl nuw i64 %208, 3, !dbg !2952
%_unwrap136.2 = or i64 %_unwrap105.2, 6, !dbg !2952
%"'ipg133_unwrap.2" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap136.2, !dbg !2952
%209 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg133_unwrap.2" to <4 x double> addrspace(13)*, !dbg !2952
%210 = load <4 x double>, <4 x double> addrspace(13)* %209, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%211 = fadd fast <4 x double> %210, %143, !dbg !2952
store <4 x double> %211, <4 x double> addrspace(13)* %209, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%_unwrap175.2 = or i64 %_unwrap105.2, 2, !dbg !2952
%"'ipg172_unwrap.2" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap175.2, !dbg !2952
%212 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg172_unwrap.2" to <4 x double> addrspace(13)*, !dbg !2952
%213 = load <4 x double>, <4 x double> addrspace(13)* %212, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%214 = fadd fast <4 x double> %213, %143, !dbg !2952
store <4 x double> %214, <4 x double> addrspace(13)* %212, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%215 = icmp eq i64 %208, 0
%216 = select fast i1 %215, double %159, double 0.000000e+00
%217 = fadd fast double %207, %216
%218 = add nsw i64 %188, -3
%_unwrap105.3 = shl nuw i64 %218, 3, !dbg !2952
%_unwrap136.3 = or i64 %_unwrap105.3, 6, !dbg !2952
%"'ipg133_unwrap.3" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap136.3, !dbg !2952
%219 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg133_unwrap.3" to <4 x double> addrspace(13)*, !dbg !2952
%220 = load <4 x double>, <4 x double> addrspace(13)* %219, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%221 = fadd fast <4 x double> %220, %143, !dbg !2952
store <4 x double> %221, <4 x double> addrspace(13)* %219, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%_unwrap175.3 = or i64 %_unwrap105.3, 2, !dbg !2952
%"'ipg172_unwrap.3" = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %8, i64 %_unwrap175.3, !dbg !2952
%222 = bitcast {} addrspace(10)* addrspace(13)* %"'ipg172_unwrap.3" to <4 x double> addrspace(13)*, !dbg !2952
%223 = load <4 x double>, <4 x double> addrspace(13)* %222, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2959
%224 = fadd fast <4 x double> %223, %143, !dbg !2952
store <4 x double> %224, <4 x double> addrspace(13)* %222, align 8, !dbg !2952, !tbaa !90, !alias.scope !2956, !noalias !2961
%225 = icmp eq i64 %218, 0
%226 = select fast i1 %225, double %159, double 0.000000e+00
%227 = fadd fast double %217, %226
%228 = add nsw i64 %188, -4
br i1 %225, label %invertL39.loopexit, label %invertL83, !llvm.loop !3002
invertL128: ; preds = %L22
%_unwrap184 = lshr i64 %7, 1, !dbg !3003
%_unwrap185 = add nuw nsw i64 %_unwrap184, 1, !dbg !3003
%_unwrap186 = add nuw nsw i64 %_unwrap184, 2, !dbg !3003
call fastcc void @diffejulia_mapreduce_impl_63711.6({} addrspace(10)* noalias nocapture nonnull readonly align 8 dereferenceable(32) %"'", i64 signext %_unwrap186, i64 signext %0, double %differeturn) #25, !dbg !3003
call fastcc void @diffejulia_mapreduce_impl_63711.6({} addrspace(10)* noalias nocapture nonnull readonly align 8 dereferenceable(32) %"'", i64 noundef signext 1, i64 signext %_unwrap185, double %differeturn) #25, !dbg !3004
br label %inverttop
}
cannot handle (forward) unknown intrinsic
llvm.masked.gather
%wide.masked.gather = call <4 x double> @llvm.masked.gather.v4f64.v4p13f64(<4 x double addrspace(13)*> %56, i32 noundef 8, <4 x i1> noundef <i1 true, i1 true, i1 true, i1 true>, <4 x double> poison) #25, !dbg !49, !tbaa !58, !alias.scope !61, !noalias !66
Stacktrace:
[1] getindex
@ ./essentials.jl:917
[2] macro expansion
@ ./reduce.jl:264
[3] macro expansion
@ ./simdloop.jl:77
[4] mapreduce_impl
@ ./reduce.jl:263
Stacktrace:
[1] getindex
@ ./essentials.jl:917 [inlined]
[2] macro expansion
@ ./reduce.jl:264 [inlined]
[3] macro expansion
@ ./simdloop.jl:77 [inlined]
[4] mapreduce_impl
@ ./reduce.jl:263
[5] mapreduce_impl
@ ./reduce.jl:277 [inlined]
[6] _mapreduce
@ ./reduce.jl:444 [inlined]
[7] _mapreduce_dim
@ ./reducedim.jl:337 [inlined]
[8] mapreduce
@ ./reducedim.jl:329 [inlined]
[9] _sum
@ ./reducedim.jl:987 [inlined]
[10] _sum
@ ./reducedim.jl:986 [inlined]
[11] sum
@ ./reducedim.jl:982 [inlined]
[12] g
@ ./REPL[69]:1 [inlined]
[13] diffejulia_g_63639wrap
@ ./REPL[69]:0 [inlined]
[14] macro expansion
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:8305 [inlined]
[15] enzyme_call
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:7868 [inlined]
[16] CombinedAdjointThunk
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:7641 [inlined]
[17] autodiff_deferred
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:781 [inlined]
[18] autodiff
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:512 [inlined]
[19] gradient!
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:1787 [inlined]
[20] fwddiffejulia_gradient__63612wrap
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:0
[21] macro expansion
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:8305 [inlined]
[22] enzyme_call
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:7868 [inlined]
[23] ForwardModeThunk
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:7657 [inlined]
[24] autodiff
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:647 [inlined]
[25] autodiff
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:537 [inlined]
[26] autodiff
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:504 [inlined]
[27] hvp!
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:2498 [inlined]
[28] hvp(f::typeof(g), x::Vector{Float64}, v::Vector{Float64})
@ Enzyme ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:2465
[29] top-level scope
@ REPL[90]:1 |
when did julia emit masked gathers??? cc @gbaraldi |
in particular, why is this not a masked_load ? |
@gdalle I can't repro your first example locally on Julia 1.11.1, on my Linux box julia> using Enzyme: Enzyme
julia> f(x) = sum(vec(x .^ 3) .* transpose(vec(x .^ 4)))
f (generic function with 1 method)
julia> xv, dxv = float.(1:12), ones(12); # vector input
julia> pv = Enzyme.hvp(f, xv, dxv) # correct result
12-element Vector{Float64}:
518076.0
1.374984e6
2.617524e6
4.292496e6
6.4467e6
9.126936e6
1.2380004e7
1.6252704e7
2.0791836e7
2.60442e7
3.2056596e7
3.8875824e7
julia> xm, dxm = float.(reshape(1:12, 4, 3)), ones(4, 3); # matrix input
julia> pm = Enzyme.hvp(f, xm, dxm) # only the first row is correct
4×3 Matrix{Float64}:
518076.0 6.4467e6 2.07918e7
1.37498e6 9.12694e6 2.60442e7
2.61752e6 1.238e7 3.20566e7
4.2925e6 1.62527e7 3.88758e7
julia> reshape(pv, 4, 3) - pm # should be zero everywhere
4×3 Matrix{Float64}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> |
Things are getting stranger and stranger. But at least that behavior is also present in my CI on 1.11, I didn't dream it: The first few test failures are all happening during comparison of an HVP output to the expected result from the test scenario. As you can see in the CI log, the HVPs computed by DI using Enzyme (which is essentially the same as |
So, to sum up:
|
The osx error I can resolve without extra stuff. The correctness one concerns me. I really need you to minimize this further. Eg desugar broadcast hvp, whatever else you can. Then maybe I have a chance of reproducing and thus fixing it |
I'll try when I get back to my linux laptop, probably next week, maybe this weekend if I have some time |
I removed the using Enzyme
function mygradient(f, x)
g = make_zero(x)
autodiff(Reverse, f, Active, Duplicated(x, g))
return g
end
myhvp(f, x, dx) = return autodiff(Forward, mygradient, Const(f), Duplicated(x, dx))[1]
xm, dxm = Matrix{Float64}(reshape(1:12, 4, 3)), ones(4, 3); # matrix input
f1(x) = sum(vec(x) .* transpose(vec(x .^ 2)))
f2(x) = sum(vec(x .^ 2) .* transpose(vec(x .^ 2))) julia> pm1 = myhvp(f1, xm, dxm)
4×3 Matrix{Float64}:
336.0 432.0 528.0
360.0 456.0 552.0
384.0 480.0 576.0
408.0 504.0 600.0
julia> pm2 = myhvp(f2, xm, dxm)
freeing without malloc %_cache300.i.0 = phi double* [ undef, %L126.i ], [ %190, %L347.loopexit.i ]
freeing without malloc %_cache323.i.0 = phi double* [ undef, %L492.i ], [ %422, %L713.i.loopexit ]
4×3 Matrix{Float64}:
3224.0 5720.0 8216.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0 This was on my linux laptop: julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 1
(@di) pkg> st
Status `~/.julia/environments/di/Project.toml`
...
[7da242da] Enzyme v0.13.14
... |
This test function may seem random but it's an easy and GPU-friendly way to get a Hessian where every coefficient |
yeah for sure but broadcasting adds a ton of misc bloat, if you can try to reduce while maintaining the failure |
I've tried but did not succeed. As shown above, the version written as a loop gives the correct result. So the issue is probably with broadcasting itself. |
Is there anything else I can do to help? I understand that this function is not the simplest you can possibly conceive, but it is still pretty elementary. And most importantly, it satisfies the essential conditions for my DI test suite:
|
Yeah I think just simplifying it (eg remove broadcast and call say Base.broadcast etc, and then simplifying the use of that code) |
Broadcasting code generates a bunch of things and isn’t really elementary from what Julia generates =/ |
As mentioned in this comment, I tried simplifying it but the version with just a manual loop does not trigger the same error. So it is probably due to broadcasting itself. I'll try with
I understand that but it is still a fundamental part of the language, so Enzyme needs to support it too. If you don't think it should be supported, then users need to be warned that Enzyme may return wrong results if their functions include broadcasts. |
I agree and we do support broadcasting. However you asked what you could do to help and I pointed you to how to simplify broadcasting code in this case while hopefully retaining the error. I often use @which and @code_lowered to figure out what functions are called to try to inline, and simplify |
I'll do my best |
I started to peel off the layers of complexity but I hit a snag right away. When I replace using Enzyme
function mygradient(f, x)
g = make_zero(x)
autodiff(Reverse, f, Active, Duplicated(x, g))
return g
end
myhvp(f, x, dx) = return autodiff(Forward, mygradient, Const(f), Duplicated(x, dx))[1]
myabs2(x) = abs2.(x)
function f(x)
a = vec(abs2.(x))
b = transpose(vec(abs2.(x)))
return sum(a .* b)
end
function f_bis(x)
a = vec(myabs2(x))
b = transpose(vec(myabs2(x)))
return sum(a .* b)
end
xv, dxv = Vector{Float64}(1:4), ones(4); # vector input julia> myhvp(f, xv, dxv)
4-element Vector{Float64}:
200.0
280.0
360.0
440.0
julia> myhvp(f_bis, xv, dxv)
ERROR: Constant memory is stored (or returned) to a differentiable variable.
As a result, Enzyme cannot provably ensure correctness and throws this error.
This might be due to the use of a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity).
If Enzyme should be able to prove this use non-differentable, open an issue!
To work around this issue, either:
a) rewrite this variable to not be conditionally active (fastest, but requires a code change), or
b) set the Enzyme mode to turn on runtime activity (e.g. autodiff(set_runtime_activity(Reverse), ...) ). This will maintain correctness, but may slightly reduce performance.
Mismatched activity for: %.unpack.fca.10.insert = insertvalue { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, i8*, {} addrspace(10)*, {} addrspace(10)*, i8*, {} addrspace(10)*, i64, {} addrspace(10)*, i64, double*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)**, {} addrspace(10)*, i1 } %.unpack.fca.9.insert, {} addrspace(10)* %.sroa.27.0, 10, !dbg !80 const val: %.sroa.27.0 = phi {} addrspace(10)* [ %11, %L8 ], [ @ejl_jl_nothing, %L10 ]
value=Unknown object of type Union{Nothing, Memory{Float64}}
llvalue= %.sroa.27.0 = phi {} addrspace(10)* [ %11, %L8 ], [ @ejl_jl_nothing, %L10 ]
Stacktrace:
[1] copy
@ ./broadcast.jl:892
[2] materialize
@ ./broadcast.jl:867
[3] myabs2
@ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/playground.jl:11
Stacktrace:
[1] copy
@ ./broadcast.jl:892 [inlined]
[2] materialize
@ ./broadcast.jl:867 [inlined]
[3] myabs2
@ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/playground.jl:11
[4] f_bis
@ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/playground.jl:20 [inlined]
[5] diffejulia_f_bis_61640wrap
@ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/playground.jl:0 [inlined]
[6] macro expansion
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:8305 [inlined]
[7] enzyme_call
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:7868 [inlined]
[8] CombinedAdjointThunk
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:7641 [inlined]
[9] autodiff_deferred
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:781 [inlined]
[10] autodiff
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:512 [inlined]
[11] mygradient
@ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/playground.jl:5 [inlined]
[12] fwddiffejulia_mygradient_61610wrap
@ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/playground.jl:0
[13] macro expansion
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:8305 [inlined]
[14] enzyme_call
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:7868 [inlined]
[15] ForwardModeThunk
@ ~/.julia/packages/Enzyme/RvNgp/src/compiler.jl:7657 [inlined]
[16] autodiff
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:647 [inlined]
[17] autodiff(::ForwardMode{…}, ::Const{…}, ::Const{…}, ::Duplicated{…})
@ Enzyme ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:537
[18] autodiff
@ ~/.julia/packages/Enzyme/RvNgp/src/Enzyme.jl:504 [inlined]
[19] myhvp(f::Function, x::Vector{Float64}, dx::Vector{Float64})
@ Main ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/playground.jl:9
[20] top-level scope
@ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/playground.jl:29
Some type information was truncated. Use `show(err)` to see complete types. |
julia inlining is fun and non obvious. Maybe mark the function @ inline? |
Okay, I have dived deep into the broadcasting logic and inlined as much as I could. What I find is that the incorrectness hinges on the In the code below, using Base.Broadcast:
Broadcasted, DefaultArrayStyle, materialize, preprocess, preprocess_args
using Enzyme
function mygradient(f, x)
g = make_zero(x)
autodiff(Reverse, f, Active, Duplicated(x, g))
return g
end
myhvp(f, x, dx) = return autodiff(Forward, mygradient, Const(f), Duplicated(x, dx))[1]
@inline cube(x) = x^3
@inline function arraycube_bad(x::AbstractArray{N}) where {N}
y = similar(x)
args = (x,)
style = nothing
ax = axes(x)
bc = Broadcasted(style, cube, args, ax)
bc´ = preprocess(y, bc)
@inbounds @simd for i in eachindex(bc´)
y[i] = bc´[i]
end
return y
end
@inline function arraycube_good(x)
y = similar(x)
args = (x,)
style = nothing
ax = axes(x)
bc = Broadcasted(style, cube, args, ax)
bc´ = preprocess(y, bc)
@inbounds for i in eachindex(bc´)
y[i] = bc´[i]
end
return y
end
f_bad(x) = sum(arraycube_bad(x))
f_good(x) = sum(arraycube_good(x))
xm, dxm = Matrix{Float64}(reshape(1:12, 4, 3)), ones(4, 3); # matrix input julia> myhvp(f_bad, xm, dxm)
freeing without malloc %_cache120.i.0 = phi double* [ undef, %L103.i ], [ %253, %L310.i.loopexit ]
4×3 Matrix{Float64}:
6.0 30.0 54.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> myhvp(f_good, xm, dxm)
4×3 Matrix{Float64}:
6.0 30.0 54.0
12.0 36.0 60.0
18.0 42.0 66.0
24.0 48.0 72.0 |
Would it be possible to remove the preprocess/broadcasted parts? |
I'm not at my work computer right now but if I remember correctly, I had tried removing the |
@wsmoses I finally have a minimum working (or failing) example without using Enzyme
function mygradient(f, x)
g = make_zero(x)
autodiff(Reverse, f, Active, Duplicated(x, g))
return g
end
myhvp(f, x, dx) = return autodiff(Forward, mygradient, Const(f), Duplicated(x, dx))[1]
@inline function arraycube_bad(x::AbstractArray{N}) where {N}
y = similar(x)
@inbounds @simd for i in CartesianIndices(y)
y[i] = x[i]^3
end
return y
end
@inline function arraycube_good1(x)
y = similar(x)
@inbounds for i in CartesianIndices(y)
y[i] = x[i]^3
end
return y
end
@inline function arraycube_good2(x)
y = similar(x)
@inbounds @simd for i in eachindex(y)
y[i] = x[i]^3
end
return y
end
f_bad(x) = sum(arraycube_bad(x))
f_good1(x) = sum(arraycube_good1(x))
f_good2(x) = sum(arraycube_good2(x))
xm, dxm = Matrix{Float64}(reshape(1:12, 4, 3)), ones(4, 3); # matrix input
@assert f_bad(xm) == f_good1(xm) == f_good2(xm)
myhvp(f_bad, xm, dxm) # incorrect
myhvp(f_good1, xm, dxm) # correct
myhvp(f_good2, xm, dxm) # correct |
Awesome, yeah this should be enough to go on. |
However, sadly on my laptop I still get correct answers... julia> myhvp(f_bad, xm, dxm) # incorrect
freeing without malloc %_cache174.i.0 = phi i8* [ undef, %L23.i ], [ %_malloccache175.i, %L193.i.loopexit ]
4×3 Matrix{Float64}:
6.0 30.0 54.0
12.0 36.0 60.0
18.0 42.0 66.0
24.0 48.0 72.0
julia> myhvp(f_good1, xm, dxm) # correct
4×3 Matrix{Float64}:
6.0 30.0 54.0
12.0 36.0 60.0
18.0 42.0 66.0
24.0 48.0 72.0
julia> myhvp(f_good2, xm, dxm) # correct
4×3 Matrix{Float64}:
6.0 30.0 54.0
12.0 36.0 60.0
18.0 42.0 66.0
24.0 48.0 72.0 Let me try my desktop, and if that doesn't fail I'll need your help to figure out an even more specific example which triggers the issue (e.g. try to remove simd, inline cartesianindex iterators into a while loop of the iterate function, possibly even using llvmcall on the exact input IR) |
Yeah it also gets the right answer on my linux box. Some more reduction ideas in interim: Maybe also try to get rid of sum? [e.g. just use first, or potentially do a manual for loop over, and possibly inbounds]. Does squaring work instead of cube It doesn't matter if it computes something different ffrom the original fn, so long as the result is wrong |
My machine's specs are as follows: julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores) Here's what I get with the previous example, both on Enzyme v0.13.14 and on the brand new Enzyme v0.13.15: julia> myhvp(f_bad, xm, dxm) # incorrect
freeing without malloc %_cache37.i.0 = phi double* [ undef, %L23.i ], [ %134, %L193.i.loopexit ]
freeing without malloc %_cache84.i.0 = phi double* [ undef, %L23.i ], [ %209, %L193.i.loopexit ]
4×3 Matrix{Float64}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> myhvp(f_good1, xm, dxm) # correct
4×3 Matrix{Float64}:
6.0 30.0 54.0
12.0 36.0 60.0
18.0 42.0 66.0
24.0 48.0 72.0
julia> myhvp(f_good2, xm, dxm) # correct
4×3 Matrix{Float64}:
6.0 30.0 54.0
12.0 36.0 60.0
18.0 42.0 66.0
24.0 48.0 72.0 Note the following warning, which was a staple of every one of my previous attempts too. When the HVP is correct, it doesn't appear.
On your platform it is slightly different:
Any idea what this could mean? |
Good catch with @inline function mysum(y::AbstractArray)
s = zero(eltype(y))
@inbounds @simd for i in eachindex(y)
s += y[i]
end
return s
end |
Can you try inlining the iterate in the for loop and the sum? And yeah the machine specs are helpful, but without the ability to repro your issue I need something more precise to debug it. |
Those are just warnings, I also got them on my end alongside correct answers |
I'll try more inlining but I think you might be able to reproduce it on GitHub's CI machines, after all that's where I first diagnosed it. If you want I can set you up with a simple repo containing only this test, so that you can investigate too? |
yeah I guess try de-sugaring the julia iterator through the |
I got rid of using Enzyme
function mygradient(f, x)
g = make_zero(x)
autodiff(Reverse, f, Active, Duplicated(x, g))
return g
end
myhvp(f, x, dx) = return autodiff(Forward, mygradient, Const(f), Duplicated(x, dx))[1]
@inline function arraycube_bad(x::AbstractArray{N}) where {N}
y = similar(x)
for i in axes(y, 1)
for j in axes(y, 2)
y[i, j] = x[i, j]^3
end
end
return y
end
@inline function arraycube_good(x)
y = similar(x)
for k in eachindex(y)
y[k] = x[k]^3
end
return y
end
f_bad(x) = sum(arraycube_bad(x))
f_good(x) = sum(arraycube_good(x)) julia> xm, dxm = Matrix{Float64}(reshape(1:12, 4, 3)), ones(4, 3);
julia> myhvp(f_bad, xm, dxm)
4×3 Matrix{Float64}:
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
julia> myhvp(f_good, xm, dxm)
4×3 Matrix{Float64}:
6.0 30.0 54.0
12.0 36.0 60.0
18.0 42.0 66.0
24.0 48.0 72.0 I have also set up a repo with GitHub actions for testing, where the error is reliably reproducible in CI. That way you can experiment with it even it your hardware behaves differently. Check it out here: https://github.com/gdalle/EnzymeErrorReproducer.jl. Can you take it from here or do I need to dig even further? |
can you add me to the repo? I'll see what I can do [but if it fails to reproduce again for me, I may ask for more help here] |
Ironically @inbounds actually makes it simpler, if it still fails with that |
Still fails to repro for me. Can you add also ideally change |
I added you to the repo. It now contains the following code, which only fails when we remove the using Enzyme
function mygradient(f, x)
g = zero(x)
autodiff(Reverse, f, Active, Duplicated(x, g))
return g
end
myhvp(f, x, dx) = return autodiff(Forward, mygradient, Const(f), Duplicated(x, dx))[1]
@inline function arraycube_bad(x::AbstractArray{N}) where {N}
y = similar(x)
@inbounds for i in 1:4
@inbounds for j in 1:3
y[i, j] = x[i, j]^3
end
end
return y
end
@inline function arraycube_good(x)
y = similar(x)
@inbounds for k in 1:12
y[k] = x[k]^3
end
return y
end
f_bad(x) = sum(arraycube_bad(x))
f_good(x) = sum(arraycube_good(x))
inputs() = Matrix{Float64}(reshape(1:12, 4, 3)), ones(4, 3) |
Can you run this on your machine and paste the log? The lines at the top should print a bunch of info to stderr, which I'll need to debug (and CI doesn't seem to capture). |
I think my GitHub repo didn't print anything because I modified the Enzyme settings in |
In both files below there is presumably a log for |
Could it be that the reason you were seeing nothing on your hardware is because you're running Julia with automatic bound checks (inbounds matters) whereas I was running it with systematic bound checks (inbounds is ignored)? |
I further split the test process on the example repo so that it shows exactly when the HVP goes wrong. Let me know if I need to do anything else? |
Hey @wsmoses, friendly ping on this one. |
I have not managed to reproduce it. Change it so just It should emit llvm modules before each Enzyme call, if you can take the module of the final autodiff which produces an error and put it as
then hopefully it can be in a form that we can repro? |
here's the docs on LLVMcall: https://docs.julialang.org/en/v1/base/c/#Core.Intrinsics.llvmcall |
Hey @wsmoses, thanks for following up. I don't understand what I'm supposed to copy, and I have never worked with LLVM code before. Can you figure it out from the following CI log? It corresponds to this test file. |
I updated the package https://github.com/gdalle/EnzymeErrorReproducer.jl to inline the sum as much as possible. I reached the point where if I inline one more time it returns the correct result. With that version of the code, i ran using Enzyme
Enzyme.Compiler.DumpPreEnzyme[] = true
using EnzymeErrorReproducer
x, dx = inputs()
myhvp(f_cartesian, x, dx) and got two LLVM modules (encoded as two Julia strings in the attached file). I assume the first one is for the inner gradient and the second one is for the outer pushforward? I couldn't figure out how to call julia> function hvp_llvm(x, dx)
return Core.Intrinsics.llvmcall(
(llvm_module2, "mygradient"), Nothing, Tuple{Any,Any}, x, dx
)
end
hvp_llvm (generic function with 1 method)
julia> hvp_llvm(x, dx)
ERROR: error statically evaluating llvm IR argument
Stacktrace:
[1] hvp_llvm(x::Matrix{Float64}, dx::Matrix{Float64})
@ Main ~/Work/GitHub/Julia/EnzymeErrorReproducer.jl/test/playground.jl:3862
[2] top-level scope
@ ~/Work/GitHub/Julia/EnzymeErrorReproducer.jl/test/playground.jl:3867 |
I diagnosed this problem with DifferentiationInterface, but it also appears using Enzyme's own
hvp
function.The following code is run on Julia 1.11 with Enzyme v0.13.14. On Julia 1.10 it returns the correct result.
The text was updated successfully, but these errors were encountered: