Constant Folding openlibm functions #9942
On 0.2.1:

```
julia> code_llvm(sin, (Float64,))

define double @julia_sin(double) {
top:
  %1 = call double @sin(double %0), !dbg !3342
  %2 = fcmp ord double %1, 0.000000e+00, !dbg !3342
  %3 = fcmp uno double %0, 0.000000e+00, !dbg !3342
  %4 = or i1 %2, %3, !dbg !3342
  br i1 %4, label %pass, label %fail, !dbg !3342

fail:                                             ; preds = %top
  %5 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !3342
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %5, i32 282), !dbg !3342
  unreachable, !dbg !3342

pass:                                             ; preds = %top
  ret double %1, !dbg !3342
}
```

On master:

```
define double @julia_sin_43255(double) {
top:
  %1 = call double inttoptr (i64 4668253248 to double (double)*)(double %0), !dbg !265
  %2 = fcmp ord double %1, 0.000000e+00, !dbg !265
  %3 = fcmp uno double %0, 0.000000e+00, !dbg !265
  %4 = or i1 %2, %3, !dbg !265
  br i1 %4, label %pass, label %fail, !dbg !265

fail:                                             ; preds = %top
  %5 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !265, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %5, i32 123), !dbg !265
  unreachable, !dbg !265

pass:                                             ; preds = %top
  ret double %1, !dbg !265
}
```

I'd guess that for LLVM to constant fold …
Correct. We now force LLVM to use the `sin` in libopenlibm, rather than giving it the freedom to pick any function named `sin`.
I do not get the performance from constant folding. Here is the output:

```
julia> code_llvm(sumofsins2, (Int,))

define double @julia_sumofsins2_1172(i64) {
top:
  %1 = icmp sgt i64 %0, 0, !dbg !3561
  br i1 %1, label %L, label %L3, !dbg !3561

L:                                                ; preds = %top, %pass
  %r.0 = phi double [ %6, %pass ], [ 0.000000e+00, %top ]
  %"#s3.0" = phi i64 [ %5, %pass ], [ 1, %top ]
  %2 = call double inttoptr (i64 1752503104 to double (double)*)(double 3.400000e+00), !dbg !3562
  %3 = fcmp ord double %2, 0.000000e+00, !dbg !3562
  br i1 %3, label %pass, label %fail, !dbg !3562

fail:                                             ; preds = %L
  %4 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !3562, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %4, i32 4), !dbg !3562
  unreachable, !dbg !3562

pass:                                             ; preds = %L
  %5 = add i64 %"#s3.0", 1, !dbg !3561
  %6 = fadd double %r.0, %2, !dbg !3562
  %7 = icmp eq i64 %"#s3.0", %0, !dbg !3562
  br i1 %7, label %L3, label %L, !dbg !3562

L3:                                               ; preds = %pass, %top
  %r.1 = phi double [ 0.000000e+00, %top ], [ %6, %pass ]
  ret double %r.1, !dbg !3565
}
```

My version info is:

```
julia> versioninfo()
Julia Version 0.4.0-dev+2847
Commit fc61385 (2015-01-21 18:34 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```
LLVM defines intrinsics for many operations (…)
We would also need to make the LLVM intrinsics call openlibm instead of the system libm.
Can you remind me why we need to call openlibm? This prevents not only constant folding, but also using hardware instructions if they are available. For example, LLVM tries to intercept libcalls to certain well-known functions, such as …
Regarding making LLVM call openlibm instead of libm: if Julia is configured with …

What am I missing? Given this, adding new intrinsics is easy. I added the …

See https://github.com/eschnett/julia/tree/math-intrinsics for the prototype implementation.
I looked at http://www.johnmyleswhite.com/notebook/2013/12/06/writing-type-stable-code-in-julia/ again with this branch, and there is unfortunately no speedup. LLVM correctly recognizes the …
Update: No speedup when building against LLVM 3.3 (the default), but with LLVM 3.6 the speedup is restored. Apparently LLVM 3.3 still lacks certain optimizations for …
Note that #14324 is also relevant here; this issue is not really specific to libm functions and LLVM intrinsics.
Any idea how to progress on this? This feels a bit sad (latest master):

```julia
f(x) = cos(x) + cos(x)

function f2(x)
    c = cos(x)
    c + c
end

@time for i = 1:10^6 f(2.0) end
# 0.021374 seconds

@time for i = 1:10^6 f2(2.0) end
# 0.010027 seconds
```
@KristofferC, that particular case would involve CSE (common subexpression elimination) for …
At this point, we only use libm for …
https://groups.google.com/forum/#!topic/julia-users/Jndl9sYwj5Q reports a performance regression in some simple code from a blog post that was meant to illustrate the importance of type stability: http://www.johnmyleswhite.com/notebook/2013/12/06/writing-type-stable-code-in-julia/

The code is:
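Judging from the IR dumps in this thread (a loop that adds `sin(3.4)` to an accumulator `n` times), the blog post's function is presumably:

```julia
function sumofsins2(n::Integer)
    r = 0.0
    for i in 1:n
        r += sin(3.4)
    end
    return r
end
```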
and the blog post gives the output (as of Julia 0.2.x) of `code_llvm(sumofsins2, (Int,))`. In this IR code, the call `sin(3.4)` has been constant folded to `0xBFD05AC910FF4C6C`. As of Julia 0.3.4, the new LLVM code is …, so it looks like the call to `sin` is no longer being constant folded. I haven't checked the generated code on master, but it sounds like the performance regression is still present there.