Constant Folding openlibm functions #9942
On 0.2.1:

```
julia> code_llvm(sin, (Float64,))

define double @julia_sin(double) {
top:
  %1 = call double @sin(double %0), !dbg !3342
  %2 = fcmp ord double %1, 0.000000e+00, !dbg !3342
  %3 = fcmp uno double %0, 0.000000e+00, !dbg !3342
  %4 = or i1 %2, %3, !dbg !3342
  br i1 %4, label %pass, label %fail, !dbg !3342

fail:                                             ; preds = %top
  %5 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !3342
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %5, i32 282), !dbg !3342
  unreachable, !dbg !3342

pass:                                             ; preds = %top
  ret double %1, !dbg !3342
}
```

On master:

```
define double @julia_sin_43255(double) {
top:
  %1 = call double inttoptr (i64 4668253248 to double (double)*)(double %0), !dbg !265
  %2 = fcmp ord double %1, 0.000000e+00, !dbg !265
  %3 = fcmp uno double %0, 0.000000e+00, !dbg !265
  %4 = or i1 %2, %3, !dbg !265
  br i1 %4, label %pass, label %fail, !dbg !265

fail:                                             ; preds = %top
  %5 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !265, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %5, i32 123), !dbg !265
  unreachable, !dbg !265

pass:                                             ; preds = %top
  ret double %1, !dbg !265
}
```

I'd guess that for LLVM to constant fold …
Correct. We now force LLVM to use the `sin` in libopenlibm, rather than giving it the freedom to pick any function named `sin`.
I do not get the performance from constant folding. Here is the output:

```
julia> code_llvm(sumofsins2, (Int,))

define double @julia_sumofsins2_1172(i64) {
top:
  %1 = icmp sgt i64 %0, 0, !dbg !3561
  br i1 %1, label %L, label %L3, !dbg !3561

L:                                                ; preds = %top, %pass
  %r.0 = phi double [ %6, %pass ], [ 0.000000e+00, %top ]
  %"#s3.0" = phi i64 [ %5, %pass ], [ 1, %top ]
  %2 = call double inttoptr (i64 1752503104 to double (double)*)(double 3.400000e+00), !dbg !3562
  %3 = fcmp ord double %2, 0.000000e+00, !dbg !3562
  br i1 %3, label %pass, label %fail, !dbg !3562

fail:                                             ; preds = %L
  %4 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !3562, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %4, i32 4), !dbg !3562
  unreachable, !dbg !3562

pass:                                             ; preds = %L
  %5 = add i64 %"#s3.0", 1, !dbg !3561
  %6 = fadd double %r.0, %2, !dbg !3562
  %7 = icmp eq i64 %"#s3.0", %0, !dbg !3562
  br i1 %7, label %L3, label %L, !dbg !3562

L3:                                               ; preds = %pass, %top
  %r.1 = phi double [ 0.000000e+00, %top ], [ %6, %pass ]
  ret double %r.1, !dbg !3565
}
```

My version info is:

```
julia> versioninfo()
Julia Version 0.4.0-dev+2847
Commit fc61385 (2015-01-21 18:34 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```
LLVM defines intrinsics for many operations (…)
We would also need to make the LLVM intrinsics call openlibm instead of the system libm.
Can you remind me why we need to call openlibm? This prevents not only constant folding, but also using hardware instructions if they are available. For example, LLVM tries to intercept libcalls to certain well-known functions, such as …
Regarding making LLVM call openlibm instead of libm: if Julia is configured with …

What am I missing? Given this, adding new intrinsics is easy. I added the …

See https://github.com/eschnett/julia/tree/math-intrinsics for the prototype implementation.
I looked at http://www.johnmyleswhite.com/notebook/2013/12/06/writing-type-stable-code-in-julia/ again with this branch, and there is unfortunately no speedup. LLVM correctly recognizes the …
Update: No speedup when building against LLVM 3.3 (the default), but with LLVM 3.6 the speedup is restored. Apparently LLVM 3.3 still lacks certain optimizations for …
Note that #14324 is also relevant here; this issue is not really specific to libm functions and LLVM intrinsics.
Any idea how to progress on this? This feels a bit sad (latest master):

```julia
f(x) = cos(x) + cos(x)

function f2(x)
    c = cos(x)
    c + c
end

@time for i = 1:10^6 f(2.0) end
# 0.021374 seconds

@time for i = 1:10^6 f2(2.0) end
# 0.010027 seconds
```
@KristofferC, that particular case would involve CSE (common subexpression elimination) for …
At this point, we only use libm for …
https://groups.google.com/forum/#!topic/julia-users/Jndl9sYwj5Q reports a performance regression in some simple code from a blog post that was meant to illustrate the importance of type stability: http://www.johnmyleswhite.com/notebook/2013/12/06/writing-type-stable-code-in-julia/

The code is:
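Judging from the IR dumps in this thread (a loop that adds `sin(3.4)` to an accumulator `n` times), the blog post's function is presumably:

```julia
function sumofsins2(n::Integer)
    r = 0.0
    for i in 1:n
        r += sin(3.4)
    end
    return r
end
```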
and the blog post gives the output (as of Julia 0.2.x) of `code_llvm(sumofsins2, (Int,))`. In this IR code, the call `sin(3.4)` has been constant folded to `0xBFD05AC910FF4C6C`. As of Julia 0.3.4, the new LLVM code is …, so it looks like the call to `sin` is no longer being constant folded. I haven't checked the generated code on master, but it sounds like the performance regression is still present there.