f() = 2^5 not compiled all the way to result #30126

Closed
jakobnissen opened this issue Nov 23, 2018 · 4 comments
Labels
compiler:inference Type inference performance Must go faster

Comments

@jakobnissen
Contributor

jakobnissen commented Nov 23, 2018

Have a look at:

julia> f() = 2^5; @code_native f()
    .text
; Function f {
; Location: none:1
; Function literal_pow; {
; Location: none
; Function macro expansion; {
; Location: none
; Function ^; {
; Location: none:1
    pushq   %rax
    movabsq $power_by_squaring, %rax
    movl    $2, %edi
    movl    $5, %esi
    callq   *%rax
;}}}
    popq    %rcx
    retq
    nopl    (%rax)
;}

This behaviour is weird. Since the result is completely fixed, shouldn't it compile all the way down to "return this predefined number" instead of requiring a new function call?
To make things weirder, the following functions do compile down to "return this number":
m() = 2^8; n() = 3^4; g() = 2.0^5; h() = 2^5.0; k() = 1<<5
In practice, this means that f() is needlessly about 3x slower than k(). Is this a bug that should be fixed?
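
One way to check whether a call like this was folded to a constant, without reading assembly, is to look at the typed, optimized IR. The following is only a sketch: it redefines f and k from this report and prints what `code_typed` returns for each. The exact printed form of the IR differs between Julia versions, so no output is shown here.

```julia
using InteractiveUtils  # provides code_typed outside the REPL

f() = 2^5      # the case from this report
k() = 1 << 5   # reported to compile down to a constant

# code_typed returns a vector of `CodeInfo => return type` pairs,
# one per matching method; these functions have a single method each.
for fn in (f, k)
    ci, rettype = first(code_typed(fn, Tuple{}))
    println(fn, " returns ", rettype)
    println(ci)  # a folded body is a single return of a literal value
end
```

If the body printed for f still contains a call to a power routine while the one for k is just a return of a literal, the constant was not folded for f.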

Environment:

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-4210H CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)
Environment:
  JULIA_EDITOR = atom -a
  JULIA_NUM_THREADS = 2
@Keno Keno added performance Must go faster compiler:inference Type inference labels Nov 23, 2018
@Keno
Member

Keno commented Nov 23, 2018

Slightly better on master, but inference's purity analysis should be able to just figure this out.

@ranocha
Member

ranocha commented Jun 8, 2020

Is there any update? On my system, literal_pow generates good code for exponents two and three, but poor code starting at four.

julia> versioninfo()
Julia Version 1.5.0-beta1.0
Commit 6443f6c95a (2020-05-28 17:42 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 1

julia> foo(x) = x^4
foo (generic function with 1 method)

julia> @code_llvm foo(5.0)

;  @ REPL[24]:1 within `foo'
define double @julia_foo_3022(double) {
top:
; ┌ @ intfuncs.jl:300 within `literal_pow'
; │┌ @ math.jl:905 within `^'
    %1 = call double @llvm.pow.f64(double %0, double 4.000000e+00)
; └└
  ret double %1
}

julia> bar(x) = (x^2)^2
bar (generic function with 1 method)

julia> @code_llvm bar(5.0)

;  @ REPL[26]:1 within `bar'
define double @julia_bar_3030(double) {
top:
; ┌ @ intfuncs.jl:296 within `literal_pow'
; │┌ @ float.jl:405 within `*'
    %1 = fmul double %0, %0
    %2 = fmul double %1, %1
; └└
  ret double %2
}

julia> using BenchmarkTools

julia> @benchmark foo($(Ref(5.0))[])
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     11.585 ns (0.00% GC)
  median time:      12.096 ns (0.00% GC)
  mean time:        12.148 ns (0.00% GC)
  maximum time:     41.651 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> @benchmark bar($(Ref(5.0))[])
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.057 ns (0.00% GC)
  median time:      1.058 ns (0.00% GC)
  mean time:        1.061 ns (0.00% GC)
  maximum time:     4.052 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

In mathematical code, it is quite handy to be able to write x^4 instead of (x^2)^2, but the former can become a performance bottleneck.

@jakobnissen
Contributor Author

@ranocha That is a different issue. It happens because ^ is special-cased for the literal exponents -1, 0, 1, 2 and 3, so it's not really a bug. The issue in this thread is the lack of constant folding, and constant folding does happen for floats.
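
To illustrate the special-casing: Julia lowers `x^p` with a literal integer `p` to `Base.literal_pow(^, x, Val(p))`, so each exponent can get its own method. The sketch below imitates that dispatch with a made-up function `mypow`. It is not Base's actual code, and the `Val{4}` method is purely hypothetical, added only to show how the (x^2)^2 form from the benchmark above could be expressed.

```julia
# Hypothetical stand-in for the Val-based dispatch; not Base's implementation.
mypow(x, ::Val{0}) = one(x)
mypow(x, ::Val{1}) = x
mypow(x, ::Val{2}) = x * x
mypow(x, ::Val{3}) = x * x * x

# Generic fallback: any other exponent goes through the runtime power routine.
mypow(x, ::Val{p}) where {p} = x^p

# Hypothetical extra method giving the (x^2)^2 form for exponent 4.
mypow(x, ::Val{4}) = (y = x * x; y * y)

# The real entry point can be called directly in the same way:
Base.literal_pow(^, 5.0, Val(2))  # what `5.0^2` is lowered to
```

Note that a repeated-multiplication form for floats may round differently from the `pow` routine, so such a method trades bit-for-bit agreement for speed.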

@jakobnissen
Contributor Author

Fixed on master (presumably #45613)
