Order-dependent/redefinition-dependent allocations with functions using keyword arguments #28940

Closed
schmrlng opened this issue Aug 29, 2018 · 5 comments
Labels
compiler:inference Type inference performance Must go faster

Comments

@schmrlng
Contributor

Reminiscent of #28342 and #28683, but I'm not sure if it's the same root cause:

  | | |_| | | | (_| |  |  Version 1.0.0 (2018-08-08)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: AMD Ryzen 7 1800X Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, znver1)

(v1.0) pkg> st
    Status `~/.julia/environments/v1.0/Project.toml`
  [6e4b80f9] BenchmarkTools v0.4.0

julia> using BenchmarkTools

julia> @noinline f(; a=1) = a
f (generic function with 1 method)

julia> g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> @benchmark g()
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  2
  --------------
  minimum time:     16.212 ns (0.00% GC)
  median time:      17.016 ns (0.00% GC)
  mean time:        23.304 ns (22.95% GC)
  maximum time:     27.683 μs (99.92% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> h(; a=1//2) = f(a=a)
h (generic function with 1 method)

julia> @benchmark h()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.191 ns (0.00% GC)
  median time:      6.192 ns (0.00% GC)
  mean time:        6.204 ns (0.00% GC)
  maximum time:     14.447 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark g()
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  2
  --------------
  minimum time:     16.254 ns (0.00% GC)
  median time:      16.986 ns (0.00% GC)
  mean time:        23.435 ns (23.47% GC)
  maximum time:     27.776 μs (99.93% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> @benchmark g()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.931 ns (0.00% GC)
  median time:      5.941 ns (0.00% GC)
  mean time:        5.949 ns (0.00% GC)
  maximum time:     14.257 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
@martinholters
Member

Calling f before compilation of g makes the difference here:

julia> @noinline f(; a=1) = a
f (generic function with 1 method)

julia> g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> @code_llvm g()

; Function g
; Location: REPL[2]:1
define void @julia_g_34737({ i64, i64 }* noalias nocapture sret) {
top:
  %1 = alloca %jl_value_t addrspace(10)*, i32 3
  %gcframe = alloca %jl_value_t addrspace(10)*, i32 3
  %2 = bitcast %jl_value_t addrspace(10)** %gcframe to i8*
  call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 24, i32 0, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"()
  %ptls_i8 = getelementptr i8, i8* %thread_ptr, i64 -10920
  %ptls = bitcast i8* %ptls_i8 to %jl_value_t***
; Function #g#4; {
; Location: REPL[2]:1
  %3 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 0
  %4 = bitcast %jl_value_t addrspace(10)** %3 to i64*
  store i64 2, i64* %4
  %5 = getelementptr %jl_value_t**, %jl_value_t*** %ptls, i32 0
  %6 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 1
  %7 = bitcast %jl_value_t addrspace(10)** %6 to %jl_value_t***
  %8 = load %jl_value_t**, %jl_value_t*** %5
  store %jl_value_t** %8, %jl_value_t*** %7
  %9 = bitcast %jl_value_t*** %5 to %jl_value_t addrspace(10)***
  store %jl_value_t addrspace(10)** %gcframe, %jl_value_t addrspace(10)*** %9
  %10 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, i32 0
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416515539168 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %10
  %11 = call nonnull %jl_value_t addrspace(10)* @jsys1_kwfunc_22523(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416601936528 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %1, i32 1)
; Function #f; {
; Location: none
  %12 = bitcast %jl_value_t*** %ptls to i8*
  %13 = call noalias nonnull %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8* %12, i32 1448, i32 32) #1
  %14 = bitcast %jl_value_t addrspace(10)* %13 to %jl_value_t addrspace(10)* addrspace(10)*
  %15 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(10)* %14, i64 -1
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416534658272 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspace(10)* %15
  %16 = bitcast %jl_value_t addrspace(10)* %13 to <2 x i64> addrspace(10)*
  store <2 x i64> <i64 1, i64 2>, <2 x i64> addrspace(10)* %16, align 8
  %17 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 2
  store %jl_value_t addrspace(10)* %13, %jl_value_t addrspace(10)** %17
  %18 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, i32 0
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416515539160 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %18
  %19 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, i32 1
  store %jl_value_t addrspace(10)* %13, %jl_value_t addrspace(10)** %19
  %20 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, i32 2
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416515539168 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %20
  %21 = call nonnull %jl_value_t addrspace(10)* @jl_invoke(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416539791120 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %1, i32 3)
;}}
  %22 = bitcast { i64, i64 }* %0 to i8*
  %23 = bitcast %jl_value_t addrspace(10)* %21 to i8 addrspace(10)*
  call void @llvm.memcpy.p0i8.p10i8.i64(i8* %22, i8 addrspace(10)* %23, i64 16, i32 8, i1 false)
  %24 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %gcframe, i32 1
  %25 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %24
  %26 = getelementptr %jl_value_t**, %jl_value_t*** %ptls, i32 0
  %27 = bitcast %jl_value_t*** %26 to %jl_value_t addrspace(10)**
  store %jl_value_t addrspace(10)* %25, %jl_value_t addrspace(10)** %27
  ret void
}

julia> g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> f(a=1//2)
1//2

julia> @code_llvm g()

; Function g
; Location: REPL[4]:1
define void @julia_g_34751({ i64, i64 }* noalias nocapture sret) {
top:
  %1 = alloca %jl_value_t addrspace(10)*
  %2 = alloca <2 x i64>, align 16
  %tmpcast = bitcast <2 x i64>* %2 to { i64, i64 }*
  %3 = alloca { i64, i64 }, align 8
; Function //; {
; Location: rational.jl:41
; Function Type; {
; Location: rational.jl:18
; Function Type; {
; Location: rational.jl:15
  store <2 x i64> <i64 1, i64 2>, <2 x i64>* %2, align 16
;}}}
; Function #g#5; {
; Location: REPL[4]:1
  %4 = getelementptr %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %1, i32 0
  store %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416515539168 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %4
  %5 = call nonnull %jl_value_t addrspace(10)* @jsys1_kwfunc_22523(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416601936528 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)** %1, i32 1)
; Function #f; {
; Location: none
  %6 = addrspacecast { i64, i64 }* %tmpcast to { i64, i64 } addrspace(11)*
  call void @"julia_#f#3_34742"({ i64, i64 }* noalias nocapture nonnull sret %3, { i64, i64 } addrspace(11)* nocapture readonly %6, %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140416515539168 to %jl_value_t*) to %jl_value_t addrspace(10)*))
;}}
  %7 = bitcast { i64, i64 }* %0 to i8*
  %8 = bitcast { i64, i64 }* %3 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %7, i8* %8, i64 16, i32 8, i1 false)
  ret void
}

Curiously, @code_warntype does not change.
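The order dependence described above can be sketched without BenchmarkTools. This is a minimal reproduction attempt, not a verified one: the names `f_kw`, `g_first`, `g_second`, and `allocs_of` are hypothetical, and the only assumption is that two textually identical wrappers are compiled at different times relative to the first call of `f_kw` with a `Rational` keyword. On the 1.0.0 build in this report one would expect `g_first` to report allocations and `g_second` not to; later comments in this thread suggest both report zero on newer versions.

```julia
# Sketch only: f_kw, g_first, g_second, and allocs_of are hypothetical names.
@noinline f_kw(; a=1) = a

# Defined and compiled before f_kw has ever been called with a Rational keyword.
g_first(; a=1//2) = f_kw(a=a)
g_first()

# By now f_kw(a=1//2) has run once (via g_first), so this identical definition
# is compiled after that call has already happened.
g_second(; a=1//2) = f_kw(a=a)
g_second()

# Measure inside a function so that global-scope effects do not inflate the count.
allocs_of(fn) = @allocated fn()
allocs_of(g_first); allocs_of(g_second)   # warm up the measuring helper itself

println("g_first:  ", allocs_of(g_first), " bytes")
println("g_second: ", allocs_of(g_second), " bytes")
```

Comparing the two printed numbers across Julia versions is a quick way to check whether the behaviour is still present, before reaching for `@code_llvm`.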

@KristofferC
Member

It would be interesting to profile this and show the trace, including C frames (with C=true).

@schmrlng
Contributor Author

schmrlng commented Sep 7, 2018

This is (perhaps unsurprisingly) unaffected by #29086, but here are some profiling results run on that PR's branch in any case:

  | | |_| | | | (_| |  |  Version 1.1.0-DEV.207 (2018-09-07)
 _/ |\__'_|_|_|\__'_|  |  jb/kwfunc_nothrow_fix/3459b880d9 (fork: 1 commits, 0 days)
|__/                   |

julia> using Profile, BenchmarkTools

julia> @noinline f(; a=1) = a
f (generic function with 1 method)

julia> @noinline g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> h() = for i in 1:1e7; g(); end
h (generic function with 1 method)

julia> h(); Profile.clear(); @time @profile h(); Profile.print(C=true)
  0.358387 seconds (40.00 M allocations: 1.192 GiB, 8.64% gc time)
358 /home/schmrlng/code/oss/julia-master/src/task.c:271; start_task
 358 /home/schmrlng/code/oss/julia-master/src/julia.h:1558; jl_apply
  358 /home/schmrlng/code/oss/julia-master/src/gf.c:2196; jl_apply_generic
   358 ./task.jl:259; (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})()
    358 ...hmrlng/code/oss/julia-master/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:117; macro expansion
     358 ...hmrlng/code/oss/julia-master/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
      358 /home/schmrlng/code/oss/julia-master/src/gf.c:2196; jl_apply_generic
       358 ./boot.jl:319; eval(::Module, ::Any)
        358 /home/schmrlng/code/oss/julia-master/src/builtins.c:622; jl_toplevel_eval_in
         358 /home/schmrlng/code/oss/julia-master/src/toplevel.c:765; jl_toplevel_eval_flex
          358 /home/schmrlng/code/oss/julia-master/src/toplevel.c:812; jl_toplevel_eval_flex
           358 /home/schmrlng/code/oss/julia-master/src/gf.c:1841; jl_fptr_trampoline
            297 ./none:0; top-level scope
             297 ./util.jl:156; top-level scope
              297 ...e/oss/julia-master/usr/share/julia/stdlib/v1.1/Profile/src/Profile.jl:25; macro expansion
               297 ./REPL[4]:1; h()
                293 ./REPL[3]:1; g()
                 2   /home/schmrlng/code/oss/julia-master/src/gc.c:941; jl_gc_pool_alloc
                 1   /home/schmrlng/code/oss/julia-master/src/gc.c:998; jl_gc_pool_alloc
                 2   /home/schmrlng/code/oss/julia-master/src/gc.c:1000; jl_gc_pool_alloc
                 2   /home/schmrlng/code/oss/julia-master/src/gf.c:39; jl_invoke
268 unknown stackframe
                  229 ./REPL[3]:1; #g#4(::Rational{Int64}, ::Function)
                   191 ./none:0; #f
                    32 /home/schmrlng/code/oss/julia-master/src/gc.c:941; jl_gc_pool_alloc
                    1  /home/schmrlng/code/oss/julia-master/src/gc.c:953; jl_gc_pool_alloc
                    32 /home/schmrlng/code/oss/julia-master/src/gc.c:955; jl_gc_pool_alloc
                     32 /home/schmrlng/code/oss/julia-master/src/gc.c:2638; jl_gc_collect
                      2  /home/schmrlng/code/oss/julia-master/src/gc.c:2472; _jl_gc_collect
                       2 /home/schmrlng/code/oss/julia-master/src/gc.c:2277; mark_roots
                        2 /home/schmrlng/code/oss/julia-master/src/gc.c:1483; gc_mark_queue_obj
                         2 /home/schmrlng/code/oss/julia-master/src/gc.c:1430; gc_try_setmark
                      7  /home/schmrlng/code/oss/julia-master/src/gc.c:2473; _jl_gc_collect
                       3 /home/schmrlng/code/oss/julia-master/src/gc.c:1809; gc_mark_loop
                        3 /home/schmrlng/code/oss/julia-master/src/gc.c:1521; gc_mark_scan_objarray
                         3 /home/schmrlng/code/oss/julia-master/src/gc.c:1430; gc_try_setmark
                       3 /home/schmrlng/code/oss/julia-master/src/gc.c:1874; gc_mark_loop
                        3 /home/schmrlng/code/oss/julia-master/src/gc.c:1430; gc_try_setmark
                       1 /home/schmrlng/code/oss/julia-master/src/gc.c:2035; gc_mark_loop
                      1  /home/schmrlng/code/oss/julia-master/src/gc.c:2567; _jl_gc_collect
                       1 /home/schmrlng/code/oss/julia-master/src/gc.c:1222; gc_sweep_other
                        1 /home/schmrlng/code/oss/julia-master/src/gc.c:887; sweep_malloced_arrays
                      22 /home/schmrlng/code/oss/julia-master/src/gc.c:2570; _jl_gc_collect
                       22 /home/schmrlng/code/oss/julia-master/src/gc.c:1284; gc_sweep_pool
                        22 /home/schmrlng/code/oss/julia-master/src/gc.c:1209; sweep_pool_pagetable
                         1  /home/schmrlng/code/oss/julia-master/src/gc.c:1173; sweep_pool_pagetable1
                         21 /home/schmrlng/code/oss/julia-master/src/gc.c:1180; sweep_pool_pagetable1
                          21 /home/schmrlng/code/oss/julia-master/src/gc.c:1160; sweep_pool_pagetable0
                           21 /home/schmrlng/code/oss/julia-master/src/gc.c:1140; sweep_pool_page
                            2  /home/schmrlng/code/oss/julia-master/src/gc.c:1035; sweep_page
                             1 /home/schmrlng/code/oss/julia-master/src/gc.c:912; reset_page
                              1 /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71; memset
                             1 /home/schmrlng/code/oss/julia-master/src/gc.c:915; reset_page
                            1  /home/schmrlng/code/oss/julia-master/src/gc.c:1054; sweep_page
                             1 /home/schmrlng/code/oss/julia-master/src/gc.h:388; page_pfl_beg
                            13 /home/schmrlng/code/oss/julia-master/src/gc.c:1072; sweep_page
                            5  /home/schmrlng/code/oss/julia-master/src/gc.c:1078; sweep_page
                    9  /home/schmrlng/code/oss/julia-master/src/gc.c:959; jl_gc_pool_alloc
                    1  /home/schmrlng/code/oss/julia-master/src/gc.c:982; jl_gc_pool_alloc
                     1 /home/schmrlng/code/oss/julia-master/src/gc.h:383; gc_page_data
                    16 /home/schmrlng/code/oss/julia-master/src/gc.c:1000; jl_gc_pool_alloc
                    2  /home/schmrlng/code/oss/julia-master/src/gf.c:41; jl_invoke
87 unknown stackframe
                     10 ./REPL[2]:1; #f#3(::Rational{Int64}, ::Function)
                     1  /home/schmrlng/code/oss/julia-master/src/gc.c:945; jl_gc_pool_alloc
                     28 /home/schmrlng/code/oss/julia-master/src/gc.c:953; jl_gc_pool_alloc
                     3  /home/schmrlng/code/oss/julia-master/src/gc.c:979; jl_gc_pool_alloc
                     1  /home/schmrlng/code/oss/julia-master/src/gc.c:987; jl_gc_pool_alloc
                      1 /home/schmrlng/code/oss/julia-master/src/gc.h:436; page_metadata
                     2  /home/schmrlng/code/oss/julia-master/src/gc.c:1000; jl_gc_pool_alloc
                  5   /home/schmrlng/code/oss/julia-master/src/gc.c:945; jl_gc_pool_alloc
                  1   /home/schmrlng/code/oss/julia-master/src/gc.c:959; jl_gc_pool_alloc
                  1   /home/schmrlng/code/oss/julia-master/src/gc.c:965; jl_gc_pool_alloc
                  5   /home/schmrlng/code/oss/julia-master/src/gc.c:982; jl_gc_pool_alloc
                   5 /home/schmrlng/code/oss/julia-master/src/gc.h:383; gc_page_data
                  1   /home/schmrlng/code/oss/julia-master/src/gc.c:1000; jl_gc_pool_alloc
                4   ./range.jl:567; iterate
                 4 ./int.jl:49; <
            3   /home/schmrlng/code/oss/julia-master/src/gc.c:2659;
            46  /home/schmrlng/code/oss/julia-master/src/gc.c:1000; jl_gc_pool_alloc
            1   /home/schmrlng/code/oss/julia-master/src/gf.c:2198;
11 unknown stackframe
             1 ./REPL[3]:1; #g#4(::Rational{Int64}, ::Function)
             6 ./REPL[3]:1; g()
             2 /home/schmrlng/code/oss/julia-master/src/gc.c:941; jl_gc_pool_alloc
1 unknown stackframe

julia> @noinline g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> h(); Profile.clear(); @time @profile h(); Profile.print(C=true)
  0.173897 seconds (20.00 M allocations: 610.352 MiB, 15.97% gc time)
173 /home/schmrlng/code/oss/julia-master/src/task.c:271; start_task
 173 /home/schmrlng/code/oss/julia-master/src/julia.h:1558; jl_apply
  173 /home/schmrlng/code/oss/julia-master/src/gf.c:2196; jl_apply_generic
   173 ./task.jl:259; (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})()
    173 ...hmrlng/code/oss/julia-master/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:117; macro expansion
     173 ...hmrlng/code/oss/julia-master/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
      173 /home/schmrlng/code/oss/julia-master/src/gf.c:2196; jl_apply_generic
       173 ./boot.jl:319; eval(::Module, ::Any)
        173 /home/schmrlng/code/oss/julia-master/src/builtins.c:622; jl_toplevel_eval_in
         173 /home/schmrlng/code/oss/julia-master/src/toplevel.c:765; jl_toplevel_eval_flex
          173 /home/schmrlng/code/oss/julia-master/src/toplevel.c:812; jl_toplevel_eval_flex
           173 /home/schmrlng/code/oss/julia-master/src/gf.c:1841; jl_fptr_trampoline
            161 ./none:0; top-level scope
             161 ./util.jl:156; top-level scope
              161 ...e/oss/julia-master/usr/share/julia/stdlib/v1.1/Profile/src/Profile.jl:25; macro expansion
               161 ./REPL[4]:1; h()
                158 ./REPL[6]:1; g()
                 3   /home/schmrlng/code/oss/julia-master/src/gc.c:941; jl_gc_pool_alloc
                 1   /home/schmrlng/code/oss/julia-master/src/gc.c:967; jl_gc_pool_alloc
                  1 /home/schmrlng/code/oss/julia-master/src/gc.h:383; gc_page_data
                 1   /home/schmrlng/code/oss/julia-master/src/gc.c:972; jl_gc_pool_alloc
                 2   /home/schmrlng/code/oss/julia-master/src/gc.c:978; jl_gc_pool_alloc
                 1   /home/schmrlng/code/oss/julia-master/src/gc.c:1000; jl_gc_pool_alloc
                 1   /home/schmrlng/code/oss/julia-master/src/gf.c:41; jl_invoke
111 unknown stackframe
                  33 ./REPL[6]:1; #g#5(::Rational{Int64}, ::Function)
                   20 ./none:0; #f
                    8 ./REPL[2]:1; #f#3(::Rational{Int64}, ::Function)
                  2  /home/schmrlng/code/oss/julia-master/src/gc.c:953; jl_gc_pool_alloc
                  28 /home/schmrlng/code/oss/julia-master/src/gc.c:955; jl_gc_pool_alloc
                   28 /home/schmrlng/code/oss/julia-master/src/gc.c:2638; jl_gc_collect
                    2  /home/schmrlng/code/oss/julia-master/src/gc.c:2567; _jl_gc_collect
                     2 /home/schmrlng/code/oss/julia-master/src/gc.c:1222; gc_sweep_other
                      2 /home/schmrlng/code/oss/julia-master/src/gc.c:888; sweep_malloced_arrays
                    26 /home/schmrlng/code/oss/julia-master/src/gc.c:2570; _jl_gc_collect
                     26 /home/schmrlng/code/oss/julia-master/src/gc.c:1284; gc_sweep_pool
                      26 /home/schmrlng/code/oss/julia-master/src/gc.c:1209; sweep_pool_pagetable
                       1  /home/schmrlng/code/oss/julia-master/src/gc.c:1173; sweep_pool_pagetable1
                       25 /home/schmrlng/code/oss/julia-master/src/gc.c:1180; sweep_pool_pagetable1
                        25 /home/schmrlng/code/oss/julia-master/src/gc.c:1160; sweep_pool_pagetable0
                         25 /home/schmrlng/code/oss/julia-master/src/gc.c:1140; sweep_pool_page
                          1  /home/schmrlng/code/oss/julia-master/src/gc.c:1035; sweep_page
                           1 /home/schmrlng/code/oss/julia-master/src/gc.c:915; reset_page
                          1  /home/schmrlng/code/oss/julia-master/src/gc.c:1054; sweep_page
                           1 /home/schmrlng/code/oss/julia-master/src/gc.h:388; page_pfl_beg
                          15 /home/schmrlng/code/oss/julia-master/src/gc.c:1072; sweep_page
                          5  /home/schmrlng/code/oss/julia-master/src/gc.c:1078; sweep_page
                          2  /home/schmrlng/code/oss/julia-master/src/gc.c:1084; sweep_page
                          1  /home/schmrlng/code/oss/julia-master/src/gc.c:1100; sweep_page
                  1  /home/schmrlng/code/oss/julia-master/src/gc.c:959; jl_gc_pool_alloc
                  5  /home/schmrlng/code/oss/julia-master/src/gc.c:964; jl_gc_pool_alloc
                  1  /home/schmrlng/code/oss/julia-master/src/gc.c:965; jl_gc_pool_alloc
                  3  /home/schmrlng/code/oss/julia-master/src/gc.c:983; jl_gc_pool_alloc
                  1  /home/schmrlng/code/oss/julia-master/src/gc.c:987; jl_gc_pool_alloc
                   1 /home/schmrlng/code/oss/julia-master/src/gc.h:432; page_metadata
                3   ./range.jl:567; iterate
                 3 ./int.jl:49; <
12 unknown stackframe
             1 ./REPL[6]:1; #g#5(::Rational{Int64}, ::Function)
             3 ./REPL[6]:1; g()

julia> g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> h(); Profile.clear(); @time @profile h(); Profile.print(C=true)
  0.026552 seconds
26 /home/schmrlng/code/oss/julia-master/src/task.c:271; start_task
 26 /home/schmrlng/code/oss/julia-master/src/julia.h:1558; jl_apply
  26 /home/schmrlng/code/oss/julia-master/src/gf.c:2196; jl_apply_generic
   26 ./task.jl:259; (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})()
    26 ...hmrlng/code/oss/julia-master/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:117; macro expansion
     26 ...hmrlng/code/oss/julia-master/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
      26 /home/schmrlng/code/oss/julia-master/src/gf.c:2196; jl_apply_generic
       26 ./boot.jl:319; eval(::Module, ::Any)
        26 /home/schmrlng/code/oss/julia-master/src/builtins.c:622; jl_toplevel_eval_in
         26 /home/schmrlng/code/oss/julia-master/src/toplevel.c:765; jl_toplevel_eval_flex
          26 /home/schmrlng/code/oss/julia-master/src/toplevel.c:812; jl_toplevel_eval_flex
           26 /home/schmrlng/code/oss/julia-master/src/gf.c:1841; jl_fptr_trampoline
            24 ./none:0; top-level scope
             24 ./util.jl:156; top-level scope
              24 ...e/oss/julia-master/usr/share/julia/stdlib/v1.1/Profile/src/Profile.jl:25; macro expansion
               24 ./REPL[4]:1; h()
                24 ./REPL[8]:1; g()
2 unknown stackframe

Redefining @noinline g(; a=1//2) = f(a=a) does improve the situation (middle profile: 20.00 M allocations instead of 40.00 M), but to get allocations down to 0 it seems that g(; a=1//2) = f(a=a) must be inlined, i.e. defined without @noinline (last profile).
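For a rough per-call view of the `@time` figures above, the totals can be divided by the loop length. The sketch below assumes that `h()` performs exactly 10^7 calls of `g()` (from `1:1e7`) and that the loop itself allocates nothing; `per_call` is a hypothetical helper, and the inputs are copied from the first two `@time` lines.

```julia
# Hypothetical helper: convert @time totals into per-call figures.
# Assumes every allocation reported by @time comes from the 10^7 calls to g().
const ITERATIONS = 10^7

per_call(allocs, bytes) = (allocs / ITERATIONS, round(bytes / ITERATIONS; digits=1))

# First profile:  40.00 M allocations, 1.192 GiB
first_run = per_call(40.00e6, 1.192 * 2^30)
# Second profile: 20.00 M allocations, 610.352 MiB
second_run = per_call(20.00e6, 610.352 * 2^20)

println("first profile:  ", first_run[1], " allocs/call, ", first_run[2], " bytes/call")
println("second profile: ", second_run[1], " allocs/call, ", second_run[2], " bytes/call")
```

Under those assumptions the second profile works out to 2 allocations and about 64 bytes per call, the same figures `@benchmark g()` reported in the original post, while the first profile (where `g` is marked `@noinline`) is twice that.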

@schmrlng
Contributor Author

It's very low priority for me, but this is still an issue with the 1.0.1 RC. Maybe someone with write access can add the "performance" label to keep this issue from getting lost?

@KristofferC KristofferC added performance Must go faster compiler:inference Type inference labels Sep 27, 2018
@schmrlng
Contributor Author

This case appears to be fixed in 1.1.1 and 1.2.0-rc2.0. 1.2.0-rc2.0 benchmarks a bit slower, but only because it actually respects the @noinline designation of f; ignoring it, as 1.1.1 does, is arguably a bug:

  | | |_| | | | (_| |  |  Version 1.1.1 (2019-05-16)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

julia> using BenchmarkTools

julia> @noinline f(; a=1) = a
f (generic function with 1 method)

julia> g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> @benchmark g()
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.452 ns (0.00% GC)
  median time:      1.698 ns (0.00% GC)
  mean time:        1.886 ns (0.00% GC)
  maximum time:     31.025 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @code_native g()
	.text
; ┌ @ REPL[4]:1 within `g'
	movabsq	$140648840456872, %rax  # imm = 0x7FEB5C2C6AA8
	vmovups	(%rax), %xmm0
	vmovups	%xmm0, (%rdi)
	movq	%rdi, %rax
	retq
	nopw	%cs:(%rax,%rax)
; └
  | | |_| | | | (_| |  |  Version 1.2.0-rc2.0 (2019-07-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Julia Version 1.2.0-rc2.0
Commit 9248bf7687 (2019-07-08 19:42 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

julia> using BenchmarkTools

julia> @noinline f(; a=1) = a
f (generic function with 1 method)

julia> g(; a=1//2) = f(a=a)
g (generic function with 1 method)

julia> @benchmark g()
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.529 ns (0.00% GC)
  median time:      3.077 ns (0.00% GC)
  mean time:        3.316 ns (0.00% GC)
  maximum time:     33.103 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @code_native g()
	.text
; ┌ @ REPL[4]:1 within `g'
	pushq	%rbx
	subq	$16, %rsp
	movq	%rdi, %rbx
; │┌ @ REPL[4]:1 within `#g#4'
; ││┌ @ none within `#f'
	movabsq	$140334513157312, %rsi  # imm = 0x7FA22CCE74C0
	movabsq	$"#f#3", %rax
	movq	%rsp, %rdi
	callq	*%rax
; │└└
	vmovups	(%rsp), %xmm0
	vmovups	%xmm0, (%rbx)
	movq	%rbx, %rax
	addq	$16, %rsp
	popq	%rbx
	retq
	nopw	%cs:(%rax,%rax)
; └
