
Performance regression in Char -> String causing problems with Cholesky decomposition #557

Closed
dawbarton opened this issue Aug 24, 2018 · 9 comments · Fixed by JuliaLang/julia#28873
Labels
regression Regression in behavior compared to a previous version

Comments

@dawbarton

A significant performance regression was noted on Discourse with multivariate normal random number generation in Distributions.jl. After digging through the code I found that the difference was due to a regression in the performance of the Cholesky decomposition; specifically, on v0.6.4 we have

julia> c = cholfact(eye(10)) ;

julia> @btime $(c)[:UL]
  68.935 ns (2 allocations: 48 bytes)

whereas on v1.0.0 we have

julia> c = cholesky(Matrix{Float64}(I, 10, 10)) ;

julia> @btime $(c).UL
  772.167 ns (5 allocations: 208 bytes)

Digging further, the problem appears to be the line Symbol(C.uplo) == :U in getproperty for the Cholesky decomposition. This creates a Symbol from a Char, which calls Symbol(string(x...)).
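For illustration, here is a minimal sketch of the two comparison strategies (the helper names below are hypothetical, not the actual LinearAlgebra code):

```julia
# Hypothetical helpers illustrating the cost difference; the real check lives
# inside getproperty(::Cholesky, ::Symbol) in LinearAlgebra.

# Allocating path: Char -> String -> Symbol on every property access.
is_upper_slow(uplo::Char) = Symbol(uplo) == :U

# Allocation-free path: compare the Char directly.
is_upper_fast(uplo::Char) = uplo == 'U'
```

Both return the same answer; only the first one pays the Char -> String -> Symbol round trip on every call.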

Looking at string('U') on v0.6.4 gives

julia> @btime string('U')
  16.119 ns (1 allocation: 32 bytes)

whereas on v1.0.0

julia> @btime string('U')
  689.019 ns (4 allocations: 192 bytes)

The code corresponding to string(Char) appears to have changed significantly between v0.6.4 and v1.0.0 so that's as far as I can go...
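For comparison, a hand-rolled single-Char conversion that skips the generic varargs machinery stays on a tight path (a sketch with a hypothetical name, not the Base definition):

```julia
# Hypothetical fast path for converting a single Char to a String: write its
# UTF-8 bytes into a pre-sized buffer (a Char encodes to at most 4 bytes).
function char_to_string(c::Char)
    buf = IOBuffer(sizehint=4)
    print(buf, c)
    return String(take!(buf))
end
```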

@andreasnoack
Member

So JuliaLang/julia#28873 should fix this. I'm getting a 1700x speedup for your example.

@KristofferC
Member

KristofferC commented Aug 24, 2018

Looks like JuliaLang/julia#28149 introduced the regression; the specialized method for string(::Char) was deleted there. cc @bkamins

@dawbarton
Author

@andreasnoack thanks very much for the quick fix to cholesky! It would be nice to have a fix for the underlying issue as well, since it might well crop up in other performance-sensitive places.

@dawbarton
Author

dawbarton commented Aug 24, 2018

One further oddity with this Char -> String issue:

julia> a = 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> @btime string($a)
  689.523 ns (4 allocations: 192 bytes)
"a"

julia> @btime string("", $a)
  153.136 ns (4 allocations: 192 bytes)
"a"

(BTW: I think the "strings" label is probably more appropriate than "linear algebra" for this issue as the Cholesky performance regression is a symptom rather than a cause.)

@KristofferC
Member

KristofferC commented Aug 24, 2018

FWIW, the majority of the slowness here is from the dynamic dispatch:

           636 ./strings/io.jl:143; string
            636 ./strings/io.jl:112; print_to_string
....
            477 /Users/kristoffer/julia/src/gf.c:56; jl_invoke
              477 /Users/kristoffer/julia/src/./julia.h:1552; jl_apply
               474 /Users/kristoffer/julia/src/gf.c:2180; jl_apply_generic
                1   /Users/kristoffer/julia/src/gf.c:0; jl_lookup_generic_
                7   /Users/kristoffer/julia/src/gf.c:2088; jl_lookup_generic_
                1   /Users/kristoffer/julia/src/gf.c:2122; jl_lookup_generic_
                1   /Users/kristoffer/julia/src/gf.c:2123; jl_lookup_generic_
                463 /Users/kristoffer/julia/src/gf.c:2134; jl_lookup_generic_
                 2   /Users/kristoffer/julia/src/./julia_internal.h:883; jl_typemap_assoc_exact
                 461 /Users/kristoffer/julia/src/./julia_internal.h:887; jl_typemap_assoc_exact
                  1   /Users/kristoffer/julia/src/typemap.c:826; jl_typemap_level_assoc_exact
                  3   /Users/kristoffer/julia/src/typemap.c:832; jl_typemap_level_assoc_exact
                   1 /Users/kristoffer/julia/src/typemap.c:252; mtcache_hash_lookup
                    1 /Users/kristoffer/julia/src/typemap.c:184; jl_intref
                   1 /Users/kristoffer/julia/src/typemap.c:258; mtcache_hash_lookup
                   1 /Users/kristoffer/julia/src/typemap.c:267; mtcache_hash_lookup
                  457 /Users/kristoffer/julia/src/typemap.c:833; jl_typemap_level_assoc_exact
                   457 /Users/kristoffer/julia/src/./julia_internal.h:887; jl_typemap_assoc_exact
                    2   /Users/kristoffer/julia/src/typemap.c:821; jl_typemap_level_assoc_exact
                    1   /Users/kristoffer/julia/src/typemap.c:823; jl_typemap_level_assoc_exact
                    453 /Users/kristoffer/julia/src/typemap.c:838; jl_typemap_level_assoc_exact

Just copy-pasting the definition improves performance 7x:

julia> function print_to_string2(xs...; env=nothing)
           if isempty(xs)
               return ""
           end
           # specialized for performance reasons
           s = IOBuffer(sizehint=Base.tostr_sizehint(xs[1]))
           if env !== nothing
               env_io = IOContext(s, env)
               for x in xs
                   print(env_io, x)
               end
           else
               for x in xs
                   print(s, x)
               end
           end
           String(resize!(s.data, s.size))
       end
print_to_string2 (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime print_to_string2('a')
  113.243 ns (4 allocations: 192 bytes)

julia> @btime Base.print_to_string('a')
  730.310 ns (4 allocations: 192 bytes)

Ref JuliaLang/julia#28683

@bkamins
Member

bkamins commented Aug 24, 2018

Actually my testing shows that the offending line is:

    s = IOBuffer(sizehint=tostr_sizehint(xs[1]))

as in the old implementation we had s = IOBuffer().

When I made this change I thought it was unimportant, but it turns out to be crucial :).

We should change the line

tostr_sizehint(x) = 0

to something like

tostr_sizehint(x) = 8

to ensure a minimum buffer size greater than 0; then everything works as before.
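The effect of the proposed fallback can be sketched as follows (tostr_sizehint2 is a hypothetical stand-in for the Base function, so as not to overwrite the real one):

```julia
# Sketch of the proposed change: a generic fallback of 8 instead of 0, so the
# IOBuffer starts with a small usable backing array instead of a zero-length
# one that must grow on the first write.
tostr_sizehint2(x) = 8                           # was: tostr_sizehint(x) = 0
tostr_sizehint2(x::AbstractString) = lastindex(x)

buf = IOBuffer(sizehint=tostr_sizehint2('a'))    # pre-sized for small values
print(buf, 'a')
```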

@KristofferC
Member

KristofferC commented Aug 24, 2018

It helps but still

julia> @btime Base.print_to_string('a')
  712.584 ns (4 allocations: 192 bytes)
"a"

julia> @eval Base tostr_sizehint(x) = 8
tostr_sizehint (generic function with 4 methods)

julia> @btime Base.print_to_string('a')
  256.696 ns (3 allocations: 160 bytes)
"a"

is still significantly slower than on 0.6. The dynamic dispatch penalty remains large, though: after "refreshing" the function by reevaluating it, we get

julia> @btime Base.print_to_string('a')
  98.597 ns (4 allocations: 176 bytes)
"a"

@bkamins
Member

bkamins commented Aug 24, 2018

Interesting - I presumably saw the second measurement in my run. Can you explain why reevaluating the function speeds things up? I do not know the reason.

Nevertheless, I think that the remaining slowness comes from the keyword argument env. When I specialize print_to_string not to take a keyword argument, things seem to run fast on the first try. Can you confirm this? (I have stopped believing in my own @btime 😄)

If yes then two changes would be in order:

  • increase the size hint (8 seems to work fine for Char, but maybe we should raise the generic fallback to some minimal value greater than 0);
  • have print_to_string not accept env and implement string_with_env separately (to make it create an appropriate IOContext).

What do you think?

EDIT: and do you know the reason for the dynamic dispatch in this case? It should dispatch statically.

@KristofferC
Member

Nevertheless, I think that the slowness comes from the keyword argument env. When I specialize print_to_string not to take a keyword argument things seem to work fast on the first run. Can you confirm this (I stopped believing in my @btime 😄)?

After JuliaLang/julia#28683 I only trust the result from a freshly recompiled sysimg.

With

diff --git a/base/strings/io.jl b/base/strings/io.jl
index 64b9a613c0..75b79c02fa 100644
--- a/base/strings/io.jl
+++ b/base/strings/io.jl
@@ -103,31 +103,35 @@ function sprint(f::Function, args...; context=nothing, sizehint::Integer=0)
     String(resize!(s.data, s.size))
 end

-tostr_sizehint(x) = 0
+tostr_sizehint(x) = 8
 tostr_sizehint(x::AbstractString) = lastindex(x)
 tostr_sizehint(x::Float64) = 20
 tostr_sizehint(x::Float32) = 12

-function print_to_string(xs...; env=nothing)
+function print_to_string(xs...)
     if isempty(xs)
         return ""
     end
     # specialized for performance reasons
     s = IOBuffer(sizehint=tostr_sizehint(xs[1]))
-    if env !== nothing
-        env_io = IOContext(s, env)
-        for x in xs
-            print(env_io, x)
-        end
-    else
-        for x in xs
-            print(s, x)
-        end
+    for x in xs
+        print(s, x)
     end
     String(resize!(s.data, s.size))
 end

-string_with_env(env, xs...) = print_to_string(xs...; env=env)
+function string_with_env(env, xs...)
+    if isempty(xs)
+        return ""
+    end
+    # specialized for performance reasons
+    s = IOBuffer(sizehint=tostr_sizehint(xs[1]))
+    env_io = IOContext(s, env)
+    for x in xs
+        print(env_io, x)
+    end
+    String(resize!(s.data, s.size))
+end

 """
     string(xs...)

I am getting

julia> @btime Base.print_to_string('a')
  77.522 ns (3 allocations: 160 bytes)
"a"

julia> @btime Base.print_to_string("", 'a')
  165.129 ns (4 allocations: 192 bytes)
"a"

julia> @btime Base.print_to_string('b', 'a')
  89.995 ns (3 allocations: 160 bytes)
"ba"
