Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to hoist array loads from loop for certain element types #23042

Closed
KristofferC opened this issue Jul 30, 2017 · 2 comments
Closed

Failure to hoist array loads from loop for certain element types #23042

KristofferC opened this issue Jul 30, 2017 · 2 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@KristofferC
Copy link
Member

KristofferC commented Jul 30, 2017

The code below compares a loop over an array vs a loop of the array wrapped in a struct.

struct Foo{T<:Number, A<:AbstractMatrix{T}}
    data::A
end

Foo(data::AbstractMatrix) = Foo{eltype(data), typeof(data)}(data)


function test(a, b)
    for i in 1:length(a.data)
        @inbounds a.data[i] = b.data[i]
    end
    a
end

function test2(a, b)
    for i in 1:length(a)
        @inbounds a[i] = b[i]
    end
    a
end

using BenchmarkTools

for T in (Float32, Float64, Complex{Float32}, Complex{Float64})
    b = rand(T, 128, 128)
    a = similar(b)
    t_arr = BenchmarkTools.time(minimum(@benchmark test2($a, $b)))
    t_foo = BenchmarkTools.time(minimum(@benchmark test($(Foo(a)), $(Foo(b)))))

    @printf "%s %8.5f μs  %8.5f μs\n" lpad(string(T), 16) 10.0^(-3)*t_foo 10.0^(-3)*t_arr

end

On 0.6 this gives (first column is wrapped array):

         Float32  2.49689 μs   2.46456 μs
         Float64  5.91617 μs   5.86180 μs
Complex{Float32}  5.95367 μs   5.96100 μs
Complex{Float64} 25.24600 μs  16.05200 μs

so for Complex{Float64} it is suddenly slower with the wrapped array.

On 0.7:

         Float32  2.45578 μs   2.48711 μs
         Float64  6.28000 μs   6.25880 μs
Complex{Float32} 16.33900 μs   5.82650 μs
Complex{Float64} 25.21700 μs  16.17500 μs

So on 0.7 this also happens for Complex{Float32} which is why I mark this as a regression.

Reported at https://discourse.julialang.org/t/slower-indexing-on-custom-type-for-complex-64/5131

@KristofferC KristofferC added potential benchmark Could make a good benchmark in BaseBenchmarks performance Must go faster regression Regression in behavior compared to a previous version labels Jul 30, 2017
@yuyichao
Copy link
Contributor

Looks like LLVM optimization order, memcpy tbaa issue....

@KristofferC
Copy link
Member Author

Fixed

         Float32  2.41989 μs   2.41467 μs
         Float64  5.75717 μs   5.75667 μs
Complex{Float32}  6.74100 μs   6.71540 μs
Complex{Float64} 15.24900 μs  15.18300 μs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

2 participants