Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bsxfun performance issue #2991

Closed
lindahua opened this issue May 2, 2013 · 8 comments
Closed

bsxfun performance issue #2991

lindahua opened this issue May 2, 2013 · 8 comments
Labels
performance Must go faster

Comments

@lindahua
Copy link
Contributor

lindahua commented May 2, 2013

Issue #1838 mentioned the performance issue of bsxfun when there are more than three arguments. However, I found that when in common cases where there are only two arguments, bsxfun is also considerably slower.

Here is the code to show this issue:

function subcol_forloop{T<:Real}(x::Matrix{T}, y::Vector{T})
    m = size(x, 1)
    n = size(x, 2)

    r = similar(x)
    for j = 1 : n
        for i = 1 : m
            r[i,j] = x[i,j] - y[i]
        end
    end
    r
end

subcol_bsxfun{T<:Real}(x::Matrix{T}, y::Vector{T}) = bsxfun(-, x, y)

a = rand(10^4, 10^4)
b = rand(10^4)

# warming up
subcol_forloop(a, b)
subcol_bsxfun(a, b)

@time subcol_forloop(a, b)   # this takes 0.48 sec
@time subcol_bsxfun(a, b)   # this takes 1.43 sec

The output on my macbook pro (using latest Julia commit) is

elapsed time: 0.482314519 seconds
elapsed time: 1.435632678 seconds

So, bsxfun is 3x slower. Considering that there are 10^8 elements in a, the overhead of bsxfun in figuring out shapes is unlikely to be the main culprit here. It is likely that the inner loop of bsxfun is not as efficient as the hand crafted one.

@lindahua
Copy link
Contributor Author

lindahua commented May 2, 2013

I tested this in MATLAB as

tic; bsxfun(@minus, a, b); toc

This takes 0.4419 second, comparable but slightly faster than my hand crafted for loop.

@dmbates
Copy link
Member

dmbates commented May 2, 2013

Could the order of accessing elements of a be the problem with the Julia bsxfun in this case?

Also, have you tried a comprehension to see how it performs? For me

function subcol_compr{T<:Real}(a::Matrix{T},b::Vector{T})
    [a[i,j] - b[i] for i in 1:size(a,1), j in 1:size(a,2)]
end

is as fast as the for loop. Matlab programmers may instinctively use bsxfun - I don't know because it has been decades since I have programmed in Matlab - but the julian approach to many such calculations should be to use a comprehension I would think.

@lindahua
Copy link
Contributor Author

lindahua commented May 2, 2013

The list comprehension version is slightly faster than the for-loop.

This is not big deal for me. I can devectorize such calculations (or use comprehensions) if they sit in the way of a critical path. However, it would still be nice to have a bsxfun that is as efficient, as in many cases bsxfun is more concise.

@toivoh
Copy link
Contributor

toivoh commented May 2, 2013

I also would like to have a good bsxfun, and more importantly .+ .* ./ etc. that do bsxfun for the given operator.
Looking at the current implementation, I'm surprised that it is only 3x slower. I think that these broadcasting operations should probably be implemented with staged functions or something similar, that can generate code similar to the hand crafted special cases. bsxfun will of course still suffer from not knowing the operation until invocation time, but .* shouldn't have to. I might take a stab at it.

@bsxfan
Copy link

bsxfan commented May 7, 2013

The comprehension solution is somewhat inconvenient, because it can surprise you with its choice of result type. The function subcol_compr() as declared above, returns a Matrix as one would want and expect, but the same code without type specifiers returns a clunky Array{Any}:

julia> A = randn(2,3)
julia> b = randn(2)
julia> typeof([A[i,j] - b[i] for i in 1:size(A,1), j in 1:size(A,2)])
Array{Any,2}

In contrast, bsxfun returns a Matrix{Float64} as desired:

julia> typeof(bsxfun(-,A,b))
Array{Float64,2}

@lindahua
Copy link
Contributor Author

lindahua commented May 7, 2013

@bsxfan if you put the comprehension statement inside a function, it would return an instance of Array{Float64, 2}. That is

function subcol_compr{T<:Real}(a::Matrix{T},b::Vector{T})
    [a[i,j] - b[i] for i in 1:size(a,1), j in 1:size(a,2)]
end
A = randn(2, 3)
b = randn(2)
subcol_compr(A, b)  # this returns Array{Float64, 2}

The thing is that Julia treats global variable differently in terms of type inference. So you see different results when you write the list comprehension at top level and within a function. Or, you can make it work with global variables in the following way

Float64[a[i,j] - b[i] for i in 1:size(a,1), j in 1:size(a,2)]

This is explained in the Notes here: http://docs.julialang.org/en/release-0.1/manual/arrays/#comprehensions

@JeffBezanson
Copy link
Member

@toivoh worth closing this?

@toivoh
Copy link
Contributor

toivoh commented Jun 14, 2013

Yes, this should be fixed with broadcast.

@toivoh toivoh closed this as completed Jun 14, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

5 participants