bsxfun performance issue #2991

lindahua · 2013-05-02T13:55:56Z

Issue #1838 mentioned the performance issue of bsxfun when there are more than three arguments. However, I found that when in common cases where there are only two arguments, bsxfun is also considerably slower.

Here is the code to show this issue:

function subcol_forloop{T<:Real}(x::Matrix{T}, y::Vector{T})
    m = size(x, 1)
    n = size(x, 2)

    r = similar(x)
    for j = 1 : n
        for i = 1 : m
            r[i,j] = x[i,j] - y[i]
        end
    end
    r
end

subcol_bsxfun{T<:Real}(x::Matrix{T}, y::Vector{T}) = bsxfun(-, x, y)

a = rand(10^4, 10^4)
b = rand(10^4)

# warming up
subcol_forloop(a, b)
subcol_bsxfun(a, b)

@time subcol_forloop(a, b)   # this takes 0.48 sec
@time subcol_bsxfun(a, b)   # this takes 1.43 sec

The output on my macbook pro (using latest Julia commit) is

elapsed time: 0.482314519 seconds
elapsed time: 1.435632678 seconds

So, bsxfun is 3x slower. Considering that there are 10^8 elements in a, the overhead of bsxfun in figuring out shapes is unlikely to be the main culprit here. It is likely that the inner loop of bsxfun is not as efficient as the hand crafted one.

The text was updated successfully, but these errors were encountered:

lindahua · 2013-05-02T14:00:41Z

I tested this in MATLAB as

tic; bsxfun(@minus, a, b); toc

This takes 0.4419 second, comparable but slightly faster than my hand crafted for loop.

dmbates · 2013-05-02T15:29:12Z

Could the order of accessing elements of a be the problem with the Julia bsxfun in this case?

Also, have you tried a comprehension to see how it performs? For me

function subcol_compr{T<:Real}(a::Matrix{T},b::Vector{T})
    [a[i,j] - b[i] for i in 1:size(a,1), j in 1:size(a,2)]
end

is as fast as the for loop. Matlab programmers may instinctively use bsxfun - I don't know because it has been decades since I have programmed in Matlab - but the julian approach to many such calculations should be to use a comprehension I would think.

lindahua · 2013-05-02T16:32:29Z

The list comprehension version is slightly faster than the for-loop.

This is not big deal for me. I can devectorize such calculations (or use comprehensions) if they sit in the way of a critical path. However, it would still be nice to have a bsxfun that is as efficient, as in many cases bsxfun is more concise.

toivoh · 2013-05-02T18:07:40Z

I also would like to have a good bsxfun, and more importantly .+ .* ./ etc. that do bsxfun for the given operator.
Looking at the current implementation, I'm surprised that it is only 3x slower. I think that these broadcasting operations should probably be implemented with staged functions or something similar, that can generate code similar to the hand crafted special cases. bsxfun will of course still suffer from not knowing the operation until invocation time, but .* shouldn't have to. I might take a stab at it.

bsxfan · 2013-05-07T10:27:51Z

The comprehension solution is somewhat inconvenient, because it can surprise you with its choice of result type. The function subcol_compr() as declared above, returns a Matrix as one would want and expect, but the same code without type specifiers returns a clunky Array{Any}:

julia> A = randn(2,3)
julia> b = randn(2)
julia> typeof([A[i,j] - b[i] for i in 1:size(A,1), j in 1:size(A,2)])
Array{Any,2}

In contrast, bsxfun returns a Matrix{Float64} as desired:

julia> typeof(bsxfun(-,A,b))
Array{Float64,2}

lindahua · 2013-05-07T11:50:09Z

@bsxfan if you put the comprehension statement inside a function, it would return an instance of Array{Float64, 2}. That is

function subcol_compr{T<:Real}(a::Matrix{T},b::Vector{T})
    [a[i,j] - b[i] for i in 1:size(a,1), j in 1:size(a,2)]
end
A = randn(2, 3)
b = randn(2)
subcol_compr(A, b)  # this returns Array{Float64, 2}

The thing is that Julia treats global variable differently in terms of type inference. So you see different results when you write the list comprehension at top level and within a function. Or, you can make it work with global variables in the following way

Float64[a[i,j] - b[i] for i in 1:size(a,1), j in 1:size(a,2)]

This is explained in the Notes here: http://docs.julialang.org/en/release-0.1/manual/arrays/#comprehensions

JeffBezanson · 2013-06-14T05:25:37Z

@toivoh worth closing this?

toivoh · 2013-06-14T10:55:24Z

Yes, this should be fixed with broadcast.

toivoh mentioned this issue May 14, 2013

Staged bsxfun and other broadcast operations #3100

Merged

toivoh closed this as completed Jun 14, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bsxfun performance issue #2991

bsxfun performance issue #2991

lindahua commented May 2, 2013

lindahua commented May 2, 2013

dmbates commented May 2, 2013

lindahua commented May 2, 2013

toivoh commented May 2, 2013

bsxfan commented May 7, 2013

lindahua commented May 7, 2013

JeffBezanson commented Jun 14, 2013

toivoh commented Jun 14, 2013

bsxfun performance issue #2991

bsxfun performance issue #2991

Comments

lindahua commented May 2, 2013

lindahua commented May 2, 2013

dmbates commented May 2, 2013

lindahua commented May 2, 2013

toivoh commented May 2, 2013

bsxfan commented May 7, 2013

lindahua commented May 7, 2013

JeffBezanson commented Jun 14, 2013

toivoh commented Jun 14, 2013