Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of splatting a SVector #361

Closed
evetion opened this issue Jan 20, 2018 · 11 comments
Closed

Performance of splatting a SVector #361

evetion opened this issue Jan 20, 2018 · 11 comments
Labels
fix-in-base Fix needs some work in Base performance runtime performance

Comments

@evetion
Copy link

evetion commented Jan 20, 2018

It seems that splatting could be very fast, as the size is known beforehand. On julia 0.6.2 however, the following code;

using BenchmarkTools, Compat
using StaticArrays

const array = collect(1:100)

const index_tuple = (2, 1)
const index_simple = [2, 1]
const index_svector = SVector{2, Int}(2, 1)

println("\nIndexing takes:")
@btime array[2, 1]
println("\nIndexing with splatting a tuple takes:")
@btime array[index_tuple...]
println("\nIndexing with splatting an array takes:")
@btime array[index_simple...]
println("\nIndexing with splatting a SVector takes:")
@btime array[index_svector...]
println("\nIndexing with converted SVector takes:")
@btime array[Tuple(index_svector)...]

gives the following results:

Indexing takes:
  1.694 ns (0 allocations: 0 bytes)

Indexing with splatting a tuple takes:
  1.419 ns (0 allocations: 0 bytes)

Indexing with splatting an array takes:
  64.604 ns (0 allocations: 0 bytes)

Indexing with splatting a SVector takes:
  570.745 ns (8 allocations: 384 bytes)

Indexing with converted SVector takes:
  3.388 ns (0 allocations: 0 bytes)

It seems that indexing with an SVector is several factors slower than normal indexing in this case.
I've used a workaround by converting an SVector to a NTuple first and wrapping it in an CartesianIndex for this specific use case, which brings the performance back within normal bounds.

I'm not sure where to optimize this, or whether this is some compiler specific issues that we can't workaround. At the least a warning in the documentation could be given not to splat SVectors (and related types).

@tkoolen
Copy link
Contributor

tkoolen commented Jan 21, 2018

I'm just a curious bystander, but does your actual use case involve a possibly multi-dimensional array? If you know it's a Vector, then (2, n) is not a valid index into array for n not equal to 1, and you could just throw an error if n != 1 and then use array[index_svector[1]]. array[Tuple(index_svector)...] is another option by the way.

@evetion
Copy link
Author

evetion commented Jan 22, 2018

@tkoolen I've changed the example to the simpler Tuple()... which seems just as fast. So the question would be if we could convert to a Tuple before splatting to increase performance.

With regards to my specific use case (out of scope for this issue): It does involve multi-dimensional arrays, which I don't want to reshape and thus sometimes index Vectors with indices of multi-dimensional arrays of the same length. I'd like it to work for any N dimensional array, so I'd rather not write any dimension specific code. Splatting is a very good fit for this use case.

@antoine-levitt
Copy link
Contributor

Just ran into that. Apparently splatting goes through iteration, so maybe static arrays don't implement fully the iteration protocol? I tried monkey-patching the only missing method that I've seen, with Base.eltype(a::Tuple{Int64,Tuple{SOneTo{2},Int64}}) = Int64, but that doesn't seem to do anything for performance.

@c42f
Copy link
Member

c42f commented Jul 25, 2019

Splatting of SVector lowers to a call to Core._apply and it appears that the compiler gets stuck there for user defined types. Compare:

julia> foo(A, I) = A[I...]
foo (generic function with 1 method)

julia> foo(rand(10,10), SVector(1,2))
0.9605820417688018

julia> @code_warntype foo(rand(10,10), SVector(1,2))
Body::Float64
1%1 = (Core.tuple)(A)::Tuple{Array{Float64,2}}%2 = (Core._apply)(Base.getindex, %1, I)::Float64
└──      return %2

julia> @code_warntype foo(rand(10,10), (1,2))
Body::Float64
1%1 = (getfield)(I, 1)::Int64%2 = (getfield)(I, 2)::Int64%3 = (Base.arrayref)(true, A, %1, %2)::Float64
└──      return %3

I guess this probably needs changes in the compiler.

@c42f
Copy link
Member

c42f commented Jul 25, 2019

Related

It looks like people have suggested both tweaking lowering and teaching inference more about _apply. Seems to me that it's better in this case to tweak lowering so that f(x...) turns into something like Core._apply(f, Base.splatarg(x)).

Where Base.splatarg(x) = x by default and we could override Base.splatarg(v::StaticVector) = Tuple(v) or some such.

[Edit: Base.splat is already taken... changed it to splatarg]

@andyferris
Copy link
Member

andyferris commented Jul 25, 2019

I have to admit, I don't think I understand _apply, or the way splatting of tuples works. In the past, because Pair{T1, T2} seemed to splat well, I assumed that inference speculatively unrolled the iteration loop when splatting, or some such, but apparently not?

The splat lowering idea is interesting. I also wonder if we can give the compiler the information it needs - for example, could be safe to see if inference can infer Base.length(x) as a Const?

@c42f
Copy link
Member

c42f commented Jul 26, 2019

I suspect we can't get around this in StaticArrays itself, we really need compiler support either in lowering or inference. See the hardcoded test at

https://github.com/JuliaLang/julia/blob/3c8119f4c1fe3a83a488e04caa834f26432317a2/base/compiler/ssair/inlining.jl#L831

@c42f c42f added fix-in-base Fix needs some work in Base performance runtime performance labels Jul 31, 2019
@lassepe
Copy link

lassepe commented Aug 20, 2019

I'm not sure whether this is a useful contribution because it's the first @generated function that I have written in Julia but maybe this is helpful for some people.

You could define something like this:

@generated function static_splat(f::Function, v::SVector{S, T}) where {S, T}
    expr = Expr(:call, :f)
    for i in 1:S
        push!(expr.args, :(v[$i]))
    end
    return expr
end

And perform splatting for cases where the splatted vector is the only argument to the function like in this example:

V = @SVector [(@SVector [1,2]), (@SVector[3,4])]
V_matrix = static_splat(hcat, B)

For some cases this is good enough. I don't know enough about meta programming in Julia to come up with a better way to do this. Maybe there is also a nice Macro-way of doing this. Also, maybe simply doing Tuple(V)... is often good enough and more readable.

@c42f
Copy link
Member

c42f commented Aug 21, 2019

Hi @lassepe! That generated function looks correct, though it's not really necessary to use a generated function for this. I think the better workaround is to just use Tuple(V)... for now.

If you do want to automate the Tuple workaround, the appropriate way is to transform the syntax of the function call. That means using a macro:

splat_args(x) = x
splat_args(x::StaticVector) = Tuple(x)

macro splat(ex)
    # A fun exercise for the reader :-)
end

@splat hcat(a, bs..., c, ds...)
# transform that hcat syntax into 
hcat(a, splat_args(bs)..., c, splat_args(ds)...)

This works for any type (other types can easily extend splat_args) and for any function call with splatting in it.

@antoine-levitt
Copy link
Contributor

Is this still relevant after JuliaLang/julia#36684?

@c42f
Copy link
Member

c42f commented Jul 21, 2020

Indeed that upstream fix is the one we needed :-)

@c42f c42f closed this as completed Jul 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix-in-base Fix needs some work in Base performance runtime performance
Projects
None yet
Development

No branches or pull requests

6 participants