RFC: add SubVector (fast 1d slices) #3224

timholy · 2013-05-28T10:30:06Z

Here's a baby step towards making Arrays just pointers + size & stride information. With it I'm getting ~2x - 2.5x speedups over mapslices in a recent application that looks a lot like this:

    notd = setdiff([1:nd], d)
    indexes[d] = 1:size(A, d)
    for c in Counter(szA[notd])
        indexes[notd] = c
        s = SubVector(Aout, indexes...)
        filt!(b, a, s)
    end

Counter comes from Grid.jl.

ViralBShah · 2013-05-28T13:10:55Z

i have been wanting this for a while.

StefanKarpinski · 2013-05-28T13:43:03Z

Very nice. Here's some relevant work I did recently, showing that any linear transformation of coordinates can be expressed as a linear transformation of their linear index, integer-divided by various products of dimensions:

# (dimensions, coordinates) => linear index
function lin(d::Vector{Int}, c::Vector{Int})
    @assert length(d) == length(c)
    l, p = 0, 1
    for i=1:length(d)
        @assert 0 <= c[i] < d[i]
        l += c[i]*p
        p *= d[i]
    end
    return l
end

# dot product in terms of linear index and dimensions
function lindot{T}(x::Vector{T}, d::Vector{Int}, l::Int)
    @assert length(x) == length(d)
    t, p = x[1]*l, d[1]
    for i = 2:length(x)
        t += (x[i]-x[i-1]*d[i-1])*div(l,p)
        p *= d[i]
    end
    return t
end

# test that lindot actually gives the dot product
for _ in 1:100000
    d = rand(1:10, rand(1:10))
    c = [ rand(0:v-1) for v=d ]
    x = rand(-10:10, length(d))
    l = lin(d,c)
    @assert lindot(x,d,l) == dot(x,c)
end

Note that in real code, the lindot loop code would ideally be unrolled for each number of dimensions, and in the 2d case, in particular would be just a few fast integer operations. (@JeffBezanson: This would be a great application of staged functions.)

Since array views are all expressible as linear transformations of input coordinates to memory offsets, this means that all array views can be linearly indexed fairly efficiently. It also means that integer/range slices of views can also always be represented as views since both operations are linear.

We should take a look at SubArrays in light of this, make them a bit more general while making linear indexing efficient, and consider making more widespread use of them. I think we might want to make all integer/range slicing operations produce views by default.

JeffBezanson · 2013-05-28T16:15:25Z

This implementation is not safe --- a SubVector does not reference the original Array, so it can be GC'd out from under you. Hopefully we can make SubArray as fast as this.

timholy · 2013-05-28T16:30:09Z

Yes, you have to hold the array separately. I initially implemented a version that had a parent::Array{T} field, but for that version the immutable appeared to get constructed (rather than optimized away). Hence it was slow. I didn't know how to fix that.

An alternative I've used in the past is to use types (rather than immutables) and implement a reslice function. Because that won't have to allocate any memory, it can be reasonably fast. There may be some code doing that in Images.jl already.

If there's no good solution to the construction problem---or if this is simply too specialized on one dimension to end up in base---I can close this. It's already in Images.jl and can stay there quite happily :-). But I figured it was worth bringing up for discussion.

JeffBezanson · 2013-05-28T16:43:45Z

I can probably improve the compiler to handle rooting of immutables more efficiently. We can't have the unsafe version in Base; manually keeping a reference to one object as long as another exists is not a workable API. It's equivalent to manual memory management. Actually much harder to do than keeping an object alive over a function call, as in many ccalls.

ViralBShah · 2013-06-22T09:23:55Z

@JeffBezanson Can we improve the GC rooting here, or perhaps do special rooting of subarrays? At least where contiguous blocks of an array are extracted through getindex, it would be nice to return a subarray.

ViralBShah · 2013-06-22T09:25:08Z

Some discussion of views in here, and @lindahua also is using similar ideas in NumericFunctors.

https://groups.google.com/forum/?fromgroups=#!topic/julia-dev/MIooix5yi3k

ViralBShah · 2013-06-22T09:25:26Z

Also, @tanmaykm is doing similar things in ChainedVectors.

timholy · 2013-06-22T11:34:22Z

Even if we don't use immutables (for which the gc changes might take some heavy lifting, I imagine), @StefanKarpinski, it does seem like SubArray could be redesigned along the lines you propose. I also agree that a SubArray of a SubArray should be massaged to have the same Array parent with different slicing.

I'm not certain I fully understand the details of your lindot, but certainly the general principle seems great.

The key in views is to avoid allocations (which was the point of SubVector). For many iterative applications (e.g., performing an operation on each column), "reslicing" might be a pretty good solution: if you can reuse an existing SubArray and just change the view, you've accomplished the same thing. However, for reslicing our current SubArrays, there's one big hitch: indexes is a tuple, not a vector, and currently I don't see a way from a typing perspective to have it be anything else. So I think the SubArray type would have to be redesigned from the ground up. It would be a little ugly, but rather than using a tuple of RangeIndex, one could probably use 4 vectors (ugh): 3 to encode the start, step, and len elements of each dimension's Range, and (likely) an extra to encode whether any singleton dimension is subbed or sliceed. (Perhaps you might be able to get around the last one, I'm not sure.)

timholy · 2013-06-22T12:08:48Z

Of course, there's a problem with reslice: it doesn't play nicely with convenient syntax like A[2:2:end,4:7] unless the "reuse memory where applicable" problem is also solved. It does work for people who are willing to do these things manually, but I think that's not the long-term vision (and I agree with that perspective).

So probably the best is to shoot for the immutable version, because it makes everything else easy.

lindahua · 2013-06-22T13:32:16Z

I wrote a benchmark script on Gist to compare the performances of different approaches:

Here is what I get:

Benchmark with long columns
with_ref!:  slow-down = 11.1970x     # use a[:,i]
with_sub!:  slow-down = 114.4806x    # use sub(a, 1:m, i)
with_sub1!: slow-down = 1.1974x      # use pointer_to_array
with_sub2!: slow-down = 2.5286x      # use immutable sub-vector

Benchmark with short columns
with_ref!:  slow-down = 261.4301x
with_sub!:  slow-down = 1256.6697x
with_sub1!: slow-down = 38.8170x
with_sub2!: slow-down = 19.0293x

We can observe:

The current version of sub, which creates SubArray is way too slow -- the overhead and indexing cost just dominate.
Using a[:, i] which makes copies also kills performance
Both immutable sub-vector and using pointer_to_array to create a shared-memory view work much better. That said, both lead to significant overhead when the columns are short.
Overall, the performance of creating a shared-memory array using pointer is better -- after all, you simply get the performance of builtin array. The limitation is that this applies only to contiguous blocks.

So, what about doing this?

For cases that always result in contiguous block, we simply return a shared-view memory array as like what I did in NumericFunctors.jl
For other cases, we use SubArray -- of course, this type needs to be reworked.

lindahua · 2013-06-22T13:43:29Z

Please move the discussion to #3496.

timholy · 2013-06-22T16:23:41Z

Since this point is SubVector-specific...did you profile the immutable subvector? If it was not getting constructed (which is how we want it to behave), there should be no overhead whatsoever with this approach. It's only when the compiler decides it actually has to build the SubVector that something needs to be allocated.

In contrast, the pointer_to_array approach does require an allocation (creation of a new array, with a pointer to shared data). So in principle, this has to be slower than the SubVector approach. (Unless I screwed up my implementation somehow, which is entirely possible.)

That said, it all depends on the state of the Julia compiler, at the time the tests are run.

lindahua · 2013-06-22T16:31:57Z

@timholy I tried the one you proposed in the PR. It yields performance similar to my immutable subvector.

I posted the performance results in the table above. There you can see about 2x slow-down using this sub-vector as opposed to using the Julia array.

I suspect that the compiler (at current version) does not get the getindex methods inlined and it also does not elide the construction of the immutable sub-vector.

lindahua · 2013-06-22T16:36:43Z

Having said that, I do believe this is the way to go. In #3496, I am considering extending the approach to create efficient immutable subarrays of various kind. The difference is that those types maintain the parent array instead of just the pointer, which would preclude the parent from being gc'ed during the life cycle of the subvector.

I think the 2x slow down of calling getindex of the sub-vector/sub-array will be addressed through improvement on the compiler part.

timholy · 2013-06-22T16:37:04Z

In my application, at the time, I was able to see that it was elided. As you say, there may have been compiler changes since then.

There's a tension between two approaches: "doing it in the way that should in principle be best, if the memory-safety issues can be resolved" (immutables) and "doing it in the way that works today" (like your #3496). For the record, I do lean towards the latter, but the performance results have to be put in a context where we are all aware of the more fundamental considerations.

timholy · 2013-06-22T16:37:25Z

I see we're both agreeing with each other while typing our responses!

timholy · 2013-06-22T16:37:53Z

Until the memory-safety issues are resolved, we can't go with the immutable version, unfortunately.

timholy · 2013-06-22T16:41:25Z

Ah, now I see you put the whole parent in. That does solve it, perhaps at the cost of performance. But much more sustainable.

lindahua · 2013-06-22T16:44:06Z

In #3496

what I proposed is as below

immutable SubVec{A<:Abstract}
    parent::A
    dim::Int
    offset::Int
end

The only difference is that I maintain a parent array instead of the pointer, so there should be no problem of memory safety. I do see both solutions similar -- neither is in principle better than the other. Anyway, let's move the discussion to #3496.

Add SubVector (fast 1d slices)

279b6e3

JeffBezanson mentioned this pull request Jun 1, 2013

BitArray performance enhancements via introduction of UnsafeArray and UnsafeBitArray #3265

Closed

JeffBezanson closed this Jun 2, 2013

StefanKarpinski mentioned this pull request Jun 3, 2013

embed tensor-like objects as higher-dimensional objects with trailing singleton dimensions #3262

Closed

lindahua mentioned this pull request Jun 22, 2013

Re-working SubArray #3496

Closed

kmsquire mentioned this pull request Jul 31, 2013

refactor I/O to do more buffering #3887

Open

timholy mentioned this pull request Dec 2, 2014

Non allocating rand!(A) (get rid of pointer_to_array) #9174

Closed

tlienart mentioned this pull request Dec 31, 2020

Vector is to Array as XX is to SubArray #39059

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: add SubVector (fast 1d slices) #3224

RFC: add SubVector (fast 1d slices) #3224

timholy commented May 28, 2013

ViralBShah commented May 28, 2013

StefanKarpinski commented May 28, 2013

JeffBezanson commented May 28, 2013

timholy commented May 28, 2013

JeffBezanson commented May 28, 2013

ViralBShah commented Jun 22, 2013

ViralBShah commented Jun 22, 2013

ViralBShah commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

RFC: add SubVector (fast 1d slices) #3224

RFC: add SubVector (fast 1d slices) #3224

Conversation

timholy commented May 28, 2013

ViralBShah commented May 28, 2013

StefanKarpinski commented May 28, 2013

JeffBezanson commented May 28, 2013

timholy commented May 28, 2013

JeffBezanson commented May 28, 2013

ViralBShah commented Jun 22, 2013

ViralBShah commented Jun 22, 2013

ViralBShah commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013