-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: add SubVector (fast 1d slices) #3224
Conversation
i have been wanting this for a while. |
Very nice. Here's some relevant work I did recently, showing that any linear transformation of coordinates can be expressed as a linear transformation of their linear index, integer-divided by various products of dimensions: # (dimensions, coordinates) => linear index
function lin(d::Vector{Int}, c::Vector{Int})
@assert length(d) == length(c)
l, p = 0, 1
for i=1:length(d)
@assert 0 <= c[i] < d[i]
l += c[i]*p
p *= d[i]
end
return l
end
# dot product in terms of linear index and dimensions
function lindot{T}(x::Vector{T}, d::Vector{Int}, l::Int)
@assert length(x) == length(d)
t, p = x[1]*l, d[1]
for i = 2:length(x)
t += (x[i]-x[i-1]*d[i-1])*div(l,p)
p *= d[i]
end
return t
end
# test that lindot actually gives the dot product
for _ in 1:100000
d = rand(1:10, rand(1:10))
c = [ rand(0:v-1) for v=d ]
x = rand(-10:10, length(d))
l = lin(d,c)
@assert lindot(x,d,l) == dot(x,c)
end Note that in real code, the Since array views are all expressible as linear transformations of input coordinates to memory offsets, this means that all array views can be linearly indexed fairly efficiently. It also means that integer/range slices of views can also always be represented as views since both operations are linear. We should take a look at SubArrays in light of this, make them a bit more general while making linear indexing efficient, and consider making more widespread use of them. I think we might want to make all integer/range slicing operations produce views by default. |
This implementation is not safe --- a |
Yes, you have to hold the array separately. I initially implemented a version that had a An alternative I've used in the past is to use If there's no good solution to the construction problem---or if this is simply too specialized on one dimension to end up in base---I can close this. It's already in |
I can probably improve the compiler to handle rooting of |
@JeffBezanson Can we improve the GC rooting here, or perhaps do special rooting of subarrays? At least where contiguous blocks of an array are extracted through getindex, it would be nice to return a subarray. |
Some discussion of views in here, and @lindahua also is using similar ideas in NumericFunctors. https://groups.google.com/forum/?fromgroups=#!topic/julia-dev/MIooix5yi3k |
Also, @tanmaykm is doing similar things in ChainedVectors. |
Even if we don't use immutables (for which the gc changes might take some heavy lifting, I imagine), @StefanKarpinski, it does seem like SubArray could be redesigned along the lines you propose. I also agree that a SubArray of a SubArray should be massaged to have the same Array parent with different slicing. I'm not certain I fully understand the details of your The key in views is to avoid allocations (which was the point of SubVector). For many iterative applications (e.g., performing an operation on each column), "reslicing" might be a pretty good solution: if you can reuse an existing SubArray and just change the view, you've accomplished the same thing. However, for reslicing our current SubArrays, there's one big hitch: indexes is a tuple, not a vector, and currently I don't see a way from a typing perspective to have it be anything else. So I think the SubArray type would have to be redesigned from the ground up. It would be a little ugly, but rather than using a tuple of |
Of course, there's a problem with So probably the best is to shoot for the immutable version, because it makes everything else easy. |
I wrote a benchmark script on Gist to compare the performances of different approaches: Here is what I get: Benchmark with long columns
with_ref!: slow-down = 11.1970x # use a[:,i]
with_sub!: slow-down = 114.4806x # use sub(a, 1:m, i)
with_sub1!: slow-down = 1.1974x # use pointer_to_array
with_sub2!: slow-down = 2.5286x # use immutable sub-vector
Benchmark with short columns
with_ref!: slow-down = 261.4301x
with_sub!: slow-down = 1256.6697x
with_sub1!: slow-down = 38.8170x
with_sub2!: slow-down = 19.0293x We can observe:
So, what about doing this?
|
Please move the discussion to #3496. |
Since this point is SubVector-specific...did you profile the immutable subvector? If it was not getting constructed (which is how we want it to behave), there should be no overhead whatsoever with this approach. It's only when the compiler decides it actually has to build the SubVector that something needs to be allocated. In contrast, the That said, it all depends on the state of the Julia compiler, at the time the tests are run. |
@timholy I tried the one you proposed in the PR. It yields performance similar to my immutable subvector. I posted the performance results in the table above. There you can see about I suspect that the compiler (at current version) does not get the |
Having said that, I do believe this is the way to go. In #3496, I am considering extending the approach to create efficient immutable subarrays of various kind. The difference is that those types maintain the parent array instead of just the pointer, which would preclude the parent from being gc'ed during the life cycle of the subvector. I think the 2x slow down of calling |
In my application, at the time, I was able to see that it was elided. As you say, there may have been compiler changes since then. There's a tension between two approaches: "doing it in the way that should in principle be best, if the memory-safety issues can be resolved" (immutables) and "doing it in the way that works today" (like your #3496). For the record, I do lean towards the latter, but the performance results have to be put in a context where we are all aware of the more fundamental considerations. |
I see we're both agreeing with each other while typing our responses! |
Until the memory-safety issues are resolved, we can't go with the immutable version, unfortunately. |
Ah, now I see you put the whole parent in. That does solve it, perhaps at the cost of performance. But much more sustainable. |
In #3496 what I proposed is as below
The only difference is that I maintain a parent array instead of the pointer, so there should be no problem of memory safety. I do see both solutions similar -- neither is in principle better than the other. Anyway, let's move the discussion to #3496. |
Here's a baby step towards making Arrays just pointers + size & stride information. With it I'm getting ~2x - 2.5x speedups over
mapslices
in a recent application that looks a lot like this:Counter
comes fromGrid.jl
.