stack(vec_of_vecs) for vcat(vec_of_vecs...) #21672
Comments
And return an |
For the base implementation, this would just convert a vector of vectors to matrices, as a specific case. I think whether it stacks in column major or row major order should probably be controlled by a dimension argument. There are generalizations one can imagine doing here. |
There's a general issue with the names of reduction operations on collections of things (e.g. one-argument Examples are:
These are not so bad because the names are similar and standard, but In my opinion, a better name for For example, the task of "make all these vectors reduce(vcat, map(transpose, xs)) which is quite readable, combines existing vocabulary and is nevertheless (thanks to |
FWIW |
I agree with @StefanKarpinski that we need a single-word function for this. |
What about |
FWIW, numpy uses |
|
The function |
I've been wondering if it's better to first support a "view" which you can copy, rather than the reverse process (where we have a function in Thus, we could potentially have
m = NestedArray([[1, 2], [3, 4]]) # A 2x2 AbstractMatrix{Int} which is a "view", perhaps called `StackedArray`
copy(m) # contiguous in memory, `Matrix{Int}`. Perhaps `Array(m)` would be clearer/allow possibility for `similar` to return a nested structure.
and no |
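A minimal sketch of the "view first, copy later" idea, under stated assumptions: the type name `NestedArray` is borrowed from the comment above and is hypothetical (it is not a Base type), each inner vector is treated as a column, and all inner vectors are assumed to have the same length.

```julia
# Hypothetical wrapper: presents a vector of equal-length vectors as a matrix
# without copying. Inner vectors are assumed to be the columns.
struct NestedArray{T} <: AbstractMatrix{T}
    cols::Vector{Vector{T}}
end

# The two methods a read-only AbstractArray needs.
Base.size(m::NestedArray) = (isempty(m.cols) ? 0 : length(first(m.cols)), length(m.cols))
Base.getindex(m::NestedArray, i::Int, j::Int) = m.cols[j][i]

m = NestedArray([[1, 2], [3, 4]])  # behaves like a 2x2 AbstractMatrix{Int}
dense = copy(m)                    # materialises a contiguous Matrix{Int}
```

Because the generic `copy` fallback for an `AbstractArray` goes through `similar`, the copy here comes back as an ordinary `Matrix{Int}`; `Array(m)` should give the same result.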
See also CatViews.jl by @ahwillia. |
Yes, that's certainly possible. I just wanted to mention this because if other terms are available it would be better not to mix these two different operations (I wasn't aware of the Numpy precedent). FWIW, data frames/tables already implement |
Another idea that occurred to me yesterday is to just have specialized methods for |
What does |
I think |
Yes, that's what @TotalVerb proposed above, right?
This is |
Correct me if I'm wrong, but I thought the "stacked" form of a data frame was a bit like a sparse representation of the table, storing |
Yes, that's more or less the result of |
It's rather unfortunate that we can't use |
Sure, the existence of the data frames/tables function shouldn't be a blocker if Sorry, I didn't expect my original comment to generate such a long sub-thread. |
I'm not sure One possible alternative: I think I'd also lean towards doing the |
The term "stack" makes sense to me because you're taking a bunch of separate vectors/slices of some shape and stacking them along an axis to create a structure with the same shape but an additional "stacked" dimension. I agree that it makes less sense when you're thinking of it in terms of indices, where nesting and unnesting is clearer – that could be a better way to think about it. |
One way to express this could be how you'd like indices to be nested in the output. I.e. you could write |
The Assuming it's a view - one gotcha is that (these days) indices for different array types and dimensions could be of different types, so the
Right, this functionality seems perfect for being developed upon outside of |
I think the most straightforward approach here would be to optimize these two operations in Base:
julia> vov = [[1,2,3],[4,5,6],[7,8,9]]
3-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [4, 5, 6]
 [7, 8, 9]
julia> reduce(hcat, vov)
3×3 Array{Int64,2}:
 1  4  7
 2  5  8
 3  6  9
julia> mapreduce(transpose, vcat, vov)
3×3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9
Since these operations already produce the desired result, having optimized implementations for these and then telling people to use these as the idiom for doing this would work nicely without any risk of breaking anyone's code. There's also no reason not to optimize this. Any more sophisticated functionality can be developed out of Base in a package.
Summary: the action item here, rather than introduce any new verbs into the standard library, is to write optimized methods of |
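To make "optimized" concrete, here is a rough sketch of what a specialised method could do internally: allocate the result once and copy each inner vector into its column. The helper name `hcat_columns` is hypothetical, and this is only an illustration of the idea, not the implementation used in Base.

```julia
# Hypothetical helper: builds the same matrix as reduce(hcat, vov),
# but with a single allocation instead of repeated concatenation.
function hcat_columns(vov::AbstractVector{<:AbstractVector{T}}) where {T}
    isempty(vov) && throw(ArgumentError("need at least one column"))
    n = length(first(vov))
    out = Matrix{T}(undef, n, length(vov))
    for (j, v) in enumerate(vov)
        length(v) == n || throw(DimensionMismatch("column $j has length $(length(v)), expected $n"))
        out[:, j] = v  # copy the j-th inner vector into column j
    end
    return out
end

vov = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cols = hcat_columns(vov)   # inner vectors as columns
rows = permutedims(cols)   # inner vectors as rows
```

The row-wise layout is obtained here by permuting the column-wise result; a dedicated row-filling loop would avoid the extra copy.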
While at it, can we have ( Perhaps that should be a separate issue, but it is closely linked, |
There aren't any resizeable multidimensional arrays in Base, are there? (It would be a good feature for a nested array view/wrapper type, though!) |
Yes, but there are resizeable 1D Arrays.
Sure, would. |
Again, StackViews is relying on indexing: https://github.com/JuliaArrays/StackViews.jl/blob/master/src/StackViews.jl#L128-L133 . So |
There are two interesting things that we could want here: an eager and a lazy version. It seems simpler to have Base implement the eager version, i.e. |
My round-up of the half-dozen packages for this is at https://github.com/mcabbott/LazyStack.jl#other-packages . When last I checked the behaviour of |
From triage:
|
Cool, thanks for discussing it! @mcabbott do you have code for this at the ready? If not I can try to make a PR. |
There's code in my PR for the case that the dimensions are ordered If this new function doesn't take a If it does have some way to control in which dimensions the slices lie, then that requires more thought. The present
xs = [rand(3,4) for _ in 1:5];
size(func(xs; dims=3)) == (3, 4, 5)
size(func(xs; dims=1)) == (5, 3, 4) # pushes other dims
size(func(xs; dims=4)) == (3, 4, 1, 5) # inserts trivial dim
Shouldn't be hard to code this efficiently; I guess it needs elements of both Maybe #31644 (trying to make |
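The three size rules above can be reproduced with a small eager sketch. The name `stack_along` is hypothetical and stands in for the `func` placeholder used in the comment; the sketch assumes all slices share one size and 1-based indexing, and it makes no claim about how any eventual Base function behaves for `dims` beyond `ndims + 1`.

```julia
# Hypothetical eager version of `func(xs; dims)`: slice i of the result,
# taken along dimension `dims`, holds xs[i].
function stack_along(xs; dims::Integer)
    x1 = first(xs)
    sz = size(x1)
    n = length(sz)
    # Shape of one slice with a singleton inserted at position `dims`,
    # padding with trivial dimensions when dims > n + 1.
    slice_size = if dims <= n + 1
        (sz[1:dims-1]..., 1, sz[dims:end]...)
    else
        (sz..., ntuple(_ -> 1, dims - n - 1)..., 1)
    end
    out_size = ntuple(i -> i == dims ? length(xs) : slice_size[i], length(slice_size))
    out = similar(x1, out_size)
    for (i, x) in enumerate(xs)
        size(x) == sz || throw(DimensionMismatch("all slices must have the same size"))
        # Broadcasting also covers the padded case, where the view has
        # extra trailing dimensions of length 1.
        selectdim(out, dims, i) .= x
    end
    return out
end

xs = [rand(3, 4) for _ in 1:5]
a3 = stack_along(xs; dims=3)
a1 = stack_along(xs; dims=1)
a4 = stack_along(xs; dims=4)
```

Tuple slicing with runtime ranges makes this type-unstable, so it illustrates the semantics rather than an efficient implementation.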
I didn't consider stacking e.g. matrices of matrices, but you're right, there's nothing preventing stack from supporting this. Regarding a specific way of controlling the dimensions, what would be the use case? It seems to me this would be too confusing for people to actually use and they'd be better off writing their own routines. I know I always have to look up usage and think about it for a while when using
|
Sometimes people wish to assemble a matrix from rows, you can do My package went for the no-options, everything N-dimensional, version. Since it's lazy, allowing options just means you end up re-writing |
Following the whole discussion I see two needed applications for arbitrary arrays of arrays:
This task can be generalized to arbitrary multidimensional arrays and condensing two (or more) indices (columnwise So an efficient function for nr 1. (turning (md)arrays of (md)arrays into md arrays) which is the latest suggestion for |
I think your 1. is all that's planned; whether to take For your 2. there are quite a lot of possible ways to do that in N dims. The notation you use here is almost exactly that of TensorCast.jl, for example:
|
When I think of "stack" I think about making a df taller via batches of concatenated rows, which isn't really what this is about. This is more of a grouping/nesting within a newly created dimension? E.g. a new 2D array is made where existing 1Ds are nested in it, or a new 3D array is made where existing 2Ds are nested in it. |
Yeah I was thinking this also actually... I'm not a native English speaker, but it seems to me "stack" can have two meanings: stack a and b next to each other, and stack a and b on top of each other. Reasonably, stack(rand(2), rand(2)) could be a Vector of size 4, or a 2x2 matrix, so it's maybe bad to pick one? |
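The two readings correspond to two existing Base functions, which a short example makes concrete (the variable names are only for illustration):

```julia
a = [1, 2]
b = [3, 4]

end_to_end   = vcat(a, b)  # one longer vector with 4 elements
side_by_side = hcat(a, b)  # a 2x2 matrix with a and b as columns
```

The first keeps the number of dimensions and lengthens the vector; the second adds a dimension, which is the sense of "stack" discussed in most of this thread.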
When comparing a definition to
is the main benefit of the first that it is easier to avoid method ambiguity errors? I think it can be a good idea to have abbreviated names for more generic operations, the same way people tend to prefer |
Another approach: a zero-positional-arg version of
|
This doesn't seem to be clearly stated above, but one complaint about The other complaint is that it doesn't generalise very easily. You can write So more likely the
This was one of the ideas explored in #37196, and one objection there is that |
I believe this issue can be closed now that we have #43334 |
This comes up periodically and it's a bit embarrassing that our official answer right now involves an inefficient splatting. We should have a function (maybe call it stack?) that efficiently does what vcat(vec_of_vecs...) does without the splatting. It should also stack in slices in different dimensions so that you can either stack slices as rows or columns, etc.
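For reference, a small comparison of the splatted call with the `reduce` form suggested in the comments. Both give the same vector; the difference is that splatting passes every inner vector as a separate argument, which scales poorly when there are many of them.

```julia
vec_of_vecs = [[1, 2], [3, 4], [5, 6]]

splatted = vcat(vec_of_vecs...)       # one argument per inner vector
reduced  = reduce(vcat, vec_of_vecs)  # same result without splatting
```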