Version of reshape that preserves keys? #42

Open
rofinn opened this issue Jan 20, 2021 · 3 comments

@rofinn
Collaborator

rofinn commented Jan 20, 2021

I'm not sure how helpful this would be generally, but a lot of ML APIs restrict input arrays to matrices. It'd be nice if we could support at least a subset of the reshape behaviour in which the axis keys of the merged dimensions are combined. For example:

julia> ka = KeyedArray(rand(4, 3, 2); time=1:4, obj=[:a, :b, :c], loc=[:x, :y])
3-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   time  4-element UnitRange{Int64}
→   obj  3-element Vector{Symbol}
◪   loc  2-element Vector{Symbol}
And data, 4×3×2 Array{Float64,3}:
[:, :, 1] ~ (:, :, :x):
      (:a)       (:b)        (:c)
 (1)   0.416197   0.327252    0.14608
 (2)   0.706717   0.0045184   0.055459
 (3)   0.487265   0.879403    0.121894
 (4)   0.156394   0.431853    0.0756667

[:, :, 2] ~ (:, :, :y):
      (:a)       (:b)       (:c)
 (1)   0.507      0.803645   0.411088
 (2)   0.92779    0.284998   0.418833
 (3)   0.137591   0.415834   0.194712
 (4)   0.785161   0.436941   0.996514

julia> reshape(ka, 4, :)
4×6 Array{Float64,2}:
 0.416197  0.327252   0.14608    0.507     0.803645  0.411088
 0.706717  0.0045184  0.055459   0.92779   0.284998  0.418833
 0.487265  0.879403   0.121894   0.137591  0.415834  0.194712
 0.156394  0.431853   0.0756667  0.785161  0.436941  0.996514

I feel like in these cases it would be nice if we could get something like:

julia> KeyedArray(reshape(ka, 4, :); time=1:4, obj_loc=[:a_x, :b_x, :c_x, :a_y, :b_y, :c_y])
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   time  4-element UnitRange{Int64}
→   obj_loc  6-element Vector{Symbol}
And data, 4×6 Array{Float64,2}:
      (:a_x)     (:b_x)      (:c_x)      (:a_y)     (:b_y)     (:c_y)
 (1)   0.416197   0.327252    0.14608     0.507      0.803645   0.411088
 (2)   0.706717   0.0045184   0.055459    0.92779    0.284998   0.418833
 (3)   0.487265   0.879403    0.121894    0.137591   0.415834   0.194712
 (4)   0.156394   0.431853    0.0756667   0.785161   0.436941   0.996514

This could work either through reshape directly or through a separate, more restrictive function. If this seems useful to other folks, I'm happy to open a PR with a suggested function name.
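The key order in the desired output follows from the data layout. `reshape` is column-major, so `obj` varies fastest within each `loc`. The small check below uses pure Base Julia and no AxisKeys. The `obj_loc` name and the `_` separator follow the example above and are only illustrative.

```julia
# Pure Base Julia, no AxisKeys: check which (obj, loc) slice lands in each column.
obj = [:a, :b, :c]
loc = [:x, :y]
A = rand(4, length(obj), length(loc))   # dims: time × obj × loc
M = reshape(A, 4, :)                    # 4×6, obj varies fastest within each loc

# A comprehension over (o, l) is also column-major, so vec() gives the matching key order.
obj_loc = vec([Symbol(o, "_", l) for o in obj, l in loc])

# Column k of M is exactly the slice A[:, i, j] whose merged key is obj_loc[k].
for (k, (i, j)) in enumerate(Iterators.product(eachindex(obj), eachindex(loc)))
    @assert M[:, k] == A[:, i, j]
    @assert obj_loc[k] == Symbol(obj[i], "_", loc[j])
end
```

Because of this, any merged-key scheme has to enumerate the keys with the first merged dimension varying fastest. Otherwise the keys would no longer line up with the columns.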

@mcabbott
Owner

Yes, something like this would be nice to have.

The simple version, which takes integer sizes, can't be type-stable, and I'd be a little hesitant to make that a method of Base.reshape, although you could possibly talk me into it. It would resolve the :, then work inward from each end of the dimensions, keeping keys where the dimensions match... and combining them when it can? For reshape(ka, 4, :, 3), presumably it keeps only the first.
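Below is a minimal sketch of that matching rule. It works on sizes only, and `resolve_colon` and `match_dims` are hypothetical names, not AxisKeys API. It also assumes one reading of "combining them when it can": after matching from both ends, the leftover old dimensions are combined only when exactly one new dimension absorbs all of them. In every other case their keys are dropped.

```julia
# Hypothetical helpers (not AxisKeys API), working on sizes only.

# Replace a single `:` in the requested size by whatever makes the lengths agree.
function resolve_colon(oldsz::Tuple, newsz::Tuple)
    ncolon = count(d -> d isa Colon, newsz)
    ncolon == 0 && return newsz
    ncolon == 1 || throw(ArgumentError("at most one Colon is allowed"))
    known = prod(Int[d for d in newsz if d isa Integer])   # empty product is 1
    total = prod(oldsz)
    total % known == 0 || throw(DimensionMismatch("cannot reshape $total elements into $newsz"))
    return map(d -> d isa Colon ? total ÷ known : d, newsz)
end

# For each new dimension, return the old dimensions it came from:
# [i] keeps key vector i as-is, [i, i+1, ...] means combine those keys,
# and `nothing` means the keys must be dropped.
function match_dims(oldsz::Tuple, newsz::Tuple)
    sz = resolve_colon(oldsz, newsz)
    plan = Vector{Union{Nothing, Vector{Int}}}(nothing, length(sz))
    i, j = 1, 1                              # walk in from the front...
    iend, jend = length(oldsz), length(sz)   # ...and from the back
    while i <= iend && j <= jend && oldsz[i] == sz[j]
        plan[j] = [i]
        i += 1; j += 1
    end
    while i <= iend && j <= jend && oldsz[iend] == sz[jend]
        plan[jend] = [iend]
        iend -= 1; jend -= 1
    end
    # Whatever is left in the middle: combine only if one new dim absorbs all of it.
    if j == jend && i <= iend && prod(oldsz[i:iend]) == sz[j]
        plan[j] = collect(i:iend)
    end
    return plan
end
```

With the 4×3×2 example, `match_dims((4, 3, 2), (4, :))` keeps `time` and combines `obj` and `loc`. `match_dims((4, 3, 2), (4, :, 3))` keeps only the first dimension, as suggested above. The result depends on the runtime sizes, which is why this version can't be type-stable.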

Maybe one inspiration here is how reshape behaves with OffsetArrays loaded, e.g.:

julia> reshape('a':'f', 0:1, 11:13)
2×3 OffsetArray(reshape(::StepRange{Char, Int64}, 2, 3), 0:1, 11:13) with eltype Char with indices 0:1×11:13:
 'a'  'c'  'e'
 'b'  'd'  'f'

Making reshape(ka, ka.time, :) the interface wouldn't quite fit, but perhaps this is a reason to revive #6?

At some point I wrote split / join functions for named dimension reshaping, but did not find them all that useful, and possibly they have rotted:
https://github.com/mcabbott/NamedPlus.jl/blob/master/src/reshape.jl#L154

@rofinn
Collaborator Author

rofinn commented Jan 20, 2021

> It's going to resolve the :, and then work in from each end of the dimensions, keeping keys if dimensions match... combining them when it can?

Yeah, pretty much. I guess the trick would be deciding on some basic promotion/merge rules for the various key types. I was thinking that for strings and symbols it'd be easy to just use a separator, but the fallback for other types would probably need to be either string interpolation or a tuple.
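One way to express those per-element rules is with dispatch. The sketch below assumes that symbols stay symbols, strings stay strings, and everything else falls back to a tuple. `combine_key`, `merge_keys` and the `sep` keyword are hypothetical names chosen for illustration.

```julia
# Hypothetical per-element merge rules (not AxisKeys API):
# string-like keys are joined with a separator, everything else becomes a tuple.
combine_key(a::Symbol, b::Symbol; sep="_") = Symbol(a, sep, b)
combine_key(a::AbstractString, b::AbstractString; sep="_") = string(a, sep, b)
combine_key(a, b; sep="_") = (a, b)   # fallback: numbers, dates, mixed types, ...

# Merge two key vectors in column-major order (the first vector varies fastest),
# matching the element order that reshape(A, 4, :) produces.
merge_keys(k1::AbstractVector, k2::AbstractVector; sep="_") =
    vec([combine_key(a, b; sep=sep) for a in k1, b in k2])
```

For example, `merge_keys([:a, :b, :c], [:x, :y])` reproduces the `obj_loc` keys from the first comment, and `merge_keys(1:2, [:x])` gives tuples. Using tuples instead of string interpolation keeps the original key values recoverable.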

I might need to think about it some more, but I feel like the KeyedUnitRanges idea is largely independent of this one?

> At some point I wrote split / join functions for named dimension reshaping, but did not find them all that useful, and possibly they have rotted:
> https://github.com/mcabbott/NamedPlus.jl/blob/master/src/reshape.jl#L154

Interesting, this is almost identical to what I was thinking about, just with a different separator symbol and applied only to the dimension names. Did you end up writing your code differently to avoid needing to collapse dims, or was there something particularly cumbersome about that workflow?

@mcabbott
Owner

mcabbott commented Jan 23, 2021

Re the KeyedUnitRanges story, I guess ka.time would always remain the key vector, so that would have to read reshape(ka, axes(ka, :time), :) if it were to pass along the complete, recognisable object.

> for string and symbols it'd be easy to just use a separator, but the fallback

Yes, that sounds sensible: strings if string-ish, tuples if not. And sometimes, when the reshape doesn't just map two dimensions to one but scrambles them, the fallback's fallback would be to drop the keys and use OneTo(n).
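Those three tiers can be combined into one axis-level rule. The sketch below makes two assumptions. First, it uses a simple size check as a stand-in for "messes them up". Second, it treats `AbstractString`, `Symbol` and `AbstractChar` as "string-ish". `merged_axis` and `StringLike` are hypothetical names. The return type depends on the inputs, which is consistent with not worrying about type-stability for this kind of use.

```julia
# Hypothetical axis-level rule (not AxisKeys API):
#   string-ish keys             -> joined strings
#   other keys                  -> tuples of the original keys
#   dims don't map onto this one -> drop keys, use Base.OneTo(n)
const StringLike = Union{AbstractString, Symbol, AbstractChar}

function merged_axis(n::Integer, keyvecs::AbstractVector...)
    # The fallback's fallback: sizes don't line up, so no keys can be kept.
    (isempty(keyvecs) || prod(length, keyvecs) != n) && return Base.OneTo(n)
    grid = Iterators.product(keyvecs...)   # first key vector varies fastest
    if all(kv -> eltype(kv) <: StringLike, keyvecs)
        return vec([join(string.(t), "_") for t in grid])
    else
        return vec(collect(grid))          # tuples of the original keys
    end
end
```

Here `merged_axis(6, [:a, :b, :c], [:x, :y])` gives `"a_x"`, `"b_x"`, and so on. `merged_axis(4, 1:2, [:x, :y])` gives tuples, and a mismatched size falls back to `Base.OneTo(n)`.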

> just end up writing your code differently

Yes. The honest answer is that when I have enough dimensions to get confused, I mostly reach for index-notation packages as a way to keep things straight (and to line up with what's on paper). I haven't found names as useful as I'd hoped, maybe because you can't see them when reading the function, only when actually running it? Maybe they are better suited to messy REPL data-wrangling tasks (with visual feedback) than to core-of-computation tasks (where accidentally introducing a permutedims can be pretty expensive). For the data-wrangling kind of use, worrying too much about type-stability is a waste.
