Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe, lazy unwrapping #65

Closed
mcabbott opened this issue Aug 22, 2019 · 13 comments
Closed

Safe, lazy unwrapping #65

mcabbott opened this issue Aug 22, 2019 · 13 comments

Comments

@mcabbott
Copy link
Collaborator

It would be nice to have an unwrapping function which is guaranteed to leave the dimensions in the specified order. This would be like parent(permutedims(A, (:i, :j)) but I think it should:

  1. return always a view of the data, not a copy
  2. only return a wrapper Transpose or PermutedDimsArray if this is necessary
  3. eventually understand multiple wildcards

My proposal was to call this unname(A, (:i, :j)). Then 2 can't strictly be type-stable but perhaps some wizardry can be done to make it work well.

@oxinabox
Copy link
Member

I wonder if a better way to do this might be entirely seperate to NamedDims.jl
Have a package with 1 function: simplify_wrapper_arrays
and it knows about all the Base/LinearAlgebra
wrapper array types,
and it applies rules to simplify them.

For example making this transform to remove a rundant permutation that is mixed in with views.

julia> x
3×4 Array{Int64,2}:
 11  12  13  14
 21  22  23  24
 31  32  33  34

julia> PermutedDimsArray(view(PermutedDimsArray(x, (2,1)), :, 1:2), (2,1))
2×4 PermutedDimsArray(view(PermutedDimsArray(::Array{Int64,2}, (2, 1)), :, 1:2), (2, 1)) with eltype Int64:
 11  12  13  14
 21  22  23  24

julia> PermutedDimsArray(view(PermutedDimsArray(x, (2,1)), :, 1:2), (2,1)) |> typeof
PermutedDimsArray{Int64,2,(2, 1),(2, 1),SubArray{Int64,2,PermutedDimsArray{Int64,2,(2, 1),(2, 1),Array{Int64,2}},Tuple{Base.Slice{Base.OneTo{Int64}},UnitRange{Int64}},false}}

julia> view(x, 1:2, :)
2×4 view(::Array{Int64,2}, 1:2, :) with eltype Int64:
 11  12  13  14
 21  22  23  24

julia> view(x, 1:2, :) |> typeof
SubArray{Int64,2,Array{Int64,2},Tuple{UnitRange{Int64},Base.Slice{Base.OneTo{Int64}}},false}

Then one would largely just do:
unname(simplify_wrapper_arrays(unname(A)))
or something to that effect.

Further more if simplifying arrays it is written to be extensible NamedDIms.jl
could plug into it, and have rules that say when simplifying wrapper types that have a NamedDimsArray as one of the layers,
try and push that layer to be futher out.
Which is generally preferable as it means the named indexing works.

@nickrobinson251
Copy link
Contributor

nickrobinson251 commented Aug 22, 2019

yes AFAICT this is orthogonal to NamedDims (but a cool idea, and one that NamedDims should play nice with, if/when it exists)

@mcabbott
Copy link
Collaborator Author

views are pretty smart about overlapping, but PermutedDimsArray (and also reshape) not so much. The package Strided.jl handles such things pretty well, although not in quite the style you suggest:

julia> using Strided

julia> y = @strided permutedims(view(permutedims(x, (2,1)), :, 1:2), (2,1))
2×4 StridedView{Int64,2,Array{Int64,1},typeof(identity)}:
 11  12  13  14
 21  22  23  24

julia> y[1,1] = 99
99

julia> x
3×4 Array{Int64,2}:
 99  12  13  14
 21  22  23  24
 31  32  33  34

@mcabbott
Copy link
Collaborator Author

mcabbott commented Aug 22, 2019

One more NamedDims-like issue is whether there ought to be a way of normalising so that the parent is an Array, not a lazy wrapper.

julia> NamedDimsArray(ones(3,1)', (:i, :j)) 
1×3 NamedDimsArray{(:i, :j),Float64,2,LinearAlgebra.Adjoint{Float64,Array{Float64,2}}}:
 1.0  1.0  1.0

Should there be a function name_the_parent(A, names) which unwraps on creation, to give 3×1 NamedDimsArray{(:j, :i),...? So that code like sum(nda, dims=:j) always gets the simplest operation.

Or should we regard things like this as bugs to be solved upstream:

rr = rand(100,100,100);
@btime sum($rr, dims=(1,3)) # 279.192 μs (6 allocations: 1.06 KiB)
pp = PermutedDimsArray(rr, (3,2,1));
@btime sum($pp, dims=(1,3)) # 2.078 ms (6 allocations: 1.06 KiB)

Edit: JuliaLang/julia#33029 is the issue; and here is a go at the normalising function suggested.

@mcabbott
Copy link
Collaborator Author

Maybe we got off track here. While better stacking of views would be nice, my point here is something different. I'd like this to work:

m = NamedDimsArray(rand(2,3), (:i, :j))
tm = rand()<0.5 ? m : copy(transpose(m))
unname(tm, (:i, :j))     # un-named array, always 2 x 3, sometimes ::Transpose

That is, given something with the right names, but in unknown order, I want to produce an AbstractArray with indices guaranteed to be in the order I specify, to pass to some calculation which cares about order not names.

More broadly I guess I'd like ways to deal with indices only by name. This package allows some things in this direction (like tm[i=2]) but it could go further, to write things not only clearly but independent of storage order.

@nickrobinson251
Copy link
Contributor

I want to produce an AbstractArray with indices guaranteed to be in the order I specify

I certainly see the appeal of this!

...to pass to some calculation which cares about order not names.

But i wonder what functions you have in mind here? Do you have a specifc example? I'm wondering if these functions could just be extended to handle names?

@nickrobinson251
Copy link
Contributor

For posterity, here's the current situation (where unname doesn't take a second argument):

julia> m = NamedDimsArray(rand(2, 3), (:i, :j))
2×3 NamedDimsArray{(:i, :j),Float64,2,Array{Float64,2}}:
 0.596994  0.0550656  0.204249
 0.211499  0.734237   0.364742

julia> tm = transpose(m)
3×2 NamedDimsArray{(:j, :i),Float64,2,LinearAlgebra.Transpose{Float64,Array{Float64,2}}}:
 0.596994   0.211499
 0.0550656  0.734237
 0.204249   0.364742

julia> unname(tm)
3×2 LinearAlgebra.Transpose{Float64,Array{Float64,2}}:
 0.596994   0.211499
 0.0550656  0.734237
 0.204249   0.364742

julia> tm = copy(transpose(m))
3×2 NamedDimsArray{(:j, :i),Float64,2,Array{Float64,2}}:
 0.596994   0.211499
 0.0550656  0.734237
 0.204249   0.364742

julia> unname(tm)
3×2 Array{Float64,2}:
 0.596994   0.211499
 0.0550656  0.734237
 0.204249   0.364742

@mcabbott
Copy link
Collaborator Author

...to pass to some calculation which cares about order not names.

But i wonder what functions you have in mind here? Do you have a specifc example? I'm wondering if these functions could just be extended to handle names?

For instance A * B contracts neighbouring indices. It does handle names (passes them forward & errors on a clash), but it doesn't decide what to do based on the names.

Which is correct, for the rule that code not mentioning names is supposed to treat named and un-named arrays alike (apart from passing along, and checking for clashes). You could imagine other behaviours for *, but that's another issue really.

If I need to call someone else's Package.fun which contains x ./ sum(x, dims=1) somewhere, then writing fun(unname(tm, (:i, :j))) is a safe way to get the right answer. If Package is sufficiently generic, perhaps I can call fun(permutelabels(tm, (:i, :j))) with some function permutelabels which returns a NamedDimsArray guaranteed to be in the order mentioned, and hope to get back something with names.

@nickrobinson251
Copy link
Contributor

If Package is sufficiently generic, perhaps I can call fun(permutelabels(tm, (:i, :j))) with some function permutelabels which returns a NamedDimsArray guaranteed to be in the order mentioned, and hope to get back something with names.

is this not permutedims (optionally combined with unname if you don't want a NamedDimsArray back)?

julia> nda = NamedDimsArray{(:w, :x, :y, :z)}(ones(10, 20, 30, 40));

julia> pnda = permutedims(nda, (:x, :y, :z, :w));

julia> typeof(pnda)
NamedDimsArray{(:x, :y, :z, :w),Float64,4,Array{Float64,4}}

julia> size(pnda)
(20, 30, 40, 10)

julia> typeof(unname(pnda))
Array{Float64,4}

@mcabbott
Copy link
Collaborator Author

mcabbott commented Aug 28, 2019

It's close, but permutedims always copies -- so it fails my point 1 above. It's no good for Package.fun!, and in general may add avoidable memory usage.

@nickrobinson251
Copy link
Contributor

with apologies for going round in a circle here... how it this unname related?

In this case don't we just want

julia> nda = NamedDimsArray{(:w, :x, :y, :z)}(ones(10, 20, 30, 40));

julia> pnda = PermutedDimsArray(nda, (2, 3, 4, 1));  # should add a method for Symbol names

julia> typeof(pnda)
PermutedDimsArray{Float64,4,(2, 3, 4, 1),(4, 1, 2, 3),NamedDimsArray{(:w, :x, :y, :z),Float64,4,Array{Float64,4}}}

julia> size(pnda)
(20, 30, 40, 10)

@mcabbott
Copy link
Collaborator Author

mcabbott commented Aug 28, 2019

This is my point 2 -- unfortunately quite a few operations on PermutedDimsArray fall back to slower variants than Transpose or Array.

It could be spelled something like parent(permutenames(tm, (:i, :j))), I agree. But if you have that, then you may as well have a method for unname as well.

Edit: Sometimes PermutedDimsArray is so slow that a copy is worth it. I don't think there's a great built-in way of saying "copy this only if it's not an array". Perhaps that's another question to consider here, lazy=false or something.

@oxinabox
Copy link
Member

oxinabox commented Sep 1, 2019

Was this closed on purpose?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants