
Cleaning up indexing and dimension mapping code. #181

Merged: 7 commits into master from dim-api on Aug 7, 2021

Conversation

@Tokazama (Member) commented Aug 3, 2021

tl;dr: I was cleaning up indexing/dimension stuff; the commit messages further down summarize the changes.


Now for the IALR (is a long read). Mapping each indexing argument to the dimensions/axes of an indexed array involves finding out:

  1. How many dimensions each index spans (Int -> 1, CartesianIndex{3} -> 3)
  2. How many dimensions in the subset of the array are represented by each index (CartesianIndices{3} -> 3, CartesianIndex{3} -> 0).

There are also indices that trail past the last dimension of the indexed array; these are either dropped (Int) or result in a final dimension that's one element long (OneTo(1), :).
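
As a rough illustration of those two quantities (a hypothetical sketch; index_span and subset_span are illustrative names, not this PR's API):

using Static: static

# 1. how many dimensions of the parent array a single index argument spans
index_span(::Type{<:Integer}) = static(1)
index_span(::Type{Colon}) = static(1)
index_span(::Type{<:CartesianIndex{N}}) where {N} = static(N)
index_span(::Type{<:CartesianIndices{N}}) where {N} = static(N)

# 2. how many dimensions the index contributes to the result: CartesianIndex{3}
# spans three parent dimensions but contributes none to the subset
subset_span(::Type{<:Integer}) = static(0)
subset_span(::Type{<:CartesianIndex}) = static(0)
subset_span(::Type{<:CartesianIndices{N}}) where {N} = static(N)

index_span(CartesianIndex{3}), subset_span(CartesianIndex{3})  # (static(3), static(0))
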
Most of this is handled in to_indices right now, which I've found easily results in ambiguities (an example in the wild).
Also, to_indices isn't as optimized as one might hope.
I've found some success in the past with small improvements to the performance of ArrayInterface.to_indices (e.g., mostly performing a few basic operations prior to the recursive to_indices call).

So the problems are:

  1. The current approach for mapping indices to array dimensions has some odd corner cases.
  2. Implementing unique behaviors for this isn't user friendly due to the propensity for ambiguities. It's also worth noting that the to_indices(::MyArray, axes, inds::Tuple{IndexType,Vararg{Any}}) pattern is somewhat against the design patterns we're aiming for in ArrayInterface.jl, because anything that wraps MyArray will lose this relationship with IndexType.
  3. to_indices has some performance drawbacks (but it sounds like that will improve someday with more compiler work).

If you look at the original comments about the problems with `end`, there's some dislike of going into the type domain.
However, I think it's pretty reasonable (and useful) to require a way to map each index type to the dimensions of an array type.
In the current draft this is done using `ndims_index`.
This is a pretty straightforward setup with the exception of accommodating things like EllipsisNotation.jl where the number of dimensions that EllipsisNotation.Ellipsis maps to is dependent on which dimensions all the other indices map to.
The best solution I've come up with that permits contextual dimension mapping is using LazyAxis to keep track of the number of dimensions.

# Map a tuple of index types onto dimensions, using LazyAxis{N,A} as a counter:
# N is the dimension the next index lands on, A is the parent array type.
@inline function ndims_index(::Type{LazyAxis{N,A}}, ::Type{I}) where {N,A,I1,I<:Tuple{I1,Vararg{Any}}}
    NI = ndims_index(LazyAxis{N,A}, I1)  # dimensions spanned by the first index
    # advance the counter by NI and recurse on the remaining index types
    (NI, ndims_index(LazyAxis{NI + N, A}, Base.tuple_type_tail(I))...)
end
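
To make the recursion concrete, here's a hypothetical set of base cases and a call (these single-index methods are placeholders for illustration, not part of this PR):

using ArrayInterface: LazyAxis

# hypothetical base cases: a scalar index spans one dimension, a
# CartesianIndex{N} spans N, and an empty index tuple ends the recursion
ndims_index(::Type{<:LazyAxis}, ::Type{Int}) = 1
ndims_index(::Type{<:LazyAxis}, ::Type{CartesianIndex{N}}) where {N} = N
ndims_index(::Type{<:LazyAxis}, ::Type{Tuple{}}) = ()

ndims_index(LazyAxis{1,Array{Float64,3}}, Tuple{Int,CartesianIndex{2}})  # (1, 2)
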

The only other idea I have right now is using axes_types so that users don't have to worry about using LazyAxis as a counter.
I generally try to avoid forcing anyone to directly interact with Tuple types to work with an interface.
On the other hand, if someone isn't comfortable digging that deep into the type system, they probably shouldn't be changing something as fundamental as index-to-dimension mapping.
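
For comparison, a rough sketch of the axes_types route (dims_per_index, span, and _drop are hypothetical helpers, not this PR's API):

using ArrayInterface: axes_types
using Base: tuple_type_tail

# dimensions spanned by a single index type
span(::Type{<:Integer}) = 1
span(::Type{<:CartesianIndex{N}}) where {N} = N

# walk the Tuple type from axes_types alongside the index types, so the
# axes tuple itself acts as the dimension counter
dims_per_index(::Type{Tuple{}}, ::Type{Tuple{}}) = ()
function dims_per_index(::Type{Axs}, ::Type{I}) where {Axs<:Tuple,I1,I<:Tuple{I1,Vararg{Any}}}
    n = span(I1)
    (n, dims_per_index(_drop(Axs, n), tuple_type_tail(I))...)
end
# drop the n axis types consumed by the current index
_drop(::Type{T}, n) where {T<:Tuple} = n === 0 ? T : _drop(tuple_type_tail(T), n - 1)

dims_per_index(axes_types(rand(2, 3, 4)), Tuple{Int,CartesianIndex{2}})  # (1, 2)
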

I'm going to sit on this for a day or two in hopes that I have an epiphany that solves this all perfectly (unless someone has a suggestion that does that for me).
Nothing here should be breaking BTW.
All the stuff I removed wasn't being used anywhere else yet.

* Realized that a lot of things I had documented internally are now in
  the online docs, likely making them appear more stable and public than
  they are right now.
* `argdims` was ambiguously documented and not used elsewhere, so I split
  it into two separate/more useful methods: `ndims_index` and `ndims_subset`
* Deleted/consolidated a bunch of indexing code
@codecov bot commented Aug 3, 2021

Codecov Report

Merging #181 (12fc825) into master (2c1fcbb) will increase coverage by 0.55%.
The diff coverage is 88.33%.


@@            Coverage Diff             @@
##           master     #181      +/-   ##
==========================================
+ Coverage   83.57%   84.13%   +0.55%     
==========================================
  Files          11       11              
  Lines        1644     1607      -37     
==========================================
- Hits         1374     1352      -22     
+ Misses        270      255      -15     
Impacted Files          Coverage Δ
src/ArrayInterface.jl   80.80% <ø> (ø)
src/indexing.jl         76.56% <87.75%> (-1.02%) ⬇️
src/dimensions.jl       96.21% <90.90%> (+0.24%) ⬆️
src/axes.jl             67.85% <0.00%> (+4.92%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

* removed `ndims_subset` for now.
* added `canonicalize` tests
@Tokazama (Member, Author) commented Aug 4, 2021

We could potentially do something like this:

# assumes static/known from Static.jl and tail/front from Base
# two-index case: when the first index has no fixed span (e.g. Ellipsis),
# resolve the last index first and let the first absorb what remains
@inline function ndims_index(axs::Tuple{Vararg{Any,NA}}, inds::Tuple{I1,I2}) where {NA,I1,I2}
    ndi = ndims_index(getfield(axs, 1), getfield(inds, 1))
    if ndi === nothing
        ndi = ndims_index(getfield(axs, NA), getfield(inds, 2))
        return (static(NA) - ndi, ndi)
    else
        # drop the axes consumed by the first index and recurse
        _, axstail = Base.IteratorsMD.split(axs, Val(known(ndi)))
        return (ndi, ndims_index(axstail, (getfield(inds, 2),))...)
    end
end
@inline ndims_index(::Tuple{Vararg{Any,NA}}, ::Tuple{}) where {NA} = ()  # recursion terminus
# general case: peel indices off the front; when the first index's span is
# unknown, resolve the trailing index first and work backwards
@inline function ndims_index(axs::Tuple{Vararg{Any,NA}}, inds::Tuple{Vararg{Any,NI}}) where {NA,NI}
    ndi = ndims_index(getfield(axs, 1), getfield(inds, 1))
    if ndi === nothing
        ndi = ndims_index(getfield(axs, NA), getfield(inds, NI))
        N = NA - ndi
        if N < 0
            return (ndims_index((), Base.front(inds))..., ndi)
        else
            # leading indices map onto the leading N axes
            axsfront, _ = Base.IteratorsMD.split(axs, Val(N))
            return (ndims_index(axsfront, Base.front(inds))..., ndi)
        end
    else
        _, axstail = Base.IteratorsMD.split(axs, Val(known(ndi)))
        return (ndi, ndims_index(axstail, tail(inds))...)
    end
end

This would allow Ellipsis to account for CartesianIndices in trailing dimensions (see the usage sketch after the list below).
However, I'm leaning towards one of the following:

  1. Implementing Ellipsis here so that it doesn't have to be part of a public API
  2. Disallowing contextual multidimensional indexing entirely so that it's not a problem
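
For reference, here's how the sketch above might be exercised (the base methods below are placeholders for illustration, not this PR's API):

using Static: static
using EllipsisNotation  # provides .., an instance of EllipsisNotation.Ellipsis

# hypothetical base methods: concrete indices report their span, while an
# ellipsis returns nothing to request contextual resolution
ndims_index(_, ::Integer) = static(1)
ndims_index(_, ::CartesianIndices{N}) where {N} = static(N)
ndims_index(_, ::EllipsisNotation.Ellipsis) = nothing

A = rand(2, 3, 4, 5)
ndims_index(axes(A), (.., CartesianIndices((4, 5))))
# -> (static(2), static(2)): the ellipsis absorbs the two leading dimensions
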

@Tokazama Tokazama marked this pull request as ready for review August 5, 2021 02:07
@Tokazama (Member, Author) commented Aug 5, 2021

For now I think we should just keep index->dim mapping an internal design. I've been quite verbose here, so it should be pretty easy to come back and implement the real deal in the future.

@Tokazama Tokazama requested a review from chriselrod August 6, 2021 17:09
@chriselrod (Collaborator) commented Aug 7, 2021

A lot of this is inherently related to index to dimension mapping. We can't do anything about how end and begin get lowered here, but the related issue of mapping each indexing argument to the appropriate dimension of an array can be accomplished here (reference: A[i, end] assumes that i indexes one dimension JuliaLang/julia#35681, Elipsis and [CartesianIndex()] not working SciML/EllipsisNotation.jl#26).

LoopVectorization also has code for handling this right now, but I hadn't gotten around to actually integrating it with begin and end support (i.e. you can't use begin or end as indices in an @turbo loop at the moment).

EDIT: Although probably not worth trying to rebase it on top of this.

@chriselrod (Collaborator) left a review comment:

Looks good.

@Tokazama Tokazama merged commit 4b8c553 into master Aug 7, 2021
@chriselrod chriselrod deleted the dim-api branch August 23, 2021 11:05