-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What to do about nonscalar indexing? #30845
Comments
Very nice write-up. Just thinking out loud
julia> getindex.(Ref(randn(4,4)), 1:4, (1:4)')
4×4 Array{Float64,2}:
0.698382 -0.450998 0.00277274 0.265284
1.39098 -0.274713 0.947628 0.853339
0.232767 0.800546 1.53554 1.21039
1.46495 -1.86763 -1.17628 -0.228153
|
It is a very interesting read, thank you.
Does it mean
Nit picking, but did you mean
I'm guessing it's equivalent to
I cannot help thinking that transducer (sorry, that's my library) is the right approach for fusing mapping, filtering and reduction. Currently it can fuse operations like |
Awesome writeup. |
Thanks for the comments. On orthogonal slicing:
Yes, exactly. That's my thought behind the
This is different in two ways from my proposal:
No, my thought is that this is a parser-level transform that automagically will add the correct number of singleton dimensions such that the flagged argument is in a higher dimension than all previous ones. So it's not just adding one singleton dimension — it'd be There would be something nice about it simply meaning add one singleton dimension — it's simple and straightforward, and it'd allow you to do things like And yes, supporting On fusing into indices and logical indexingYeah, logical indexing is something that just "feels" like it'd be nice to fuse and avoid the allocation of the intermediate logical array. The point is somewhat moot because the same problem also applies to |
On the last point: If logical indices always get transformed into intermediate lazy iterables, it seems like little would be lost when one converts the current syntax for logical indexing( |
That was indeed my purpose. I do like APL-indexing.
You might be right but I'm not sure it will be confusing. The two concepts will have separate syntax ( |
Regarding syntax, I kind of like the idea of the number of |
If Another important consideration: how would |
Would there be a large difference between |
No, that's not what I'm proposing: what I'm proposing is that the number of prefix |
Interesting stuff. I'm noticing that almost all the pain comes from transitioning the meaning of julia> @generated function indextuple(A::AbstractArray{T,N}) where {T,N}
idxs = [gensym(:i) for i = 1:N]
quote
$idxs
end
end
indextuple (generic function with 1 method)
julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> macro ⋅(A)
f = gensym(A)
quote
let A = $(esc(A))
idxs = indextuple(A)
f($(idxs...)) = A[$(idxs...)]
end
end
end
@⋅ (macro with 1 method)
julia> i, j = 1:2, 1:2
(1:2, 1:2)
julia> (@⋅A).(i, j) .* i .* j'
2×2 Array{Int64,2}:
1 2
8 16
julia> j = [2,1]
2-element Array{Int64,1}:
2
1
julia> (@⋅A).(i, j) .* i .* j'
2×2 Array{Int64,2}:
4 2
12 6 |
@StefanKarpinski Ooooh, interesting! That's very clever and creative, but I'm not sure that folks index with matrices enough for the additional complexity there to be worthwhile. I think I'd prefer to try the simpler rule that each @peterahrens On views I don't have a firm enough sense of what @timholy The |
Some thoughts from a Python/NumPy perspective (tl;dr there's a potential ambiguity, but I'd be fine with it being resolved either possible way):
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> x
array([[1, 2, 3],
[4, 5, 6]])
>>> x[[0, 1], [0, 1]]
array([1, 5])
>>> x[0:2, 0:2]
array([[1, 2],
[4, 5]])
In particular, Julia Note that this might be okay, e.g. I think I'm fine with saying that if you really need to zip together nonadjacent nonscalar indices you should (I also don't think it's a good idea to deprecate nonscalar |
Famous last words 😁 |
While we are "pie-in-the-sky brainstorming": Much of the complication here comes from multidimensionality... in the purely 1D ( Thus, I wonder if we should think of marshalling multi-dimensional indexing into an appropriate "single dimensional" form. For example, the APL rules are stated to give "the Cartesian product of the input indices" so let's use precisely this. Instead of the current non-scalar matrix indexing
I'm thinking of something like
where We could keep the surface-level syntax of scalar multidimensional indexing exactly as is, so that While we are at it, we could even think about distinguishing the syntax I've been thinking of this stuff because I also want the concept of multi-dimensional dictionaries, and this is the best I've come up with so far. I think it makes the concepts of broadcasting and multi-dimensional indexing more orthogonal, thus they can compose together more cleanly with each other as well as other concepts. |
I forgot to mention another thought - I wonder if |
This came up when decompressing a sparse matrix. JuliaDiff/SparseDiffTools.jl#38 (comment) . rows_index, cols_index, val = findnz(J)
J[rows_index, cols_index] .+= (color[cols_index] .== color_i) .* dx[rows_index] would be natural, but instead I'm left with: @.. setindex!((J,),getindex((J,),rows_index, cols_index) + (getindex((color,),cols_index) == color_i) * getindex((dx,),rows_index),rows_index, cols_index) which is quite nasty. This issue is starting to come up a lot with GPU parallelism asking for things to be vectorized, but sparse matrix calculations require logical or index-based indexing, so broadcasting this and things like Anyways, any meaning other than |
Consensus seems to be that the |
I agree this in general. But let me try the opposite (sorry!), to see how far I can get. It may already be obvious for people here but I just want to note that logical indexing can be implemented in terms of lowering to struct LogicalIndex{T}
select::Bool
index::T
end
struct LogicalIndexer{T} # wraps a broadcastable object (with Bool eltype)
bc::T
end
function If end
Broadcast.broadcasted(::typeof(If), x) = LogicalIndexer(x)
# Forward some interfaces to `.bc`:
Broadcast.instantiate(li::LogicalIndexer) =
LogicalIndexer(Broadcast.instantiate(li.bc))
Broadcast.BroadcastStyle(::Type{<:LogicalIndexer{T}}) where T =
Broadcast.BroadcastStyle(T)
Broadcast.combine_axes(li::LogicalIndexer) = Broadcast.combine_axes(li.bc)
Broadcast.broadcastable(li::LogicalIndexer) = li
Base.size(li::LogicalIndexer) = size(li.bc)
Base.getindex(li::LogicalIndexer, I...) = LogicalIndex(li.bc[I...], I)
Base.setindex!(arr::AbstractArray, value, idx::LogicalIndex) =
if idx.select
arr[idx.index...] = value
end
x = [-1, 0, 1]
setindex!.(Ref(x), 0, If.(x .> 0)) # x.[If.(x .> 0)] .= 0 (kind of)
@show x One disadvantage is that, if we define this for We can also lower x.[if x .> 0] .= 0 Handling the case that abstract type MaybeSelected end
struct Selected{T} <: MaybeSelected
value::T
end
struct Masked <: MaybeSelected end
Base.getindex(arr, idx::LogicalIndex) =
idx.select ? Selected(arr[idx.index]) : Masked() (this is untested) and automatically lift functions to |
flipped_xs = Buffer(xs)
for t ∈ reverse(eachindex(xs))
flipped_xs[t] = f(xs[t])
end to obscure broadcasted form: rev_time = reverse(eachindex(xs))
getindex.(Ref(f.(getindex.(Ref(xs), rev_time))), rev_time) for performance reasons. Being able to write those using something like rev_time = reverse(eachindex(xs))
(f.(xs.[rev_time])).[rev_time] I will paste below examples from a real project that uses function flip(f, xs)
rev_time = reverse(eachindex(xs))
return getindex.(Ref(f.(getindex.(Ref(xs), rev_time))), rev_time)
# the same as
# flipped_xs = Buffer(xs)
# @inbounds for t ∈ rev_time
# flipped_xs[t] = f(xs[t])
# end
# return copy(flipped_xs)
# but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
end
function (m::BLSTM)(Xs::T₃)::T₃ where T₃ <: DenseArray{<:Real,3}
# preallocate output buffer
Ys = Buffer(Xs, 2m.dim_out, size(Xs,2), size(Xs,3))
axisYs₁ = axes(Ys, 1)
time = axes(Ys, 2)
rev_time = reverse(time)
@inbounds begin
# get forward and backward slice indices
slice_f = axisYs₁[1:m.dim_out]
slice_b = axisYs₁[(m.dim_out+1):end]
# bidirectional run step
setindex!.(Ref(Ys), m.forward.(view.(Ref(Xs), :, time, :)), Ref(slice_f), time, :)
setindex!.(Ref(Ys), m.backward.(view.(Ref(Xs), :, rev_time, :)), Ref(slice_b), rev_time, :)
# the same as
# @views for (t_f, t_b) ∈ zip(time, rev_time)
# Ys[slice_f, t_f, :] = m.forward(Xs[:, t_f, :])
# Ys[slice_b, t_b, :] = m.backward(Xs[:, t_b, :])
# end
# but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
end
return copy(Ys)
end
function (m::PBLSTM)(xs::VM)::VM where VM <: DenseVector
# reduce time duration by half by restacking consecutive pairs of input along the time dimension
xxs = vcat.(getindex.(Ref(xs), 1:2:lastindex(xs)), getindex.(Ref(xs), 2:2:lastindex(xs)))
# the same as
# @views xxs = @inbounds(vcat.(xs[1:2:end], xs[2:2:end]))
# but implemented via broadcasting for performance reasons
return vcat.(m.forward.(xxs), flip(m.backward, xxs))
end
Hs_buffer = Buffer(h, size(h,1), batch_size, length(hs))
setindex!.(Ref(Hs_buffer), hs, :, :, axes(Hs_buffer, 3))
# the same as
# for k ∈ eachindex(hs)
# Hs_buffer[:,:,k] = hs[k]
# end
# but implemented via broadcasting as Zygote differentiates loops much slower than broadcasting
Hs = copy(Hs_buffer) |
I haven't spent more than 2 seconds thinking about this, but isn't fixing Zygote the best approach? |
It's not just Zygote that depends upon this — GPU vectorization also heavily depends upon broadcasting. |
Is it mostly a missing hoisting optimization? Why doesn't LICM do this for us? |
I think it is just easier to overload and know what a broadcast expression will do than to try pattern match a loop. A loop is a pretty low level abstraction compared to broadcasting. |
Fair enough. But it would be giving up a lot to resign to not having fast loops anymore. I don't doubt it would be hard, but I wonder if having our own LICM pass that runs before inlining is something we'll eventually decide we need. LLVM's pattern matcher might be specialized for what it thinks of as a number, and Julia's notion of |
I instantly fell in love with the suggestion of making it explicit by @andyferris (#30845 (comment)). The only missing thing would be an ascii variant which is easy enough to type and remember that no one will complain about dropping APL style at some time. I mostly like that approach out of the same reason why I like broadcasting, CartesianIndex/Indices and the whole promotion/conversion mechanism. Make the behaviour more explicit while increasing elegance. Have code working by default due to clever mechanics. ProposalI suggest to remove much of the index handling from new array typeDefine getindex(A::AbstractArray, ::LinearIndex) #1 if IndexStyle==IndexLinear
getindex(A::AbstractArray, ::CartesianIndex) #2 if IndexStyle==IndexCartesian
getindex(A::AbstractArray, ii::IterableIndex) = collect(idx->A[idx], ii) #3 default
getindex(A::AbstractArray, specialized::IndexBuilder) = getindex(A, build(IndexStyle(A), A, specialized))) #4 default
getindex(A, args...) = getindex(A, DefaultBuilder(A)(args...)) #redirecting fallback or via lowering
DefaultBuilder(A::AbstractArray) = tuple #current default/fallback You may define new builder type (like APL, broadcasting, compiletime, whatsoever)Define build(A::AbstractArray, ib::IndexBuilder)::IndexType #5
build(s::IndexStyle, A::AbstractArray, ib::IndexBuilder)::IndexType(s) = convert(s, build(A, ib)) #6 default
build(A::AbstractArray, t::Tuple) = to_indices(A, t) #current default/fallback comparions to current situationThose, who know the fallback behaviour of comparison to suggested behavioursHave All those rewrites of the |
FYI, there were some discussions around awkward-array https://github.com/scikit-hep/awkward-array in andyferris/Dictionaries.jl#19 (comment) with me and @andyferris. It is a Python library for nested arrays and tables. If you are curious, I think this Strange Loop talk "Jagged, ragged, awkward arrays" by Jim Pivarski - YouTube is a great introduction to it. In particular, it also touches on the growing need and attention of such data structure in technical computing. I wonder if nonscalar indexing syntax @andyferris Re #36105 (comment)
I'd imagine we need to call a strictly lazy version of |
Yes, I think that sounds right. |
So with broadcasting dots, we've managed to excise nearly all sometimes-scalar/sometimes-vectorized functions in Julia to great benefit. There are two notable stragglers:
getindex
andsetindex!
. Now, since they have their own syntaxes and support so many fancy things it hasn't always been (and really still isn't) completely obvious that it should be deprecated in favor of broadcasting. But they really are just "broadcasting" scalar indexing over the cartesian product of the indices.The advantages
There'd be a number of advantages to spelling nonscalar indexing as some form of broadcasting. First and foremost: fusion. Forget the whole view/copy debate — straight-up fusion could be faster than either. For example, we could define the new syntax
A.[I...]
to meanbroadcast((idxs...)->A[idxs...], I...)
. Then, to extract the first 10 elements from the first column ofA
, you writeA.[1:10, 1]
. This means that you could fuse in additional operations without any temporaries or views whatsoever with, e.g.,sqrt.(A.[1:10, 1])
.The other huge advantage is how this would then generalize these sorts of accesses to all data structures, including dictionaries and more. Cf. #24019.
The challenges and their potential solutions
Now, there are also some fairly major challenges in making it easy to express all that indexing does with a broadcasting semantic.
APL Indexing: The existing
A[1:10, 1:10]
takes the cartesian product of the two indices, returning a 2-dimensional 10x10 array. I'd expect a broadcastedA.[1:10, 1:10]
to use broadcasting's shape-matching semantic and return the diagonal. In general, APL indexing can be thought of as a broadcast where each successive index is "lifted" to a higher dimension than the sum of dimensionalities of all arguments that preceded it. We could potentially have a simple wrapper that flags arguments to be "orthogonalized" before participating in broadcast — options for spelling this wrapper could include⟂
or^
. Thus,A[I, J]
becomesA.[I, ^J]
. This is a generally useful operation... and for my purposes would completely obviate the pain from the recursive transpose/adjoint fallback removal as you could lift arguments to orthogonal dimensions anywhere:f.(1:10, ^array_of_strings)
. That said, the change in default operation here from APL-like to broadcasting-like may prove to be very painful...Index conversions: The existing indexing API supports many types of indices and even an extensible interface for adding more.
:
is actually a function, but when used as an index it expands out to the entire axis. With broadcast, however, it acts like a function — as a scalar.Resolving this one means unfortunately giving up something fairly major: I don't believe it'll ever be possible to support all these features and also support broadcast fusion through to an index computation. So while it would be oh-so-cool to have an expression like
A.[clamp.(idx .+ (-N:N), 1, end)].^2
(to examine a window about someidx
without worrying about bounds) fuse the whole way down, it simply isn't extensible to the very next thing you'd want to have fuse:A.[A .> 0]
. That would be a nightmare to figure out how to fuse.Bounds checking: In comparison to the other issues, this one feels much simpler, but it's still gonna be a bit of a pain. Non-scalar indexing is able to perform bounds checking at the level of the collections of indices and then perform the scalar indexing with bounds checks off. This is a huge win for things like
:
(no checks),1:10
(just check endpoints), and logical masks (just check the shape). I think this may be possible to still do — again, if we don't fuse through to the index computations. But I've not completely seen to the end of this tunnel.Returned array type: We’ll need to find a solution to
broadcast
forAbstractArray
types #17533;similar
’s first major use was in defining the output type for nonscalar indexing. Lots of folks specialized it for that reason, and broadcast doesn’t use it.Backwards compatibility: For the most part, this is entirely a new syntax with a straight-forward deprecation. There's one place where I was worried about a clash, though: the
@.
macro. That currently leaves indexing expressions alone, but once we introduce theA.[]
operator, I'd expect it to dot those bracket operations. It seems like we have sufficient leeway in the macro to do some fancy detection, though, allowing us to insert appropriate handling code if any arguments are arrays and inserting deprecations appropriately.Indexed Assignment: So far I've really only talked about getindex, but all the same applies to
setindex!
, too. It actually becomes quite the advantage as the dots can tell the whole story about what's scalar and whats not right where you'd expect (adapted from the sister issue What to do about multivalue setindex!? #24086):A[i] = x
: Always setsx
at the scalar indexi
.A[i] .= X
: Always broadcastsX
across the element at locationi
(always mutating the object inA[i]
).A.[I] = x
: Assignsx
to every index inI
.A.[I] .= X
: Always broadcastsX
across the indices selected byI
(always mutatingA
).What to do:
So, bringing this all together, I propose we aim to:
A.[I, J, K]
to mean:f.(a, ^b, c, ^d)
to mean:References and prior discussions
This issue brings together lots of iterations of thought that have been spread about through many issues and discourse communications.
getindex
as compared toview
(which is always nonscalar).The text was updated successfully, but these errors were encountered: