Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support vectors as indices to NamedTuple #32664

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

cmcaine
Copy link
Contributor

@cmcaine cmcaine commented Jul 24, 2019

Fix #32662

@cmcaine cmcaine changed the title Allow indexing namedtuples with vectors Support vectors as indices to NamedTuple Jul 24, 2019
@cmcaine cmcaine force-pushed the index-namedtuples branch from 3cdb89f to 6045cff Compare July 24, 2019 00:58
@cmcaine
Copy link
Contributor Author

cmcaine commented Jul 24, 2019

I think what I should probably do instead of this code is have indexing with a vector return a new named tuple. This matches the subsetting behaviour that tuples and arrays already exhibit when indexed with a vector.

(1, 3, 5, 7)[2:3] == (3, 5)
(x = 1, y = 3, z = 5)[1:1] == (x = 1,)
(x = 1, y = 3, z = 5)[1:2] == (x = 1, y = 3)

@andyferris
Copy link
Member

@cmcaine I still think the semantics for generalized non-scalar indexing I proposed in the Julep #24019 will be useful and simple to understand, which in this case would mean returning a vector.

Would you consider indexing a named tuple with a named tuple to return a new named tuple as reasonable?

@@ -103,6 +103,8 @@ firstindex(t::NamedTuple) = 1
lastindex(t::NamedTuple) = nfields(t)
getindex(t::NamedTuple, i::Int) = getfield(t, i)
getindex(t::NamedTuple, i::Symbol) = getfield(t, i)
getindex(t::NamedTuple, v::AbstractVector{Symbol}) = [t[i] for i in v]
Copy link
Member

@andyferris andyferris Jul 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maps all v::AbstractVector to Vector.

I wonder if we can make this more generic, as in something like map(Fix1(getindex, t), v)? For example, that definition would mean if v were a SVector then the output could also be a SVector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied below. This seems like a fine implementation if we want to move towards the semantics in your Julep.

@@ -103,6 +103,8 @@ firstindex(t::NamedTuple) = 1
lastindex(t::NamedTuple) = nfields(t)
getindex(t::NamedTuple, i::Int) = getfield(t, i)
getindex(t::NamedTuple, i::Symbol) = getfield(t, i)
getindex(t::NamedTuple, v::AbstractVector{Symbol}) = [t[i] for i in v]
getindex(t::NamedTuple, i) = values(t)[i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what the semantics of this are? What are some examples where this helps you, and what do you expect as output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ararslan suggested the latter as a fallback so that indexing methods for Tuple will be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indexing NamedTuples only differs from that for Tuples when indexing with Symbols. Everything else should be identical, so we can get it all for free by using a fallback like this.

@cmcaine
Copy link
Contributor Author

cmcaine commented Jul 24, 2019

From the linked issue:

...provide that for out = a[b]:

  • The output container out shares the indices of b (note: these are CartesianRange for arrays)
  • The values out[i] correspond to a[b[i]].

Does this describe the behaviour you are suggesting?

nt = (x=1, y=3, z=5)
nt[[1,2]] == nt[1:2] == [1,3]
nt[(1,2)] == (1,2)

nt[(:x, :y)] == (1,2)
nt[[:x, :y]] == [1,2]

nt[(a=:x, b=:z)] == (a=1, b=5)
nt[(x=:x, y=:y)] == (x=1, y=3)

# And not this one (because b does not have the indices :x, :y)
nt[(:x, :y)] == (x=1, y=3)

I think I prefer this last interpretation for a vector of symbols, and maybe (probably not) it is OK because out does share all the indices of b? It just has some extra aliases on top.

I dislike that the current way to subset a namedtuple is NamedTuple{(:x, :y)}((x=1, y=3, z=5)) == (x=1, y=3), but I'm happy with either a subsetting convention or your convention.

What do you think?

@cmcaine
Copy link
Contributor Author

cmcaine commented Jul 24, 2019

Supporting more exotic vector types does not seem to be free:

julia> map(Base.Fix1(getindex, nt), OffsetVector([:x, :y], -1))
OffsetArray(::Array{Real,1}, 0:1) with eltype Real with indices 0:1:
 #undef
  -1   

julia> map(Base.Fix1(getindex, nt), OffsetVector([:x, :y], 0))
OffsetArray(::Array{Real,1}, 1:2) with eltype Real with indices 1:2:
 π = 3.1415926535897...
 -1                    

julia> map(Base.Fix1(getindex, nt), OffsetVector([:x, :y], 1))
OffsetArray(::Array{Real,1}, 2:3) with eltype Real with indices 2:3:
 π = 3.1415926535897...
 -1                    

julia> map(Base.Fix1(getindex, nt), OffsetVector([:x, :y], 2))
ERROR: BoundsError: attempt to access OffsetArray(::Array{Real,1}, 3:4) with eltype Real with indices 3:4 at index [3:5]
Stacktrace:
 [1] copyto!(::OffsetArray{Real,1,Array{Real,1}}, ::Int64, ::OffsetArray{Irrational{:π},1,Array{Irrational{:π},1}}, ::Int64, ::Int64) at ./abstractarray.jl:785
 [2] collect_to!(::OffsetArray{Irrational{:π},1,Array{Irrational{:π},1}}, ::Base.Generator{OffsetArray{Symbol,1,Array{Symbol,1}},Base.Fix1{typeof(getindex),NamedTuple{(:x, :y, :z),Tuple{Irrational{},Int64,Char}}}}, ::Int64, ::Tuple{Base.IdentityUnitRange{UnitRange{Int64}},Int64}) at ./array.jl:641
 [3] collect_to_with_first!(::OffsetArray{Irrational{:π},1,Array{Irrational{:π},1}}, ::Irrational{:π}, ::Base.Generator{OffsetArray{Symbol,1,Array{Symbol,1}},Base.Fix1{typeof(getindex),NamedTuple{(:x, :y, :z),Tuple{Irrational{},Int64,Char}}}}, ::Tuple{Base.IdentityUnitRange{UnitRange{Int64}},Int64}) at ./array.jl:630
 [4] _collect(::OffsetArray{Symbol,1,Array{Symbol,1}}, ::Base.Generator{OffsetArray{Symbol,1,Array{Symbol,1}},Base.Fix1{typeof(getindex),NamedTuple{(:x, :y, :z),Tuple{Irrational{},Int64,Char}}}}, ::Base.EltypeUnknown, ::Base.HasShape{1}) at ./array.jl:624
 [5] collect_similar(::OffsetArray{Symbol,1,Array{Symbol,1}}, ::Base.Generator{OffsetArray{Symbol,1,Array{Symbol,1}},Base.Fix1{typeof(getindex),NamedTuple{(:x, :y, :z),Tuple{Irrational{},Int64,Char}}}}) at ./array.jl:548
 [6] map(::Function, ::OffsetArray{Symbol,1,Array{Symbol,1}}) at ./abstractarray.jl:2018
 [7] top-level scope at none:0

SVectors work.

@@ -103,6 +103,8 @@ firstindex(t::NamedTuple) = 1
lastindex(t::NamedTuple) = nfields(t)
getindex(t::NamedTuple, i::Int) = getfield(t, i)
getindex(t::NamedTuple, i::Symbol) = getfield(t, i)
getindex(t::NamedTuple, v::AbstractVector{Symbol}) = [t[i] for i in v]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong and/or surprising. I'd do it as

Suggested change
getindex(t::NamedTuple, v::AbstractVector{Symbol}) = [t[i] for i in v]
getindex(t::NamedTuple, v::AbstractVector{Symbol}) = (; map(i->i => t[i], t)...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, indexing a tuple with a vector gives you back a tuple, so by analogy, indexing a named tuple with a vector should give you back a named tuple.

@ararslan ararslan added needs compat annotation Add !!! compat "Julia x.y" to the docstring needs news A NEWS entry is required for this change needs tests Unit tests are required for this change labels Jul 24, 2019
@andyferris
Copy link
Member

andyferris commented Jul 25, 2019

@ararslan I see what you mean by the analogy with tuples, but maybe it wouldn’t surprise you that I think the non-scalar tuple indexing is slightly wrong :)

Basically I’d love to see the language evolve towards the kinds of ideas in #24019 rather than diffuse away from that. The tuple-vector indexing isn’t so bad because it actually preserves the key values (it happens that Vector and Tuple both use one-based indexing). Here, if indexing with a vector returned a named tuple, say, then the keys of the indexer wouldn’t be those of the output.

(Note: I’m also unclear why named tuples need two sets of indices - symbols and integers - when values is convenient for getting it as a Tuple)

@cmcaine
Copy link
Contributor Author

cmcaine commented Jul 25, 2019

I feel like there needs to be consensus on what outputs to expect from the test cases I listed in order to continue.

Here are those test cases from the earlier comment with outputs as I imagined appropriate for the rule that @andyferris suggested in their Julep:

nt = (x=1, y=3, z=5)
nt[[1,2]] == [1,3]
nt[(1,2)] == (1,2)
nt[1:2] == ???

nt[(:x, :y)] == (1,2)
nt[[:x, :y]] == [1,2]

nt[(a=:x, b=:z)] == (a=1, b=5)
nt[(x=:x, y=:y)] == (x=1, y=3)

# And not this one (because b does not have the indices :x, :y)
nt[(:x, :y)] == (x=1, y=3)

My understanding is that @ararslan is suggesting this:

nt = (x=1, y=3, z=5)
nt[[1,2]] == nt[1:2] == nt[(1,2)] == nt[(:x, :y)] == (x=1, y=3)

# And these are probably errors:
nt[(a=:x, b=:z)]
nt[(x=:x, y=:y)]

@vtjnash
Copy link
Member

vtjnash commented Jul 25, 2019

Closes #27021
See also #31423

@ararslan
Copy link
Member

My understanding is that @ararslan is suggesting this:

nt = (x=1, y=3, z=5)
nt[[1,2]] == nt[1:2] == nt[(1,2)] == nt[(:x, :y)] == (x=1, y=3)

Correct, I am. Though a case could be made to have nt[(:x, :y)] return (1, 3). nt[[:x, :y]] should definitely not be [1, 3] though, c.f. (1, 3)[1:2].

And these are probably errors:

 nt[(a=:x, b=:z)]
 nt[(x=:x, y=:y)]

Yes definitely.

@cmcaine cmcaine force-pushed the index-namedtuples branch 2 times, most recently from ac36608 to 598dd41 Compare July 27, 2019 23:56
@cmcaine
Copy link
Contributor Author

cmcaine commented Jul 27, 2019

Latest commit adds tests and implements this behaviour:

nt = (x=1, y=3, z=5)
nt[[1,2]] == nt[1:2] == nt[[1,2]] == nt[[:x, :y]] == (x=1, y=3)

# Indexing with a tuple is an error, as for Tuples.
nt[(1,2)]
nt[(:x, :y)]

# And these are errors:
nt[(a=:x, b=:z)]
nt[(x=:x, y=:y)]

I've removed the fallback method for now.

@cmcaine cmcaine force-pushed the index-namedtuples branch from 598dd41 to aed81fd Compare August 2, 2019 13:32
@cmcaine
Copy link
Contributor Author

cmcaine commented Aug 2, 2019

Is this blocked waiting on a resolution to #30845 or not?

There are now tests and NEWS. I haven't added a compat note in a docstring because there are no getindex docstrings in namedtuple.jl and it's not clear where one should go. The behaviour is basically the same as that already documented for abstract arrays and implemented for arrays and tuples.

Copy link
Member

@mbauman mbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little annoying, but since Bool <: Integer, we need to also consider the logical indexing case. It's just as easy to support it as disallow it — and as much as it gives me the heeby-geebies, we support it for tuples. So may as well support it here.

Also since you support vectors of Integer, we should probably also support all Integers in the scalar case. I think I got these ad-hoc changes right here.

base/namedtuple.jl Show resolved Hide resolved
base/namedtuple.jl Outdated Show resolved Hide resolved
test/namedtuple.jl Show resolved Hide resolved
@mbauman mbauman added needs docs Documentation for this change is required and removed needs news A NEWS entry is required for this change needs tests Unit tests are required for this change labels Aug 2, 2019
@mbauman
Copy link
Member

mbauman commented Aug 2, 2019

I'm not sure where to best put the docs/compat annotation. I guess we could just add a new method-specific docstring for getindex(::NamedTuple, ...) but it's a little funny in that we now just work more like a cross between the two fully-generic behaviors (both dict-like and array-like) that are already documented.

base/namedtuple.jl Outdated Show resolved Hide resolved
@cmcaine
Copy link
Contributor Author

cmcaine commented Aug 2, 2019

Should we support indexing with LogicalIndex too?

Tuple(1:3)[Base.to_index([true, true, false])] == (1,2)
Tuple(1:3)[Base.to_index([true])] == (1,)

@cmcaine
Copy link
Contributor Author

cmcaine commented Aug 2, 2019

There's this difference of behaviour between tuple and named tuple when indexing with repeated indices too that I'd like to check everyone is OK with.

julia> (1, 2, 3)[[1,1,1]]
(1, 1, 1)

julia> (x=1, y=-1, z=3)[[1,1]]
(x = 1,)

@cmcaine cmcaine force-pushed the index-namedtuples branch from 5c6be15 to 6945414 Compare August 2, 2019 22:28
@cmcaine
Copy link
Contributor Author

cmcaine commented Aug 2, 2019

I've merged the changes suggested by @mbauman and @vtjnash. Thanks!

@cmcaine
Copy link
Contributor Author

cmcaine commented Aug 20, 2019

Is there anything outstanding for this to be merged?

@vtjnash
Copy link
Member

vtjnash commented Aug 20, 2019

I think the main outstanding question is just whether this now does what anyone expected of it. Beyond the generic map-like or broadcast-like (iteration-based or indexing-based) possibilities for this (or deprecating this entirely for all A[]), this proposes a third option which I’ll call NamedTuple-like as it presents a hybrid interface that seems consistent with the ambiguity in the operation of NT (is it a normal tuple with names added to a side-channel, or should it instead act like a normal set of pairs put in a dict with observable ordering guarantees). Should we put this up for triage and/or vote? Must we we wait for further design proposal and analysis work on A.[] replacement before we can adopt this?

@mbauman mbauman added the triage This should be discussed on a triage call label Aug 21, 2019
@JeffBezanson
Copy link
Member

I don't see any examples of compelling use cases for this, and it's not clear how it should work so I think we can wait on this.

@fredrikekre fredrikekre removed the triage This should be discussed on a triage call label Aug 29, 2019
@mbauman
Copy link
Member

mbauman commented Aug 29, 2019

As I reconsider this, I'm now against the semantics as they're implemented here and completely agree with Andy's initial comment (#32664 (comment)).

We've found the idiom "the axes of the indices become the axes of the result" to be a very clear and powerful design to help steer these sorts of questions — and this behavior goes against that. So if (x=1,y=2,z=3)[2:3] doesn't return a named tuple, and instead needs returns something with keys matching keys(2:3), should that thing be a vector or a tuple? I say we not worry about this until we either have a clear use-case or an overarching design strategy (e.g., with #30845 and/or #24019).

@kescobo
Copy link
Contributor

kescobo commented Feb 20, 2023

This PR was linked to from slack just now, and I was confused until I saw how old it is.

I don't think I ever saw this discussion, and it seems like no one that reviewed #38878 did either (or didn't remember it). Perhaps it doesn't actually matter, since that PR only implements getindex on a vector of symbols, but reading this, it seems like maybe there wasn't complete consensus on how this should work, and I'm a bit mortified that it locked in 😬

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs compat annotation Add !!! compat "Julia x.y" to the docstring needs docs Documentation for this change is required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support vectors as indices to NamedTuple
8 participants