Dictionary that can be indexed either with a key or with an integer #113
Comments
@ParadaCarleton I'm not 100% sure what the suggestion is - do you have a code example? For the moment, with |
As in, add a new data structure (
Alternatively, we could have integer indexing work for a normal |
So overloading It is necessary to do one of these:
Does that make sense? |
Hmm. While I can see why you could desire that and see it as a worthwhile tradeoff, I can't see that we'll be able to cleanly support a second meaning to For other meanings we need to use another interface. For example we have For a generic iterable (that supports In the very short term, keep in mind that often with a |
To be clear where my thinking is at, I think this is a case where it's cleaner to use a new generic function as the interface rather than a wrapper type. With the wrapper type you'd have to keep in mind that it is wrapped when you use Also I did some digging - |
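A minimal sketch of the "new generic function" idea discussed above, using only Base. The name `nth` is hypothetical and not part of Dictionaries.jl; it assumes only that the collection is iterable, so positional lookup stays separate from key-based `getindex`:

```julia
# Hypothetical helper: return the n-th element produced by iterating `itr`.
# getindex keeps its single meaning (lookup by key); position is a separate verb.
function nth(itr, n::Integer)
    n >= 1 || throw(ArgumentError("n must be positive, got $n"))
    # Skip the first n - 1 elements, then take one iteration step.
    next = iterate(Iterators.drop(itr, n - 1))
    next === nothing && throw(BoundsError(itr, n))
    return first(next)   # `next` is a (value, state) tuple
end

nt = (a = 10, b = 20, c = 30)
nth(nt, 2)   # second value in iteration order
nt[:b]       # lookup by key is unchanged
```

This takes O(n) steps per lookup, so it is a fallback for generic iterables rather than a replacement for a container with direct positional access.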
The main point of this issue was to let code written with the assumption that Vectors/Tables/etc. will be indexed by integers work when possible (and, less importantly, to make indexing with integers as easy as indexing with keys). I don't think there's much mental overhead added as long as we throw a type error whenever users try constructing anything weird: "This dictionary can't hold integer keys" isn't much more complicated to deal with than "This dictionary can only hold keys of a single type." |
I think I've found a very nice solution in this package; would you be interested in a PR to support it? |
I've been thinking about similar things in relation to designing labelled graph types, i.e. a graph with vertex labels that could be anything, as opposed to the Something I've considered is having an indexing type, so then I could use syntax like Relatedly, in the case I describe above, the Dictionary has a very particular structure, which is that the keys can be anything but the values are always |
I think that ideally, the best combination would be something like OrdinalIndices.jl (for explicitly indexing at a fixed position) and DimensionalData.jl (for indexing with symbols, strings, etc.). (But in the case of graphs, I'm not sure what the natural ordering on those would be?) |
Good point that this is related to DimensionalData.jl and other packages with named dimensions/indices (like AxisKeys.jl, which seems a bit more flexible?). Here's a fun demonstration:

using AxisKeys
using BenchmarkTools
using SparseArrays
using Dictionaries
using UniqueVectors

struct IndicesVector{I} <: AbstractVector{I}
    inds::Indices{I}
end
IndicesVector(data::Vector) = IndicesVector(Indices(data))
Base.size(inds::IndicesVector) = (length(inds.inds),)
Base.getindex(inds::IndicesVector, index::Integer) = inds.inds.values[index]
Base.findfirst(pattern::Base.Fix2{typeof(isequal)}, inds::IndicesVector) = gettoken(inds.inds, pattern.x)[2][2]

Which allows us to do:

julia> A = KeyedArray(sprandn(4, 4, 0.5), (IndicesVector(["a", "b", "c", "d"]), IndicesVector(["a", "b", "c", "d"])))
2-dimensional KeyedArray(...) with keys:
↓ 4-element IndicesVector{String}
→ 4-element IndicesVector{String}
And data, 4×4 SparseMatrixCSC{Float64, Int64} with 7 stored entries:
         ("a")       ("b")       ("c")       ("d")
("a")    0.0         0.0         0.0        -0.510721
("b")    0.0         0.0         0.600002    0.653016
("c")    0.0         0.0592783   2.38575     0.0
("d")   -0.899354    0.0        -0.101834    0.0
julia> A("a", "d")
-0.5107207258723061

And a benchmark:

function main(d)
    inds_data = string.(1:d)
    data = sprandn(d, d, 0.1)
    inds = IndicesVector(inds_data)
    A = KeyedArray(data, (inds, inds))
    @btime $A($("50"), $("50"))
    inds = UniqueVector(inds_data)
    A = KeyedArray(data, (inds, inds))
    @btime $A($("50"), $("50"))
    A = KeyedArray(data, (inds_data, inds_data))
    @btime $A($("50"), $("50"))
end

which outputs:

julia> main(100)
66.224 ns (0 allocations: 0 bytes)
53.414 ns (0 allocations: 0 bytes)
314.233 ns (0 allocations: 0 bytes)

Basically a labelled/named graph (i.e. a graph with general vertex labels or names) can be represented by the object I'm creating above, a sparse array encoding the adjacency matrix and then labelled or named dimensions. UniqueVectors.jl seems to be closely related to the |
Yes, sometimes you want an "array that can be indexed either with an integer or a key" rather than a "dictionary that can be indexed either with a key or an integer" :) Especially if you want the integer indices to be dense. Another package in this space that you might consider is AcceleratedArrays.jl (shameless plug; note it hasn't received much attention for a while). |
Both packages are almost identical, except that DimensionalData.jl is more fully-featured and fully-developed at this point; I generally recommend it because it's more widely used throughout the rest of the ecosystem. (And if you only need to label columns, DataFrames.jl and StructArrays.jl are better-supported.) The difference between array-labeling and a Dictionary is whether you plan to be doing a lot of insertions at runtime. DimensionalData.jl et al. are fundamentally array packages, which means they're designed for (mostly) static axes. When you index an array with something like |
Could be useful for something like DictTables: it'd be super useful if I could index the i-th row of a DictTable with an integer, or instead index it with a symbol. (For this type, integer keys shouldn't be allowed, as that obviously causes ambiguity.)
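To make the proposal concrete, here is a rough sketch of such a container written against Base only. The type `LabelOrPosition` and its fields are hypothetical names chosen for illustration and are not part of Dictionaries.jl or TypedTables.jl. It assumes a fixed set of labels and rejects, at construction time, any key type that could hold an integer:

```julia
# Hypothetical wrapper: values are stored in insertion order and can be
# fetched either by label or by integer position.
struct LabelOrPosition{K,V}
    labels::Vector{K}
    vals::Vector{V}
    lookup::Dict{K,Int}   # label => position in `vals`

    function LabelOrPosition(labels::Vector{K}, vals::Vector{V}) where {K,V}
        # Reject any key type that overlaps with Integer (including Any),
        # since d[1] would otherwise be ambiguous.
        typeintersect(K, Integer) === Union{} ||
            throw(ArgumentError("key type $K could hold integers, which is ambiguous with positions"))
        length(labels) == length(vals) ||
            throw(DimensionMismatch("labels and vals must have the same length"))
        lookup = Dict{K,Int}(k => i for (i, k) in enumerate(labels))
        length(lookup) == length(labels) ||
            throw(ArgumentError("labels must be unique"))
        return new{K,V}(labels, vals, lookup)
    end
end

# Integers always mean position; anything else is treated as a label.
Base.getindex(d::LabelOrPosition, i::Integer) = d.vals[i]
Base.getindex(d::LabelOrPosition, k) = d.vals[d.lookup[k]]

rows = LabelOrPosition([:a, :b, :c], [1.0, 2.0, 3.0])
rows[2]    # positional lookup
rows[:b]   # lookup by label
```

The sketch leaves out insertion and deletion, which is where the trade-offs raised in the comments (dense integer positions versus cheap runtime insertions) start to matter.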