Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make AxisKeys play nicely with AcceleratedArrays #13

Open
kcajf opened this issue May 9, 2020 · 1 comment
Open

Make AxisKeys play nicely with AcceleratedArrays #13

kcajf opened this issue May 9, 2020 · 1 comment

Comments

@kcajf
Copy link
Contributor

kcajf commented May 9, 2020

Due to how lookups are handled in AK, it seems like in many cases the acceleration of AcceleratedArrays isn't used. For example:

using Dates
using AcceleratedArrays
using AxisKeys

dates = accelerate(Date(2018, 1, 1):Day(1):Date(2019, 2, 3), UniqueSortIndex)

sd = Date(2018, 3, 3)
ed = Date(2018, 8, 4)

i = findall(in(AcceleratedArrays.Interval(sd, ed)), dates) 
println(i)
@assert isa(i, StepRange) # nice, just 2 searchsorteds happened

A = KeyedArray(rand(length(dates)); date = dates)

# Not happy: AxisKeys is blind to AcceleratedArrays.Interval, crashes
# A(AcceleratedArrays.Interval(sd, ed), :)

# Works, but inefficient - the view created is not a range, but a Vector of Ints, since it just does Interval inclusion in a loop:
# https://github.com/mcabbott/AxisKeys.jl/blob/master/src/selectors.jl#L6
A(AxisKeys.Interval(sd, ed), :)

Could the lookup logic in AK be abstracted a bit, to allow other libraries to implement efficient axis lookup for their types? I think findall function is probably the correct interface, for example: https://github.com/kcajf/AcceleratedArrays.jl/blob/master/src/UniqueSortIndex.jl#L135. The currently implemented lookup logic could be reimplemented in terms of findall too, as a fallback.

@mcabbott
Copy link
Owner

mcabbott commented May 9, 2020

I agree it would be nice to make this all work. Trying to recover what's actually going on first, this works:

a = AcceleratedArrays.Interval(sd, ed)
A(in(a), :) # in(...) isa Function, so this goes to:
findall(in(a), dates) # isa StepRange, could be UnitRange

but without the in it gets treated as a scalar, which fails:

a |> typeof |> supertype # , isn't anything recognised
AxisKeys.findindex(a, axiskeys(A,1)) # error, from this:
findfirst(isequal(a), axiskeys(A,1))

For IntervalSets.Interval there's a special findindex method to avoid this:
https://github.com/mcabbott/AxisKeys.jl/blob/master/src/selectors.jl#L6
It doesn't call findall(in(s), r), because that would be an error:

findall(in(IntervalSets.Interval(sd, ed)), dates.parent)
findall(in(IntervalSets.Interval(3,5.5)), 0:10) # same error without Dates

and instead falls back to giving a vector, as you observe. The issue to fix that is JuliaMath/IntervalSets.jl#52 (which should become a PR).

But it remains to figure out what to do with AcceleratedArrays.Interval, without depending on that package. Why does this differ anyway, it's some issue about isless vs <? But it could still subtype AbstractInterval, that would be nice because IntervalSets.jl is a small package for precisely this purpose.

Is there another way to catch this? We could send z::Any to finall(in(z), axiskeys(A,1)) instead of findfirst(isequal(z), axiskeys(A,1)), i.e. treat objects whose type doesn't match eltype of keys as collections, not individual values, but that doesn't seem ideal. For example push!(KeyedArray(rand(3), [:a, :b, :c]), 0.44) has eltype Any.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants