Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quickly find elements of vector covered by interval #52

Closed
jagot opened this issue Jul 15, 2019 · 26 comments
Closed

Quickly find elements of vector covered by interval #52

jagot opened this issue Jul 15, 2019 · 26 comments

Comments

@jagot
Copy link
Member

jagot commented Jul 15, 2019

In various settings, I need to efficiently find the indices of all elements of a sorted vector/range that belong to an interval, e.g. within_interval(1:0.5:8, 2..4) -> 3:7, whereas within_interval(1:0.5:8, Interval{:closed,:open}(2,4) -> 3:6. Would this be an interesting addition to IntervalSets.jl?

For ranges, one can do a little bit of algebra to find the indices [O(1) complexity], whereas for sorted vectors I guess bisection [O(N*log(N))] is the best one can do.

@dlfivefifty
Copy link
Member

Are you sure you don't just want intersect(1:0.5:8, 2..4)?

@jagot
Copy link
Member Author

jagot commented Jul 15, 2019

What the function is called is less important, but I am sure that I need the indices of the elements. intersect(1:0.5:8, 2..4) would conceivably yield the values (which I also need, but those are easy to find using the indices). intersect could be implemented via within_interval.

@dlfivefifty
Copy link
Member

Ah, so you want findall(in(2..4), 1:0.5:8). I.e. a continuous version of

julia> findall(in(-3:3), 1:0.5:5)
3-element Array{Int64,1}:
 1
 3
 5

@dlfivefifty
Copy link
Member

dlfivefifty commented Jul 15, 2019

I'm surprised it doesn't already work (at least returning a Vector):

julia> findall(in(2..4), 1:0.5:8)
ERROR: MethodError: no method matching iterate(::Interval{:closed,:closed,Int64})
Closest candidates are:
  iterate(::Core.SimpleVector) at essentials.jl:568
  iterate(::Core.SimpleVector, ::Any) at essentials.jl:568
  iterate(::ExponentialBackOff) at error.jl:199
  ...
Stacktrace:
 [1] union!(::Set{Int64}, ::Interval{:closed,:closed,Int64}) at ./abstractset.jl:80
 [2] Set{Int64}(::Interval{:closed,:closed,Int64}) at ./set.jl:10
 [3] _Set(::Interval{:closed,:closed,Int64}, ::Base.HasEltype) at ./set.jl:23
 [4] Set(::Interval{:closed,:closed,Int64}) at ./set.jl:21
 [5] _findin(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Interval{:closed,:closed,Int64}) at ./array.jl:2204
 [6] findall(::Base.Fix2{typeof(in),Interval{:closed,:closed,Int64}}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at ./array.jl:2265
 [7] top-level scope at none:0

Though hiding the in works:

julia> findall(x -> in(2..4)(x), 1:0.5:8)
5-element Array{Int64,1}:
 3
 4
 5
 6
 7

EDIT: Looks like we need to override Base._findin.

@jagot
Copy link
Member Author

jagot commented Jul 15, 2019

As long as the complexity is constant for ranges and N*log(N) for sorted vectors, I'm happy. However, I would also like the return value to be a range where possible, although that is inconsistent with findall.

@dlfivefifty
Copy link
Member

There’s no reason to be consistent: any code will be recompiled and it’s highly unlikely that a library code will assume mutability of a find call (which is arguably a bug and they should convert(Vector, ...) to insist on mutability).

@jagot
Copy link
Member Author

jagot commented Jul 15, 2019

Well, if one is considering vectors, intersecting with a sorted vector may return a range, whereas an unsorted vector necessarily returns a vector, so I guess to enforce type stability we can only return vectors (unless we error out if the vector is unsorted?).

@dlfivefifty
Copy link
Member

Oh i thought you meant special casing ranges. If the types are the same then yes we should return the same type.

@dlfivefifty
Copy link
Member

Lazy arrays might be a way to do this:

findall(in(1..2), applied(sort, v))

@jagot
Copy link
Member Author

jagot commented Jul 15, 2019

Where applied(sort, ::Range) will simply be a Range?

@dlfivefifty
Copy link
Member

Only when materialised.

@jagot
Copy link
Member Author

jagot commented Jul 15, 2019

I think I got the implementation working for increasing ranges (most likely wrong for decreasing ranges):

"""
    within_interval(x, interval)

Return the indices of the elements of `x` that lie within the given
closed `interval`.
"""
function within_interval(x::AbstractRange, interval::Interval{L,R}) where {L,R}
    N = length(x)
    δx = step(x)
    l = leftendpoint(interval)
    r = rightendpoint(interval)
    a = max(1, ceil(Int, (l-x[1])/δx) + 1)
    b = min(N, ceil(Int, (r-x[1])/δx))
    a + (x[a] == l && L == :open):b + (b < N && x[b+1] == r && R == :closed)
end

This can be tested with the edge cases I managed to conjure up:

function within_interval_linear(x, interval)
    N = length(x)+1
    i = N
    j = 0
    for (k,e) in enumerate(x)
        if e  interval
            if k < i
                i = k
            else
                j = k
            end
        end
    end

    i:j
end

@testset "Interval coverage" begin
    x = range(0, stop=1, length=21)

    @testset "Two intervals" begin
        @test within_interval(x, 0..0.5) == 1:11 == within_interval_linear(x, 0..0.5)
        @test within_interval(x, Interval{:closed,:open}(0,0.5)) == 1:10
        @test within_interval(x, Interval{:closed,:open}(0,0.5)) == within_interval_linear(x, Interval{:closed,:open}(0,0.5))
        @test within_interval(x, Interval{:closed,:open}(0.25,0.5)) == 6:10
        @test within_interval(x, Interval{:closed,:open}(0.25,0.5)) == within_interval_linear(x, Interval{:closed,:open}(0.25,0.5))
    end
    
    @testset "Three intervals" begin
        @test within_interval(x, Interval{:closed,:open}(0,1/3)) == 1:7
        @test within_interval_linear(x, Interval{:closed,:open}(0,1/3)) == 1:7
        @test within_interval(x, Interval{:closed,:open}(1/3,2/3)) == 8:14
        @test within_interval_linear(x, Interval{:closed,:open}(1/3,2/3)) == 8:14
        @test within_interval(x, 2/3..1) == 15:21
        @test within_interval_linear(x, 2/3..1) == 15:21
    end
    
    @testset "Open interval" begin
        @test within_interval(x, OpenInterval(0.2,0.4)) == 6:8
    end
    
    @testset "Random intervals" begin
        @testset "L=$L" for L=[:closed,:open]
            @testset "R=$R" for R=[:closed,:open]
                for i = 1:1 # 20
                    interval = Interval{L,R}(minmax(rand(),rand())...)
                    @test within_interval(x, interval) == within_interval_linear(x, interval)
                end
            end
        end
    end
end

@mcabbott
Copy link
Contributor

Bump, this would be nice to have. One edge case is:

within_interval(2:2:10, 4..Inf)  # InexactError: trunc(Int64, Inf), from ceil 
within_interval(2:2:10, 4..1e20) # same

It doesn't work at all for negative step, but it shouldn't be hard to just call itself on reverse(x), and adjust.

within_interval(10:-2:2, 4..5)   # 4:3, which is empty
within_interval(10:-2:2, 4..999) # BoundsError: attempt to access 5-element StepRange{Int64,Int64} at index [-493]

@jagot
Copy link
Member Author

jagot commented Mar 25, 2020

This should guard against large values:

function within_interval(x::AbstractRange, interval::Interval{L,R}) where {L,R}
    N = length(x)
    δx = step(x)

    lx,rx = x[1], x[end]
    
    l = max(leftendpoint(interval), lx)
    r = min(rightendpoint(interval), rx)

    (l > rx || r < lx) && return 1:0
    
    a = max(1, ceil(Int, (l-x[1])/δx) + 1)
    b = min(N, ceil(Int, (r-x[1])/δx))
    a + (x[a] == l && L == :open):b + (b < N && x[b+1] == r && R == :closed)
end
@testset "Large intervals" begin
    @test within_interval(2:2:10, 4..Inf) == 2:5
    @test within_interval(2:2:10, 4..1e20) == 2:5
    @test isempty(within_interval(2:2:10, -Inf..(-1e20)))
end

Decreasing ranges yet remain.

@mcabbott
Copy link
Contributor

mcabbott commented Mar 25, 2020

After fiddling a bit, here's a version which handles decreasing ranges, and also offset indexing:

using IntervalSets, Test, OffsetArrays

function within_interval(x::AbstractRange, interval::Interval{L,R}) where {L,R}
    il, ir = firstindex(x), lastindex(x)
    δx = step(x)
    if δx < 0
        rev = within_interval(reverse(x), interval)
        return (il+ir)-last(rev):(il+ir)-first(rev)
    end

    lx, rx = first(x), last(x)
    l = max(leftendpoint(interval), lx-1)
    r = min(rightendpoint(interval), rx+1)

    (l > rx || r < lx) && return 1:0
    
    a = max(il, ceil(Int, (l-lx)/δx) + il)
    b = min(ir, ceil(Int, (r-lx)/δx))
    a + (x[a] == l && L == :open):b + (b < ir && x[b+1] == r && R == :closed)
end

ax1 = axes(OffsetArray(ones(10), -1),1)
eachindex(ax1) == 0:9
ax2 = axes(OffsetArray(ones(12), -2),1)

@testset "with reverse" begin
    for x in [1:10, 1:3:10, 2:3:11, -1:9, -2:0.5:5, ax1, ax2]
        for lo in -3:4, hi in 5:13
            for L in [:closed, :open], R in [:closed, :open]
                int = Interval{L,R}(lo,hi)
                @test within_interval(x, int) == findall(i -> x[i] in int, eachindex(x))
                r = reverse(x)
                @test within_interval(r, int) == findall(i -> r[i] in int, eachindex(r))
            end
        end
    end
end

Base.findall(p::Base.Fix2{typeof(in),<:Interval}, r::AbstractRange) = within_interval(r, p.x)

@mcabbott
Copy link
Contributor

mcabbott commented May 9, 2020

@jagot, what thoughts on making this a PR?

@jagot
Copy link
Member Author

jagot commented May 9, 2020

Yes, I have been meaning to get around to it.

  • Are we happy with the name?
  • Have we covered all bases? (Does the reverse version work fully? I have not yet had the need for it)
  • @dlfivefifty Any comments?

@dlfivefifty
Copy link
Member

Why isn’t it intersect ?

@jagot
Copy link
Member Author

jagot commented May 9, 2020

#52 (comment)

What the function is called is less important, but I am sure that I need the indices of the elements. intersect(1:0.5:8, 2..4) would conceivably yield the values (which I also need, but those are easy to find using the indices). intersect could be implemented via within_interval.

@dlfivefifty
Copy link
Member

dlfivefifty commented May 9, 2020

findall(in(2..4), 1:0.5:8), e.g., replicating:

julia> findall(in(2:3),1:0.5:8)
2-element Array{Int64,1}:
 3
 5

@jagot
Copy link
Member Author

jagot commented May 9, 2020

So basically

Base.findall(i::Interval, v::AbstractVector) = ...

?

Feels more natural, I agree

@dlfivefifty
Copy link
Member

No, within_interval(x::AbstractRange, interval::Interval) would be defined instead as:

function Base.findall(in_d::Base.Fix2{typeof(in),Interval{:closed,:closed,Int64}}, x::AbstractVector) 
interval = in_d.x
...
end

@mcabbott
Copy link
Contributor

mcabbott commented May 9, 2020

Right, I don't think we should do findall(i::Interval, v::AbstractVector) since i isn't callable. The name within_interval seems fine, it won't be exported anyway. (Or I see you just avoid it, which is fine too.)

I don't remember the details, but tried hard to test many reversed cases above. Including the extra-weird ranges which are the axes of OffsetArrays.

@dlfivefifty
Copy link
Member

Please read the comment more closely: in(2..4) is callable

@mcabbott
Copy link
Contributor

mcabbott commented May 9, 2020

I did! I was objecting to the form without the in, hence agreeing with you.

@jagot
Copy link
Member Author

jagot commented May 9, 2020

I implemented the in form in the PR. I deliberately left out the OffsetArrays examples, but it's easy enough to add the extra dependency to the test target.

@jagot jagot closed this as completed in 162bd0d May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants