Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve middle(::AbstractRange) performance #116

Merged
merged 7 commits into from
Jun 21, 2022
Merged

Conversation

vyu
Copy link
Contributor

@vyu vyu commented Jun 15, 2022

Currently, the behavior of middle is inconsistent between ranges and vectors with non-Real eltypes:

julia> using Statistics

julia> r = LinRange(1im, 2im, 2)
2-element LinRange{ComplexF64, Int64}:
 0.0+1.0im,0.0+2.0im

julia> middle(r)
0.0 + 1.5im

julia> middle(collect(r))
ERROR: MethodError: no method matching isless(::ComplexF64, ::ComplexF64)
[...]

This PR fixes this by restricting middle(::AbstractRange) to Real eltypes. Performance is improved by calling mean, which elides bounds checks and does not call length (which is slow for StepRange).

Also, the doc for middle(range) is incorrect since ranges are not always sorted (there might not be a canonical order on the eltype, as above), so I've removed it and moved its example to the middle(array) doc.

@codecov-commenter
Copy link

codecov-commenter commented Jun 15, 2022

Codecov Report

Merging #116 (17bc7a7) into master (576db0f) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   96.93%   96.94%   +0.01%     
==========================================
  Files           1        1              
  Lines         424      426       +2     
==========================================
+ Hits          411      413       +2     
  Misses         13       13              
Impacted Files Coverage Δ
src/Statistics.jl 96.94% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 576db0f...17bc7a7. Read the comment docs.

@@ -801,6 +789,11 @@ julia> middle(a)
"""
middle(a::AbstractArray) = ((v1, v2) = extrema(a); middle(v1, v2))

function middle(a::AbstractRange{<:Real})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make a good point about the consistency between real and non-real types but I think this technically constitutes a breaking change, since code could theoretically be relying on the existing behavior. As I understand it, we still need to be quite conservative with changes to existing behavior since this package is versioned in lock-step with Julia itself as part of the stdlib. (Perhaps @nalimilan can correct me if I'm wrong here.)

Perhaps an easier course of action to facilitate #115 would be to define middle for ranges not to use indexing (which actually seems incorrect for AbstractRange due to the assumption of the first element being at index 1) and use first/last instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I've reverted the restriction to real eltypes, so it now preserves previous behavior and this PR just improves middle's performance by calling mean. How does it look now?

@vyu vyu changed the title Restrict middle(::AbstractRange) eltype and improve performance Improve middle(::AbstractRange) performance Jun 16, 2022
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Though I don't understand what bug you want to fix here. Could you post an example (or add a test) of something that didn't work (or worked incorrectly) but does with the PR?

test/runtests.jl Outdated
Comment on lines 25 to 26
@test_throws Exception middle(Int[])
@test_throws Exception middle(1:0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@test_throws Exception middle(Int[])
@test_throws Exception middle(1:0)
@test_throws MethodError middle(Int[])
@test_throws ArgumentError middle(1:0)

Can you also add a test covering the behavior you aimed to fix in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that testing for a generic Exception here would be more semantically appropriate because the fact that empty vectors and ranges return different exceptions is incidental rather than intentional. But I don't mind changing to the concrete exceptions if you prefer that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it's preferable to test the precise exception type you expect since it helps to ensure you're testing the code path you intended to test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense. Changed.

vyu and others added 2 commits June 16, 2022 14:23
@vyu
Copy link
Contributor Author

vyu commented Jun 16, 2022

Thanks for the review @nalimilan! I've applied some suggestions.

Sorry for the confusion about the intent of this PR. This was originally intended to restrict the eltype of middle(::AbstractRange) as well, but @ararslan pointed out that's a breaking change, so I've removed that and this PR now has the following effects:

  • Improve the performance of middle(::AbstractRange), which incidentally fixes two unlikely bugs:
    • Fix middle for non-1-based ranges as @ararslan pointed out, but the AbstractRanges in Base are all 1-based so I don't have a simple test for this.
    • Fix middle for ranges with non-representable lengths (e.g., middle(0:typemax(Int)) errors before this PR). But this is an obscure interaction that I hope to remove in Base by throwing an exception on construction of such ranges, which would break tests that I add here.
  • Remove incorrect middle(range) doc.

The performance improvement is from eliding bounds checks and avoiding calling length (from a[end], which calls lastindex(a)), which is slow for StepRange as it requires an integer division. Before:

julia> using BenchmarkTools

julia> using Statistics

julia> @btime middle($(Ref(1:1))[]);
  5.054 ns (0 allocations: 0 bytes)

julia> @btime middle($(Ref(1:1:1))[]);
  14.374 ns (0 allocations: 0 bytes)

After:

julia> @btime middle($(Ref(1:1))[]);
  3.627 ns (0 allocations: 0 bytes)

julia> @btime middle($(Ref(1:1:1))[]);
  4.341 ns (0 allocations: 0 bytes)

I can't really think of a test to add for performance. I wouldn't be adverse to adding a benchmark suite, but I've written two elsewhere (JuliaArrays/StaticArrays.jl#952, JuliaCollections/DataStructures.jl#641) and neither got merged, so I'm hesitant to write another one. 😅

Co-authored-by: Milan Bouchet-Valat <[email protected]>
@ararslan
Copy link
Member

Fix middle for non-1-based ranges as @ararslan pointed out, but the AbstractRanges in Base are all 1-based so I don't have a simple test for this.

Who needs Base anyway? 😄

struct WeirdRange{T} <: AbstractRange{T}
    r::UnitRange{T}
end

Base.firstindex(wr::WeirdRange) = -2

Base.lastindex(wr::WeirdRange) = firstindex(wr) + length(wr) - 1

Base.length(wr::WeirdRange) = length(wr.r)

Base.axes(wr::WeirdRange) = (firstindex(wr):lastindex(wr),)

Base.step(wr::WeirdRange) = step(wr.r)

function Base.getindex(wr::WeirdRange, i::Integer)
    i in eachindex(wr) || throw(BoundsError(wr, i))
    return wr.r[i - firstindex(wr) + 1]
end

wr = WeirdRange(0:3)
middle(wr[1], wr[end])  # WRONG: returns 3.0
mean(wr)                # CORRECT: returns 1.5

@nalimilan
Copy link
Member

OK. Not sure it's worth testing non-1 based ranges if they don't exist anywhere TBH... But middle(0:typemax(Int)) sounds easy enough to test.

@vyu
Copy link
Contributor Author

vyu commented Jun 19, 2022

Okay, I've added a test for middle(0:typemax(Int)). Also fixed a CI failure on 1.7 due to a change of exception type in 1.8 by adding a version check.

@nalimilan nalimilan requested a review from ararslan June 20, 2022 20:40
Copy link
Member

@ararslan ararslan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@ararslan ararslan merged commit 0588f2c into JuliaStats:master Jun 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants