Improve `middle(::AbstractRange)` performance #116
Codecov Report

    @@            Coverage Diff            @@
    ##           master     #116     +/-   ##
    ==========================================
    + Coverage   96.93%   96.94%   +0.01%
    ==========================================
      Files           1        1
      Lines         424      426       +2
    ==========================================
    + Hits          411      413       +2
      Misses         13       13
src/Statistics.jl
Outdated
    @@ -801,6 +789,11 @@ julia> middle(a)
     """
     middle(a::AbstractArray) = ((v1, v2) = extrema(a); middle(v1, v2))

     function middle(a::AbstractRange{<:Real})
You make a good point about the consistency between real and non-real types but I think this technically constitutes a breaking change, since code could theoretically be relying on the existing behavior. As I understand it, we still need to be quite conservative with changes to existing behavior since this package is versioned in lock-step with Julia itself as part of the stdlib. (Perhaps @nalimilan can correct me if I'm wrong here.)
Perhaps an easier course of action to facilitate #115 would be to define `middle` for ranges not to use indexing (which actually seems incorrect for `AbstractRange` due to the assumption of the first element being at index 1) and use `first`/`last` instead?
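As a rough illustration of that suggestion, the sketch below computes the midpoint of a range from its endpoints instead of by indexing. The name `range_middle` is hypothetical and not part of Statistics, and throwing an `ArgumentError` for an empty range is an assumption about the desired behavior rather than what the package does.

```julia
using Statistics

# Hypothetical helper (not part of Statistics): midpoint of a range computed
# from its endpoints, so nothing assumes the first element sits at index 1.
function range_middle(r::AbstractRange)
    # Assumed behavior: reject empty ranges explicitly.
    isempty(r) && throw(ArgumentError("middle of an empty range is undefined"))
    return middle(first(r), last(r))
end

range_middle(1:10)     # 5.5
range_middle(10:-2:2)  # 6.0
```

Because `first` and `last` go through the range's own axes, this form would also behave sensibly for a range type whose indices do not start at 1.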
Ah, I see. I've reverted the restriction to real eltypes, so it now preserves previous behavior and this PR just improves `middle`'s performance by calling `mean`. How does it look now?
Thanks! Though I don't understand what bug you want to fix here. Could you post an example (or add a test) of something that didn't work (or worked incorrectly) but does with the PR?
test/runtests.jl
Outdated
    @test_throws Exception middle(Int[])
    @test_throws Exception middle(1:0)
Suggested change:

    - @test_throws Exception middle(Int[])
    - @test_throws Exception middle(1:0)
    + @test_throws MethodError middle(Int[])
    + @test_throws ArgumentError middle(1:0)

Can you also add a test covering the behavior you aimed to fix in this PR?
I was thinking that testing for a generic `Exception` here would be more semantically appropriate because the fact that empty vectors and ranges return different exceptions is incidental rather than intentional. But I don't mind changing to the concrete exceptions if you prefer that.
In general it's preferable to test the precise exception type you expect since it helps to ensure you're testing the code path you intended to test.
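A small sketch of why the precise type matters, using a made-up function rather than anything from Statistics. `first_positive` is hypothetical; it can fail in two distinct ways, and only the concrete exception type tells the two code paths apart.

```julia
using Test

# Hypothetical function used only for illustration.
function first_positive(xs)
    # Intended failure mode: explicit check for empty input.
    isempty(xs) && throw(ArgumentError("collection must be non-empty"))
    # Unintended failure mode: `first` on an empty filtered vector.
    return first(filter(>(0), xs))
end

@testset "precise exception types" begin
    # Passes for any error at all, including one from an unintended code path.
    @test_throws Exception first_positive(Int[])
    # Passes only if the explicit empty-input check is what fails.
    @test_throws ArgumentError first_positive(Int[])
    # The other failure mode surfaces as a different exception type.
    @test_throws BoundsError first_positive([-1, -2])
end
```

With only `@test_throws Exception`, the first and third cases would be indistinguishable, so a regression that removed the explicit check could go unnoticed.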
Okay, makes sense. Changed.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Thanks for the review @nalimilan! I've applied some suggestions. Sorry for the confusion about the intent of this PR. This was originally intended to restrict the eltype of
The performance improvement is from eliding bounds checks and avoiding calling

Before:

    julia> using BenchmarkTools
    julia> using Statistics
    julia> @btime middle($(Ref(1:1))[]);
      5.054 ns (0 allocations: 0 bytes)
    julia> @btime middle($(Ref(1:1:1))[]);
      14.374 ns (0 allocations: 0 bytes)

After:

    julia> @btime middle($(Ref(1:1))[]);
      3.627 ns (0 allocations: 0 bytes)
    julia> @btime middle($(Ref(1:1:1))[]);
      4.341 ns (0 allocations: 0 bytes)

I can't really think of a test to add for performance. I wouldn't be averse to adding a benchmark suite, but I've written two elsewhere (JuliaArrays/StaticArrays.jl#952, JuliaCollections/DataStructures.jl#641) and neither got merged, so I'm hesitant to write another one. 😅
Who needs Base anyway? 😄

    struct WeirdRange{T} <: AbstractRange{T}
        r::UnitRange{T}
    end
    Base.firstindex(wr::WeirdRange) = -2
    Base.lastindex(wr::WeirdRange) = firstindex(wr) + length(wr) - 1
    Base.length(wr::WeirdRange) = length(wr.r)
    Base.axes(wr::WeirdRange) = (firstindex(wr):lastindex(wr),)
    Base.step(wr::WeirdRange) = step(wr.r)
    function Base.getindex(wr::WeirdRange, i::Integer)
        i in eachindex(wr) || throw(BoundsError(wr, i))
        return wr.r[i - firstindex(wr) + 1]
    end
    wr = WeirdRange(0:3)
    middle(wr[1], wr[end]) # WRONG: returns 3.0
    mean(wr) # CORRECT: returns 1.5
OK. Not sure it's worth testing non-1 based ranges if they don't exist anywhere TBH... But
Okay, I've added a test for
Great work!
Currently, the behavior of `middle` is inconsistent between ranges and vectors with non-`Real` eltypes:

This PR fixes this by restricting `middle(::AbstractRange)` to `Real` eltypes. Performance is improved by calling `mean`, which elides bounds checks and does not call `length` (which is slow for `StepRange`).

Also, the doc for `middle(range)` is incorrect since ranges are not always sorted (there might not be a canonical order on the eltype, as above), so I've removed it and moved its example to the `middle(array)` doc.