Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use old searchsorted() implementation until julia 1.11 #168

Closed
wants to merge 4 commits into from

Conversation

aplavin
Copy link
Contributor

@aplavin aplavin commented Nov 9, 2023

Looks like the recently merged #111 has a significant overhead on Julia 1.10(rc), 1.9, and earlier. It's only in 1.11 with JuliaLang/julia#50365 where the overhead disappears.
Here, I revert to the old searchsorted_interval implementation for older Julias.

Not just ranges – probably any type that is not an Array (?):

needle = 1..10

haystack = 1:10^5
@btime searchsorted_interval($haystack, $needle)
# IntervalSets 0.7.7 or this PR:
  2.541 ns (0 allocations: 0 bytes)
# IntervalSets 0.7.8:
  83.808 ns (2 allocations: 64 bytes)

using FlexiMaps
haystack = mapview(x -> x, collect(1:10^5))
@btime searchsorted_interval($haystack, $needle)
# IntervalSets 0.7.7 or this PR:
   50.997 ns (0 allocations: 0 bytes)
# IntervalSets 0.7.8:
  123.564 ns (2 allocations: 32 bytes)

For a package that's supposed to be used in all kinds of code, including performance-sensitive, these are highly noticeable.

@hyrodium
Copy link
Collaborator

hyrodium commented Nov 10, 2023

Reverting the function may break the zero-step range test (#100), right?

I think adding the following description to the documentation would be reasonable.


There was a performance degression in IntervalSets.jl v0.7.8 due to the Base implementation (JuliaLang/julia#50365).

  • For users who really care about the performance
    • and would like to use the latest version of this package:
      • Use upcoming Julia v1.11.
    • and are using Julia earlier than v1.11:
      • Install this package with ]add IntervalSets#v0.7.7
      • Or, add IntervalSets = "=0.7.7" to compat table in your Project.toml.
  • For users who don't care about the performance:
    • Install the latest version of IntervalSets.jl on your Julia environment as usual.

See #111 for more discussion.

@dlfivefifty
Copy link
Member

I agree with @hyrodium: trying to keep performance high on every supported version of Julia is going to add a lot of burden to both the code writer and maintainers.

Is there evidence that this code is impacting other packages out in the wild? If not I think if its fine if its slow until Julia v1.11.

@aplavin
Copy link
Contributor Author

aplavin commented Nov 10, 2023

trying to keep performance high on every supported version of Julia is going to add a lot of burden to both the code writer and maintainers

On every Julia version – yeah probably it's too much to ask for. But as it is now, the regression happens on all released Julia versions, and it will be the case for another ~year to come (even 1.10 isn't released yet, so who knows when 1.11 will be).

For users who don't care about the performance: ...

Pretty sure lots of Julia users do :)

Is there evidence that this code is impacting other packages out in the wild?

I actually found this regression by running FlexiJoins.jl tests, and noticed they fail because of way more allocations. Had to restrict IntervalSets compat (JuliaRegistries/General#95080) to avoid this performance drop.

Well I guess if the decision is to keep this regression in IntervalSets, I'll just keep using 0.7.7 for another year or two (until 1.10- becomes irrelevant). In my view, regressions (and fixing them whenever possible) have higher priorities than potential new features.

@dlfivefifty
Copy link
Member

Ah ok thanks for the explanation! Yes I agree if its impacting your code we should def fix it

@aplavin
Copy link
Contributor Author

aplavin commented Nov 11, 2023

Fixing zero-step ranges seems trivial, should be ready to go now.

Copy link

codecov bot commented Nov 11, 2023

Codecov Report

Attention: Patch coverage is 79.48718% with 8 lines in your changes missing coverage. Please review.

Project coverage is 96.06%. Comparing base (9f3b49a) to head (8401b59).
Report is 17 commits behind head on master.

Files with missing lines Patch % Lines
src/findall.jl 79.48% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #168      +/-   ##
==========================================
- Coverage   99.10%   96.06%   -3.05%     
==========================================
  Files           4        4              
  Lines         224      254      +30     
==========================================
+ Hits          222      244      +22     
- Misses          2       10       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +116 to +119
searchsorted_interval(X, i::Interval{:closed, :closed}) = searchsortedfirst(X, leftendpoint(i)) :searchsortedlast(X, rightendpoint(i))
searchsorted_interval(X, i::Interval{:closed, :open}) = searchsortedfirst(X, leftendpoint(i)) :(searchsortedfirst(X, rightendpoint(i)) - 1)
searchsorted_interval(X, i::Interval{ :open, :closed}) = (searchsortedlast(X, leftendpoint(i)) + 1):searchsortedlast(X, rightendpoint(i))
searchsorted_interval(X, i::Interval{ :open, :open}) = (searchsortedlast(X, leftendpoint(i)) + 1):(searchsortedfirst(X, rightendpoint(i)) - 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove the keyword argument rev, that would be breaking. Could you add the keyword argument, or bump the version to v0.8.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev support was added in #111, and it's exactly the change that causes performance regression

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the situation. If we revert #111, then we lose rev, and that must be considered as a breaking change. If you would like to avoid the breaking release, this PR should implement rev keyword argument without performance regression.

I'm okay with both choices: breaking release or adding the rev argument.

else
function searchsorted_interval(X, i::Interval{L, R}; rev=false) where {L, R}
if rev === true
_searchsorted_begin(X, rightendpoint(i), Val(R); rev):_searchsorted_end(X, leftendpoint(i), Val(L); rev)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not tested on CI. Could you remove this # in CI.yml?

@hyrodium
Copy link
Collaborator

I'm not sure that reopening the issue with Unitful.jl is considered as breaking.
#111 (comment)

@aplavin
Copy link
Contributor Author

aplavin commented Jan 3, 2024

I'm generally indifferent to which decision is made regarding this PR (eg, release "breaking" version, or "non-breaking" one). I find it highly improbable that anyone already relies on features from #111, so would slightly prefer a "non-breaking" update even if technically breaking, just to reduce the burden across the ecosystem.
So, leaving it to IntervalSets devs.

And even if this PR ends up simply closed I myself will be fine with that: will just use IntervalSets 0.7.7 until Julia 1.9 and 1.10 becomes irrelevant for me, then switch to Julia 1.11 and update IntervalSets. It's unlikely that IntervalSets gets many new important features in the next year, it's quite a stable package.
Still, more generally I find this scenario unfortunate, especially given that 1.10 is likely to become the LTS version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants