Replace dynamic dispatch with runtime branch on `rev` keyword in `sortperm` #47966
@nanosoldier |
This improves the situation with #47811, but a regression remains for sizes 32–256.
julia> for i in 1:12
           n = 2^i
           print(lpad(n, 4)); @btime sortperm(x) setup=(x=rand($n));
       end
2 224.951 ns (2 allocations: 96 bytes) => 481.651 ns (4 allocations: 128 bytes) => 52.206 ns (1 allocation: 80 bytes)
4 236.767 ns (2 allocations: 112 bytes) => 494.113 ns (4 allocations: 144 bytes) => 64.007 ns (1 allocation: 96 bytes)
8 261.952 ns (2 allocations: 144 bytes) => 523.632 ns (4 allocations: 176 bytes) => 87.860 ns (1 allocation: 128 bytes)
16 315.431 ns (2 allocations: 208 bytes) => 726.031 ns (5 allocations: 432 bytes) => 246.363 ns (2 allocations: 384 bytes)
32 482.479 ns (2 allocations: 352 bytes) => 1.218 μs (5 allocations: 720 bytes) => 684.515 ns (2 allocations: 672 bytes)
64 858.757 ns (2 allocations: 592 bytes) => 2.011 μs (5 allocations: 1.17 KiB) => 1.609 μs (2 allocations: 1.12 KiB)
128 2.150 μs (2 allocations: 1.08 KiB) => 3.818 μs (5 allocations: 2.17 KiB) => 3.446 μs (2 allocations: 2.12 KiB)
256 5.314 μs (2 allocations: 2.14 KiB) => 7.808 μs (5 allocations: 4.30 KiB) => 7.536 μs (2 allocations: 4.25 KiB)
512 15.903 μs (2 allocations: 4.14 KiB) => 15.624 μs (5 allocations: 8.30 KiB) => 15.077 μs (2 allocations: 8.25 KiB)
1024 45.238 μs (2 allocations: 8.14 KiB) => 32.604 μs (5 allocations: 16.30 KiB) => 33.118 μs (2 allocations: 16.25 KiB)
2048 99.861 μs (2 allocations: 16.14 KiB) => 69.427 μs (5 allocations: 32.30 KiB) => 70.779 μs (2 allocations: 32.25 KiB)
4096 219.132 μs (3 allocations: 32.06 KiB) => 150.960 μs (7 allocations: 64.14 KiB) => 152.937 μs (4 allocations: 64.09 KiB)
julia> VERSION
v"1.8.3" => v"1.10.0-DEV.181" => PR |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. |
Performance diff vs master on JuliaCI/BaseBenchmarks.jl#305 is free of regressions and shows some improvements
|
As expected, @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
There are no remaining regressions vs 1.8 in BaseBenchmarks' Some |
Bump |
ready to merge? |
I think so; I'm not entirely sure how this PR improves the situation, but I'm pretty confident that it does. I imagine compiler heuristics about inlining and constant propagation are involved.
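For readers unfamiliar with the idea in the title, here is a minimal sketch of what "a runtime branch on `rev`" can look like, as opposed to letting the ordering's type depend on the runtime value of `rev`. The names `_perm`, `perm_unstable`, and `perm_branch` are hypothetical and are not the code in base/sort.jl. Whether the first variant actually triggers dynamic dispatch depends on compiler heuristics such as union splitting, so treat this as an illustration of the pattern rather than a reproduction of the regression.

```julia
using Base.Order: Forward, Reverse, Perm

# Hypothetical helper (not from base/sort.jl): sort the indices of `v`
# under a concrete ordering `o`. Acts as a function barrier.
_perm(v, o) = sort!(collect(eachindex(v)); order=Perm(o, v))

# Variant 1: the type of `o` depends on the runtime value of `rev`,
# so the call to `_perm` may be resolved dynamically.
function perm_unstable(v; rev::Bool=false)
    o = rev ? Reverse : Forward
    return _perm(v, o)
end

# Variant 2: an explicit runtime branch. Each call site passes an
# ordering whose concrete type is known at compile time.
function perm_branch(v; rev::Bool=false)
    if rev
        return _perm(v, Reverse)
    else
        return _perm(v, Forward)
    end
end
```

Both variants return the same permutation; the difference is only in what the compiler can infer at the call to `_perm`.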
|
base/sort.jl
Outdated
end

# TODO stop using these three hacks
# but check performance, especially unexpected allocations, when removing
Base.@assume_effects :nothrow very_unsafe_copyto!(a, b) = a .= b
this is scary
Yeah... @aviatesk, is this sort of thing acceptable or is there another way?
My local test shows that perhaps `@noinline` is enough?
This is why we do code review. Thanks!
I think I introduced these when they were necessary and then added branching on `rev`, which is more impactful and fixes the root of the issue. Now none of these hacks are necessary at all, and CI checks for unexpected allocations automatically :)
Thanks @N5N3 for pointing out that they are unnecessary.
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
This is good to go now, right? |
Comparing to before the recent changes, when performance was good to go but there were still unnecessary unsafe operations. @nanosoldier |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. |
lgtm |
This eliminates unexpected allocations in `sortperm` and `sortperm!`, sometimes substantially improving runtime for small inputs.
Fixes #47949.
Outdated OP:
I think this might avoid compiler heuristics that arise due to keyword arguments in `sortperm`, but I still don't understand why the allocations were there in the first place. Fixes #47949.
- add strange hacks to avoid compiler heuristics?
- add allocation tests
The situation is still not perfect (e.g. `@test_broken 0 == @allocations sortperm!(i, v, rev=true)`), but it is better according to allocation tests and some benchmarks.
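As an illustration of the kind of allocation check mentioned above, the following is a small sketch of counting allocations for `sortperm!` with and without `rev=true`. It assumes a Julia version that provides `@allocations` (1.9 or later); the variable names are made up for this example, and the counts are deliberately not asserted because they depend on the Julia version.

```julia
using Test

v = rand(100)
ix = collect(1:length(v))   # preallocated output for sortperm!

# Warm up first so that compilation is not counted.
sortperm!(ix, v)
sortperm!(ix, v; rev=true)

# Count allocations of each call; the results vary across Julia versions.
allocs_fwd = @allocations sortperm!(ix, v)
allocs_rev = @allocations sortperm!(ix, v; rev=true)

# Sanity check: after the last call, `ix` orders `v` from largest to smallest.
@test issorted(v[ix]; rev=true)
```

A test suite could then compare `allocs_fwd` and `allocs_rev` against zero, marking any known failure with `@test_broken` as in the example quoted above.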