Replace dynamic dispatch with runtime branch on `rev` keyword in sortperm #47966

LilithHafner · 2022-12-22T11:56:57Z

This eliminates unexpected allocations in sortperm and sortperm!, sometimes substantially improving runtime for small inputs:

julia> v = rand(10); ix = similar(eachindex(v));

julia> @btime sortperm!(copyto!($ix, eachindex($v)), $v);
  571.896 ns (3 allocations: 48 bytes) => 106.496 ns (0 allocations: 0 bytes)

julia> @btime sortperm($v);
  598.118 ns (4 allocations: 192 bytes) => 139.678 ns (1 allocation: 144 bytes)

Fixes #47949.
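A minimal sketch of the technique named in the title, not the code in base/sort.jl. The names `make_order`, `insertion_perm!`, `perm_dynamic!`, and `perm_branch!` are hypothetical and exist only for illustration. The idea is that an ordering built from a runtime `rev::Bool` has a type the compiler cannot pin down to one concrete type, whereas an explicit branch gives each call site a concrete ordering type. For a toy this small the compiler may well resolve the first form on its own (for example through union splitting), so this shows the shape of the change rather than reproducing the allocations.

```julia
using Base.Order: Ordering, Forward, ReverseOrdering, lt

# Hypothetical helper: the returned ordering type depends on the runtime
# value of `rev`, so no single concrete return type can be inferred.
make_order(rev::Bool) = rev ? ReverseOrdering(Forward) : Forward

# Hypothetical kernel: a stable insertion sort of the index vector `ix`
# by the values in `v`, specialized on the concrete type of `o`.
function insertion_perm!(ix::Vector{Int}, v::Vector{Float64}, o::Ordering)
    for i in 2:length(ix)
        x = ix[i]
        j = i
        while j > 1 && lt(o, v[x], v[ix[j-1]])
            ix[j] = ix[j-1]
            j -= 1
        end
        ix[j] = x
    end
    return ix
end

# One call site; the ordering type is only known at run time.
perm_dynamic!(ix, v; rev::Bool=false) = insertion_perm!(ix, v, make_order(rev))

# Runtime branch on `rev`; each call site sees a concrete ordering type.
function perm_branch!(ix, v; rev::Bool=false)
    if rev
        insertion_perm!(ix, v, ReverseOrdering(Forward))
    else
        insertion_perm!(ix, v, Forward)
    end
end
```

Both variants return the same permutation; they differ only in what the compiler can know at each call to the kernel.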

Outdated OP:

~~I think this might avoid compiler heuristics that arise due to keyword arguments in sortperm, but I still don't understand why the allocations were there in the first place. Fixes #47949.~~

~~- add strange hacks to avoid compiler heuristics?~~
~~- add allocation tests~~

~~The situation is still not perfect (e.g. @test_broken 0 == @allocations sortperm!(i, v, rev=true)), but it is better according to allocation tests and some benchmarks.~~

LilithHafner · 2022-12-22T12:18:46Z

@nanosoldier runbenchmarks("sort", vs=":master")

LilithHafner · 2022-12-22T12:31:19Z

This improves the situation with #47811, but a regression remains for sizes 32–256.

julia> for i in 1:12
           n = 2^i
           print(lpad(n, 4)); @btime sortperm(x) setup=(x=rand($n));
       end
   2  224.951 ns (2 allocations: 96 bytes)  => 481.651 ns (4 allocations: 128 bytes) => 52.206 ns (1 allocation: 80 bytes)
   4  236.767 ns (2 allocations: 112 bytes) => 494.113 ns (4 allocations: 144 bytes) => 64.007 ns (1 allocation: 96 bytes)
   8  261.952 ns (2 allocations: 144 bytes) => 523.632 ns (4 allocations: 176 bytes) => 87.860 ns (1 allocation: 128 bytes)
  16  315.431 ns (2 allocations: 208 bytes) => 726.031 ns (5 allocations: 432 bytes) => 246.363 ns (2 allocations: 384 bytes)
  32  482.479 ns (2 allocations: 352 bytes) => 1.218 μs (5 allocations: 720 bytes)   => 684.515 ns (2 allocations: 672 bytes)
  64  858.757 ns (2 allocations: 592 bytes) => 2.011 μs (5 allocations: 1.17 KiB)    => 1.609 μs (2 allocations: 1.12 KiB)
 128  2.150 μs (2 allocations: 1.08 KiB)    => 3.818 μs (5 allocations: 2.17 KiB)    => 3.446 μs (2 allocations: 2.12 KiB)
 256  5.314 μs (2 allocations: 2.14 KiB)    => 7.808 μs (5 allocations: 4.30 KiB)    => 7.536 μs (2 allocations: 4.25 KiB)
 512  15.903 μs (2 allocations: 4.14 KiB)   => 15.624 μs (5 allocations: 8.30 KiB)   => 15.077 μs (2 allocations: 8.25 KiB)
1024  45.238 μs (2 allocations: 8.14 KiB)   => 32.604 μs (5 allocations: 16.30 KiB)  => 33.118 μs (2 allocations: 16.25 KiB)
2048  99.861 μs (2 allocations: 16.14 KiB)  => 69.427 μs (5 allocations: 32.30 KiB)  => 70.779 μs (2 allocations: 32.25 KiB)
4096  219.132 μs (3 allocations: 32.06 KiB) => 150.960 μs (7 allocations: 64.14 KiB) => 152.937 μs (4 allocations: 64.09 KiB)

julia> VERSION
v"1.8.3" => v"1.10.0-DEV.181" => PR

nanosoldier · 2022-12-22T12:55:18Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

LilithHafner · 2022-12-22T13:14:20Z

The performance diff vs master on the JuliaCI/BaseBenchmarks.jl#305 benchmarks is free of regressions and shows some improvements:

"insertionsort", "sortperm! reverse" => TrialJudgement(-23.35% => improvement)
"insertionsort", "sortperm forwards" => TrialJudgement(-18.31% => invariant)
"length = 3", "sortperm(rand(length))" => TrialJudgement(-80.03% => improvement)
"length = 10", "sortperm(rand(length))" => TrialJudgement(-68.33% => improvement)
"length = 10", "mixed eltype with by order" => TrialJudgement(-20.71% => improvement)
"length = 30", "sortperm(rand(length))" => TrialJudgement(-44.86% => improvement)
"length = 100", "sortperm(rand(length))" => TrialJudgement(-13.21% => invariant)
"issues", "sortperm on a view (Float64)" => TrialJudgement(-9.12% => invariant)

LilithHafner · 2022-12-22T13:17:51Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

As expected, sortperm! allocations go to 0 for insertion & quick sorts when not reversed

@nanosoldier runbenchmarks("sort", vs = ":release-1.8")

nanosoldier · 2022-12-22T13:54:27Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

LilithHafner · 2022-12-22T20:32:05Z

@nanosoldier runbenchmarks("sort", vs = ":release-1.8")

nanosoldier · 2022-12-22T21:08:34Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

LilithHafner · 2022-12-22T23:41:12Z

There are no remaining regressions vs 1.8 in BaseBenchmarks' sortperm and sortperm! benchmarks!

Some sort regressions are still detected, and some sortperm regressions remain; the latter are detected by the more diverse JuliaCI/BaseBenchmarks.jl#305 benchmarks.

LilithHafner · 2022-12-27T14:26:44Z

Bump

oscardssmith · 2022-12-27T14:33:09Z

ready to merge?

LilithHafner · 2022-12-27T14:37:38Z

I think so; I'm not entirely sure how this PR improves the situation, but I'm pretty confident that it does. I imagine compiler heuristics about inlining and constant propagation are involved.

~~I'd be comfortable with merging this as is, but~~ it would be better to have a review from someone who understands the exact mechanisms at play here.

oscardssmith · 2022-12-27T14:42:54Z

base/sort.jl

 end

+# TODO stop using these three hacks
+# but check performance, especially unexpected allocations, when removing
+Base.@assume_effects :nothrow very_unsafe_copyto!(a, b) = a .= b


this is scary

LilithHafner · 2022-12-27

Yeah... @aviatesk, is this sort of thing acceptable or is there another way?

N5N3 · 2022-12-27

My local test shows that perhaps @noinline is enough?

LilithHafner · 2022-12-27

This is why we do code review. Thanks!

I think I introduced these hacks when they were necessary and then added the branch on `rev`, which is more impactful and fixes the root of the issue. Now none of these hacks are necessary at all, and CI checks for unexpected allocations automatically :)
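A rough sketch of how such an allocation check can be run locally. This is not the test added in the PR; the wrapper name `bytes_allocated` is hypothetical, and `@allocated` (bytes) is used here in place of `@allocations` (count) because it is available on more Julia versions. No expected number is shown, since the result depends on the Julia version in use.

```julia
using Test

v = rand(10)
ix = collect(eachindex(v))

# Hypothetical wrapper: measuring inside a function avoids counting
# allocations caused by accessing non-constant globals.
bytes_allocated(ix, v) = @allocated sortperm!(ix, v)

# Run once so that compilation is not included in the measurement.
bytes_allocated(ix, v)

# Report the number of bytes allocated by a second, already compiled call.
println("sortperm! allocated ", bytes_allocated(ix, v), " bytes")

# Sanity check that the permutation written into `ix` sorts `v`.
@test issorted(v[ix])
```

Calling the function once before measuring mirrors the "compile before allocation test" commit in this PR.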

LilithHafner · 2022-12-27T18:21:52Z

@nanosoldier runbenchmarks("sort", vs=":master")

nanosoldier · 2022-12-27T19:00:20Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

KristofferC · 2022-12-27T20:05:17Z

This is good to go now, right?

LilithHafner · 2022-12-27T21:23:54Z

Comparing to before the recent changes, when performance was good to go but there were still unnecessary unsafe operations.

@nanosoldier runbenchmarks("sort", vs="@4191e27f71ee7a32e5e6b2a932603ce341070ed1")

nanosoldier · 2022-12-27T22:01:15Z

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

LilithHafner · 2022-12-27T22:21:31Z

lgtm

Lilith Hafner added 2 commits December 22, 2022 05:45

add strange hacks to avoid compiler heuristics?

70d9aef

add allocation tests

62a5729

LilithHafner added performance Must go faster sorting Put things in order backport 1.9 Change should be backported to release-1.9 labels Dec 22, 2022

LilithHafner marked this pull request as ready for review December 22, 2022 12:20

Lilith Hafner added 2 commits December 22, 2022 08:52

elimiate allocations in the rev::Bool case as well

3ae8d9c

compile before allocation test

f06d197

fixup: fix typo in added tests

4191e27

LilithHafner mentioned this pull request Dec 23, 2022

Make QuickerSort efficient for non-homogonous eltype #47973

Merged

Merge branch 'master' into sortperm-allocs

16136d0

oscardssmith reviewed Dec 27, 2022

LilithHafner and others added 2 commits December 27, 2022 10:23

Remove the hacks

9616d05

Thanks @N5N3 for pointing out that they are unnecessary

revert more superfluous changes

da6d39d

LilithHafner added the needs nanosoldier run This PR should have benchmarks run on it label Dec 27, 2022

fixup & put back @inline because it is sometimes necessary for perf…

2a98fbb

…ormace

LilithHafner changed the title ~~Reduce unexpected allocations in sortperm~~ Replace dynamic dispatch with runtime branch on rev keyword in sortperm Dec 27, 2022

LilithHafner mentioned this pull request Dec 27, 2022

"fib" microbenchmark gives false positives JuliaCI/BaseBenchmarks.jl#311

Open

LilithHafner removed the needs nanosoldier run This PR should have benchmarks run on it label Dec 27, 2022

KristofferC merged commit 6ac5159 into master Dec 28, 2022

KristofferC deleted the sortperm-allocs branch December 28, 2022 19:47

KristofferC pushed a commit that referenced this pull request Dec 28, 2022

Replace dynamic dispatch with runtime branch on rev keyword in sort…

ef913f8

…perm (#47966) (cherry picked from commit 6ac5159)

KristofferC mentioned this pull request Dec 28, 2022

Backports for 1.9.0-beta2 #48026

Merged

14 tasks

KristofferC removed the backport 1.9 Change should be backported to release-1.9 label Jan 2, 2023
