Fast paths for allunique #43375

Merged
merged 11 commits into from
May 11, 2022

mcabbott
Contributor

@mcabbott mcabbott commented Dec 9, 2021

This adds fast paths for allunique(::Tuple) and allunique(::AbstractArray), which just search linearly instead of making a dictionary. Both are used only for length(x) < 32.

For arrays, the crossover point at which the Dict becomes faster usually seems to be about there. I tried a few old and new computers, timing rand(n), which should be the worst case. If there are many repeats, linear search is much faster, even on longer arrays.
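
A minimal sketch of the idea described above, assuming a cutoff of 32 elements: below it, compare each element with the ones after it; otherwise, fall back to a hash-based check. The name `sketch_allunique` and the `cutoff` keyword are hypothetical and used only for illustration; this is not the code in base/set.jl.

```julia
# Hypothetical illustration of a length-based fast path; not the merged implementation.
function sketch_allunique(x::AbstractVector; cutoff::Int = 32)
    if length(x) < cutoff
        # Short input: pairwise comparison, O(n^2) comparisons but no allocations.
        for i in firstindex(x):lastindex(x), j in i+1:lastindex(x)
            isequal(x[i], x[j]) && return false
        end
        return true
    else
        # Long input: remember the elements seen so far in a hash set.
        seen = Set{eltype(x)}()
        for v in x
            v in seen && return false
            push!(seen, v)
        end
        return true
    end
end

sketch_allunique([1, 2, 3])          # true, via the linear path
sketch_allunique(vcat(1:100, 50))    # false, via the Set path
```

The linear branch can return at the first repeated pair, which is consistent with the observation that inputs with many repeats finish quickly regardless of length.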

These timings aren't quite from the latest version of the PR, sadly.

julia> _allunique(x::Tuple) = first(x) ∉ Base.tail(x) && _allunique(Base.tail(x))
julia> _allunique(x::Tuple{}) = true
julia> _allunique(x) = @inbounds all(!(x[i] in @view x[i+1:end]) for i in LinearIndices(x));

julia> for n in 10:10:50
         @show n
         x = Tuple(rand(n))  # tuple whose elements are all unique -- worst case
         @btime allunique($x)
         @btime _allunique($x) # this PR, without Any32 shortcut
       end
n = 10
  min 104.901 ns, mean 118.052 ns (4 allocations, 400 bytes)
  min 21.773 ns, mean 21.966 ns (0 allocations)
n = 20
  min 249.676 ns, mean 281.025 ns (7 allocations, 1.12 KiB)
  min 79.810 ns, mean 80.627 ns (0 allocations)
n = 30
  min 307.438 ns, mean 337.942 ns (7 allocations, 1.12 KiB)
  min 174.695 ns, mean 175.963 ns (0 allocations)
n = 40
  min 358.533 ns, mean 391.323 ns (7 allocations, 1.12 KiB)
  min 21.541 μs, mean 22.229 μs (616 allocations, 18.62 KiB)
n = 50
  min 644.952 ns, mean 1.009 μs (10 allocations, 3.62 KiB)
  min 55.166 μs, mean 56.813 μs (1566 allocations, 47.69 KiB)

julia> for n in 10:10:30
         @show n
         x = Tuple(rand(1:3, n))  # tuple with many repeats, can quit early
         @btime allunique($x)
         @btime _allunique($x) # this PR, without Any32 shortcut
       end
n = 10
  min 67.671 ns, mean 80.307 ns (4 allocations, 400 bytes)
  min 4.291 ns, mean 4.381 ns (0 allocations)
n = 20
  min 64.367 ns, mean 77.756 ns (4 allocations, 400 bytes)
  min 3.709 ns, mean 3.875 ns (0 allocations)
n = 30
  min 67.884 ns, mean 81.512 ns (4 allocations, 400 bytes)
  min 6.250 ns, mean 6.424 ns (0 allocations)

julia> for n in 10:10:50
         @show n
         x = rand(n)  # vector whose elements are all unique -- worst case for linear search
         @btime allunique($x) 
         @btime _allunique($x) # this PR
       end
n = 10
  min 107.474 ns, mean 121.231 ns (4 allocations, 400 bytes)
  min 49.257 ns, mean 49.555 ns (0 allocations)
n = 20
  min 250.886 ns, mean 284.110 ns (7 allocations, 1.12 KiB)
  min 155.307 ns, mean 157.149 ns (0 allocations)
n = 30
  min 303.938 ns, mean 340.002 ns (7 allocations, 1.12 KiB)
  min 323.004 ns, mean 325.665 ns (0 allocations)
n = 40
  min 363.490 ns, mean 397.762 ns (7 allocations, 1.12 KiB)
  min 553.920 ns, mean 559.104 ns (0 allocations)
n = 50
  min 673.123 ns, mean 1.065 μs (10 allocations, 3.62 KiB)
  min 863.695 ns, mean 875.903 ns (0 allocations)

julia> for n in 10:20:100
         @show n
         x = rand(1:3, n)  # vector with many repeats, can quit early
         @btime allunique($x)
         @btime _allunique($x) # this PR, without Any32 shortcut
       end
n = 10
  min 64.201 ns, mean 78.189 ns (4 allocations, 400 bytes)
  min 4.333 ns, mean 4.396 ns (0 allocations)
n = 30
  min 64.155 ns, mean 78.440 ns (4 allocations, 400 bytes)
  min 4.291 ns, mean 4.416 ns (0 allocations)
n = 50
  min 71.636 ns, mean 85.605 ns (4 allocations, 400 bytes)
  min 7.083 ns, mean 7.197 ns (0 allocations)
n = 70
  min 68.182 ns, mean 82.215 ns (4 allocations, 400 bytes)
  min 5.000 ns, mean 5.092 ns (0 allocations)
n = 90
  min 68.352 ns, mean 82.101 ns (4 allocations, 400 bytes)
  min 5.000 ns, mean 5.121 ns (0 allocations)

julia> versioninfo()
Julia Version 1.8.0-DEV.1098
Commit 5387b4de35* (2021-12-04 04:58 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.1.0)
  CPU: Apple M1

Vectors on an older computer:

julia> for n in 10:10:50
         @show n
         x = rand(n)  # vector whose elements are all unique -- worst case for linear search
         @btime allunique($x) 
         @btime _allunique($x) # this PR
       end
n = 10
  min 388.537 ns, mean 443.090 ns (4 allocations, 400 bytes. GC mean 4.03%)
  min 198.582 ns, mean 201.047 ns (0 allocations)
n = 20
  min 973.000 ns, mean 1.301 μs (7 allocations, 1.12 KiB. GC mean 4.52%)
  min 568.451 ns, mean 573.255 ns (0 allocations)
n = 30
  min 1.191 μs, mean 1.648 μs (7 allocations, 1.12 KiB. GC mean 3.86%)
  min 1.163 μs, mean 1.307 μs (0 allocations)
n = 40
  min 1.447 μs, mean 1.810 μs (7 allocations, 1.12 KiB. GC mean 2.65%)
  min 2.103 μs, mean 2.153 μs (0 allocations)
n = 50
  min 3.023 μs, mean 4.352 μs (10 allocations, 3.62 KiB. GC mean 5.86%)
  min 3.270 μs, mean 3.336 μs (0 allocations)

julia> versioninfo()
Julia Version 1.7.0-beta3.0
Commit e76c9dad42 (2021-07-07 08:12 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz

Edit: These were wrong for special floating-point values; that is now fixed and tested.
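
A small sketch of the kind of discrepancy this edit likely refers to, assuming the issue is that `in` compares with `==` while `Set` and `Dict` keys compare with `isequal`. The helper names `linear_allunique_eq` and `linear_allunique_isequal` are hypothetical and are not the functions from this PR.

```julia
# Hypothetical helpers for illustration; not the code in base/set.jl.

# Linear search with `in`, which compares elements using `==`.
linear_allunique_eq(x) =
    all(!(x[i] in view(x, i+1:lastindex(x))) for i in eachindex(x))

# Linear search with `isequal`, matching how Set/Dict treat keys.
linear_allunique_isequal(x) =
    all(!any(isequal(x[i]), view(x, i+1:lastindex(x))) for i in eachindex(x))

nans = [NaN, NaN]           # NaN == NaN is false, isequal(NaN, NaN) is true
signed_zeros = [0.0, -0.0]  # 0.0 == -0.0 is true, isequal(0.0, -0.0) is false

linear_allunique_eq(nans)               # true, although Set(nans) has one element
linear_allunique_isequal(nans)          # false, consistent with the Set
linear_allunique_eq(signed_zeros)       # false, although Set(signed_zeros) has two elements
linear_allunique_isequal(signed_zeros)  # true, consistent with the Set
```

Using `isequal` in the linear path keeps the answer the same on both sides of the length cutoff.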

@mcabbott mcabbott mentioned this pull request Dec 9, 2021
@mcabbott mcabbott marked this pull request as draft December 9, 2021 04:00
@mcabbott mcabbott marked this pull request as ready for review December 10, 2021 02:28
@ViralBShah ViralBShah added the performance Must go faster label Dec 12, 2021
@kshyatt
Contributor

kshyatt commented Dec 15, 2021

Can we get another review on this?

@mcabbott
Contributor Author

mcabbott commented Feb 9, 2022

Pre-1.8 bump?

false
```
"""
function allunique(C)
    if haslength(C)
        length(C) < 2 && return true
Member

this check is redundant.

Contributor Author

I think this will catch length-1 objects which aren't StridedArrays, without first collecting them. I suppose transpose([1]) is an example.
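
A quick check of the example mentioned above, as a sketch only. It loads `LinearAlgebra` explicitly so the array methods of `transpose` are available, and it makes no claim about which method `allunique` actually dispatches to.

```julia
using LinearAlgebra  # loaded explicitly for the array methods of `transpose`

t = transpose([1])   # lazy 1×1 wrapper around a one-element Vector

t isa StridedArray   # false: the wrapper is not a StridedArray
length(t) < 2        # true: a length check alone is enough to answer `true`
```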

Member

makes sense.

Member

@oscardssmith oscardssmith left a comment

Looks good to go, other than the one change I requested.

@mcabbott
Contributor Author

mcabbott commented May 7, 2022

Bump?

Test failures today seem to be Distributed

@oscardssmith
Member

Rebasing in the hope of clearer CI.

@KristofferC KristofferC merged commit 13ae079 into JuliaLang:master May 11, 2022
@mcabbott mcabbott deleted the allunique branch May 12, 2022 03:51