Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster integer hashing (fixes #37800 UB) #38031

Merged
merged 2 commits into from
Oct 30, 2020
Merged

Faster integer hashing (fixes #37800 UB) #38031

merged 2 commits into from
Oct 30, 2020

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Oct 14, 2020

In LLVM (inherited from C), fptosi has undefined behavior if the result
does not fit the integer size after rounding down. But by using the same
strategy as generic hashing of Real values, we actually can end up with
a situation that is not much slower for the CPU to deal with and avoids
the UB.

Refs #6624 (3696968)
Fixes #37800

Benchmarks:

julia> Random.seed!(1); a = rand(Int64, 10^8); b = copy(reinterpret(Float64, a)); c = rand(Float64, length(a)); d = copy(reinterpret(UInt64, a));
julia> @time sum(hash, a); @time sum(hash, b); @time sum(hash, c); @time sum(hash, d);
master:
  0.174493 seconds (1 allocation: 16 bytes)
  0.316409 seconds (1 allocation: 16 bytes)
  0.317264 seconds (1 allocation: 16 bytes)
  0.145938 seconds (1 allocation: 16 bytes)

PR:
  0.119815 seconds (1 allocation: 16 bytes)
  0.687834 seconds (1 allocation: 16 bytes)
  0.468544 seconds (1 allocation: 16 bytes)
  0.210141 seconds (1 allocation: 16 bytes)

Aside: note that integers-as-floating-point aren't affected as much:

julia> e = map(Float64, a);
master:  0.313309 seconds (1 allocation: 16 bytes)
PR:   0.338984 seconds (1 allocation: 16 bytes)

@StefanKarpinski I picked these factors mostly at random. So if you have better ideas for the specific hashing function itself, I'll update to that. This is showing the general idea and testing performance of it.

@jmkuhn
Copy link
Contributor

jmkuhn commented Oct 14, 2020

With this PR you have

julia> hash(0.0) == hash(-0.0)
true

@vtjnash
Copy link
Member Author

vtjnash commented Oct 15, 2020

Ah, you're right. Good catch. I completely botched that and forgot to use isequal with hashing instead of ==.

@vtjnash vtjnash added the triage This should be discussed on a triage call label Oct 19, 2020
base/float.jl Outdated Show resolved Hide resolved
@vtjnash
Copy link
Member Author

vtjnash commented Oct 22, 2020

triage concluded to simplify this (disable the breaking tests), and otherwise to proceed

@vtjnash vtjnash removed the triage This should be discussed on a triage call label Oct 22, 2020
In LLVM (inherited from C), fptosi has undefined behavior if the result
does not fit the integer size after rounding down. But by using the same
strategy as generic hashing of Real values, we actually can end up with
a sitatuion that is faster for the CPU to deal with and avoids the UB.

Refs #6624 (3696968)
Fixes #37800
@Keno
Copy link
Member

Keno commented Oct 29, 2020

What is holding this up? IIRC Triage ok'ed this, so can we just merge it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Floating point hash bug
4 participants