Hash function is invariant to reversal of arrays of symbols #20744
Comments
The root problem here is that hash(:c, hash(:b, hash(:a, zero(UInt)))) == hash(:a, hash(:b, hash(:c, zero(UInt)))) returns |
I think we should bring back the |
@JeffBezanson Where may I see that bitmix(x,y) implementation? |
It called C functions in src/support/hashing.[ch]. |
Using bitmix for mixing hashes was causing us to do about twice as much hashing as necessary, which is why I got rid of it – we were hashing each value and then bitmixing the hashes. With a Merkle-Damgard construction, you only need to do one hash operation per object, not two. The trouble here is that we're doing zero actual hash operations per object and instead relying on |
@StefanKarpinski, this issue has nothing to do with whether
julia> badmix(x::UInt, h::UInt) = 3*x - h
badmix (generic function with 1 method)
julia> x1 = rand(UInt); x2 = rand(UInt); x3 = rand(UInt);
julia> badmix(x1, badmix(x2, badmix(x3, zero(UInt)))) == badmix(x3, badmix(x2, badmix(x1, zero(UInt))))
true
If we just defined |
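To see why the `badmix` example above comes out equal, it may help to expand the nesting by hand: for three elements the result is `3*x3 - 3*x2 + 3*x1` (modulo 2^64), which is unchanged when `x1` and `x3` swap. The sketch below assumes Julia 1.x (for the `init` keyword of `foldl`); the helper name `chain` is introduced here for illustration and does not appear in the thread.

```julia
# Linear mixing step from the comment above; UInt arithmetic wraps modulo 2^64.
badmix(x::UInt, h::UInt) = 3*x - h

# Illustrative helper (not from the thread): fold a sequence into one value,
# feeding the running result back in as the second argument of `mix`.
chain(mix, xs) = foldl((h, x) -> mix(x, h), xs; init = zero(UInt))

xs = UInt[0x1234, 0xabcd, 0x0f0f]

# Three elements: the signs alternate as +, -, +, so reversing changes nothing.
println(chain(badmix, xs) == chain(badmix, reverse(xs)))

# Two elements: 3*x2 - 3*x1 versus 3*x1 - 3*x2, which differ for these inputs.
println(chain(badmix, UInt[1, 2]) == chain(badmix, UInt[2, 1]))
```

With this particular mixer the alternating signs line up again under reversal for any odd-length input, so the symmetry is a property of the linear step, independent of how the per-element values are produced.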
I can confirm that using the old
julia> int64hash(x) = ccall(:int64hash, UInt, (UInt,), x)
int64hash (generic function with 1 method)
julia> bitmix(a, b) = int64hash(xor(a, bswap(b)))
bitmix (generic function with 1 method)
julia> myhash(x::Symbol, h::UInt) = bitmix(object_id(x), h)
myhash (generic function with 1 method)
julia> myhash(:c, myhash(:b, myhash(:a, zero(UInt)))) == myhash(:a, myhash(:b, myhash(:c, zero(UInt))))
false |
An even simpler solution, that doesn't require the old
julia> myhash(x::Symbol, h::UInt) = hash(object_id(x), h)
myhash (generic function with 1 method)
julia> myhash(:c, myhash(:b, myhash(:a, zero(UInt)))) == myhash(:a, myhash(:b, myhash(:c, zero(UInt))))
false
Maybe this is what Stefan meant by "using an actual hash function?" I was distinguishing between "hashing |
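The one-hash-per-element approach in the snippet above extends naturally to whole collections by threading the running hash through each element. This is a rough sketch and not the code from any PR: it assumes Julia 1.x, where `object_id` is spelled `objectid`, and the names `symhash` and `chainhash` are hypothetical.

```julia
# Hypothetical per-symbol step: one seeded hash call per element.
# `objectid` is the Julia 1.x name for the `object_id` used in the thread.
symhash(x::Symbol, h::UInt) = hash(objectid(x), h)

# Illustrative helper: fold the running hash through the elements in order.
chainhash(xs, h::UInt = zero(UInt)) =
    foldl((acc, x) -> symhash(x, acc), xs; init = h)

forward  = chainhash([:a, :b, :c])
backward = chainhash([:c, :b, :a])

# Barring a collision, a non-linear step makes the two orders differ.
println(forward == backward)
```

Because each step passes the previous result as the seed of a real hash call, the position of an element affects every later step, which is what the linear `3*x - h` style of mixing fails to provide.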
I already have a PR fixing this. It's essentially the same as your fix except that it uses low-level bit hashing ( |
I was curious about the history of this: it was originally correct but was broken here in order to not pay the hashing cost for objects since |
y'all seem to be satisfied with MurmurHash3 .. why? |
Inertia? That's only used for hashing large data blobs, this is a different hash function entirely. |
Another way to avoid any hashing beyond the computation of |
Yes, I agree your PR using |
From the manual for hash:
While the converse of this statement isn't strictly implied, I think it would be a reasonable thing to expect from a hash function (barring adversarial hash collisions), and this isn't the case for vectors of symbols:
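A minimal sketch of the kind of check being described, assuming a three-symbol vector (the sample values are an assumption, not necessarily the reporter's exact snippet). On a Julia version affected by this issue the comparison would be expected to print `true`; on versions with the fix it should print `false`, barring a collision.

```julia
# Sample input chosen for illustration only.
v = [:a, :b, :c]

# Compare the hash of the vector with the hash of its reversal.
is_reversal_invariant = hash(v) == hash(reverse(v))
println(is_reversal_invariant)
```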
Would it be possible to modify the hash function so that it is no longer invariant to commonly used object transformations?