-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Conditional conformance to Hashable for Optional, Dictionary and Array types. #14247
[WIP] Conditional conformance to Hashable for Optional, Dictionary and Array types. #14247
Conversation
…ArraySlice and ContiguousArray
The hashValues are now calculated similar to the automatically synthesized values when conforming to Hashable. This entails using _combineHashValues as values of the collections are iterated - as well as calling _mixInt before returning the hash.
stdlib/public/core/Optional.swift
Outdated
@@ -409,6 +409,28 @@ extension Optional : Equatable where Wrapped : Equatable { | |||
} | |||
} | |||
|
|||
extension Optional : Hashable where Wrapped : Hashable { | |||
/// The hash value for the optional. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this file, like others in the standard library, uses two spaces to indent instead of four.
…ictionary hashValue. Unfortunately it does not work.
stdlib/public/core/Optional.swift
Outdated
result = 1 | ||
result = _combineHashValues(result, wrapped.hashValue) | ||
} | ||
result = _mixInt(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_mixInt
can be removed here as well.
stdlib/public/core/Arrays.swift.gyb
Outdated
for element in self { | ||
result = _combineHashValues(result, element.hashValue) | ||
} | ||
result = _mixInt(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call to _mixInt
here is superfluous -- Set
and Dictionary
already calls it on every hash returned from hashValue
. (It is probably okay to call it twice, but there doesn't seem to be a good reason to do so.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommended that @mortenbekditlevsen model these after synthesized conformances, which do call a final _mixInt
. Part of the motivation here is that _combineHashValues
is deterministic, but _mixInt
can (eventually) have per-execution variation based on the magic number. I think if we're to abandon the final _mixInt
, we ought to consider it wholesale and include the synthesized implementations in the discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final _mixInt
is done by Set
and Dictionary
; there is no need to do it twice. (Synthesized hashValues aren't doing the right thing yet.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I’m not sure I understand your comment.
We’re still discussing Array
conformance, right? There are no other calls to _mixInt
here—am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling _mixInt
directly before returning from hashValue
is unnecessary, because Set
and Dictionary
will call it for you (in _squeezeHashValue
, which standard hashed collections use to generate bucket indices).
Doing it anyway does not improve distribution, it just slows things down. As far as I know, _mixInt
wasn't designed to be used in the implementation of hashValue
at all. (Update: Except of course it's useful for cases like the commutative hash below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah...got it, thanks. We should remove it from the synthesized implementation as well then.
checkHashable([dd1, dd2, dd3, dd4, dd5, dd6, dd7], equalityOracle: { $0 == $1 }) | ||
|
||
// d5 is equal to d4, but will probably be iterated differently | ||
let d5: Dictionary<Int, String> = [1: "meow", 4: "mooo", 2: "meow"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more reliable way to get different ordering for the same values is to use reserveCapacity
; each capacity gets a potentially different permutation. (This works too, but only as long as some of the keys collide.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lorentey ,
I'm not 100% certain that I understood you correctly, but in my latest commit, I tried instantiating two different mutable Dictionaries and reservedCapacity for 3 elements for each of them. Then I assigned values for the keys in different order and tested this with a version of the Dictionary-hashing, that (intentionally wrongfully) imposes the traversal order on the generated hash value.
But the tests still succeed although I would expect them to fail.
Have I perhaps understood you wrong?
var result: Int = _mixInt(0) | ||
for (k, v) in self { | ||
let combined = _combineHashValues(k.hashValue, v.hashValue) | ||
result ^= _mixInt(combined) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think _combineHashValues()
rather than ^=
should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A previous implementation did this, but it was commented that this makes the hash depend on the traversal order.
So that two equal dicts that happen to be traversed in different order would produce different hashes.
…ay types based on recommendations from lorentey
let combined = _combineHashValues(k.hashValue, v.hashValue) | ||
result ^= _mixInt(combined) | ||
} | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lorentey ,
In the current version I removed _mixInt from Array types and from Optional as per your advise.
I am not 100% certain about your recommendation for the Dictionary - should I do the initial _mixInt(0) and the _mixInt(combined) - or are both of these superfluous too (referring to your discussion with @xwu )?
Sincerely,
/morten
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _mixInt(0)
is safe to eliminate, but we definitely need the _mixInt(combined)
.
You can think of _mixInt
as a pseudo-random number generator that gives you a different 64-bit random number for every integer on its input. Calling it before the xor operation ensures that the xor won't cancel out bits in case all the elements return hash values within the same limited range. For example, it turns
0 ^ 1 ^ 2 ^ 3 == 0
into
_mixInt(0) ^ _mixInt(1) ^ _mixInt(2) ^ _mixInt(3)
== 0xb2b24f688dc4164d ^ 0x792e33eb068557de ^ 0x66442b5dc51f5134 ^ 0xfbcb7d4146e1660d
== 0x56132a9f08bf76aa,
which looks like a much healthier hash value. (Dictionary
then does another round of _mixInt
, turning it into 0x16bb592381f12d84
, which isn't necessary in this case, but it doesn't really hurt, either.)
d5[2] = "meow" | ||
|
||
var d6: Dictionary<Int, String> = Dictionary() | ||
d6.reserveCapacity(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should reserve a different amount here; for example, d6.reserveCapacity(25)
would arrange elements in a different permutation (at least with the hash function we currently have). Dictionaries with the same capacity and elements will arrange elements in the same order.
An even better solution would be to reuse the same dictionary but reserve progressively larger capacities, checking that the hash value does not change after each reservation:
var d5: Dictionary<Int, String> = [1: "meow", 4: "mooo", 2: "meow"]
let expected = d5.hashValue
for capacity in [4, 8, 16, 32, 64, 128, 256] {
d5.reserveCapacity(capacity)
expectEqual(d5.hashValue, expected)
}
This makes it a lot more likely that the keys will fall into a different order in at least one iteration. To encourage reordering even more, start with a larger dictionary -- 3 elements have just 6 permutations, while, say, 6 elements have 720.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lorentey ,
With your suggestion the test now correctly fails on a hashValue implementation that depends on the traversal order.
let combined = _combineHashValues(k.hashValue, v.hashValue) | ||
result ^= _mixInt(combined) | ||
} | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _mixInt(0)
is safe to eliminate, but we definitely need the _mixInt(combined)
.
You can think of _mixInt
as a pseudo-random number generator that gives you a different 64-bit random number for every integer on its input. Calling it before the xor operation ensures that the xor won't cancel out bits in case all the elements return hash values within the same limited range. For example, it turns
0 ^ 1 ^ 2 ^ 3 == 0
into
_mixInt(0) ^ _mixInt(1) ^ _mixInt(2) ^ _mixInt(3)
== 0xb2b24f688dc4164d ^ 0x792e33eb068557de ^ 0x66442b5dc51f5134 ^ 0xfbcb7d4146e1660d
== 0x56132a9f08bf76aa,
which looks like a much healthier hash value. (Dictionary
then does another round of _mixInt
, turning it into 0x16bb592381f12d84
, which isn't necessary in this case, but it doesn't really hurt, either.)
…pendence on traversal order
…sible permutations - making it more likely to detect order-dependent hashes
In the current implementation I think that I have gathered all suggestions from reviewers so far. |
@swift-ci Please smoke test |
@swift-ci Please smoke benchmark |
@swift-ci please smoke test compiler performance |
Awesome! I filed https://bugs.swift.org/browse/SR-6910 today because I missed that this was still WIP. |
Build comment file:Optimized (O)Regression (2)
Improvement (5)
No Changes (351)
Unoptimized (Onone)Regression (24)
Improvement (12)
No Changes (322)
Hardware Overview
|
I think |
I think the conclusion on swift-evolution was clear -- |
@swift-ci please smoke test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. One comment to silence a warning and one nit.
extension AnyCollection : Equatable where Element : Equatable { | ||
@_inlineable // FIXME(sil-serialize-all) | ||
public static func ==(lhs: AnyCollection<Element>, rhs: AnyCollection<Element>) -> Bool { | ||
let lhsCount = lhs.count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this line.
if lhs.count != rhs.count { | ||
return false | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: unnecessary empty line.
There are three failures:
|
The third failure you list actually portends a bigger issue, I think. These conditional conformances should not be source-breaking,
Thinko; both conflicting implementations are on the concrete type. The only way we can make it not source breaking is to use |
We can add appropriate
|
I don't think we should change SwiftPM code in a way that it will stop working with older version of (trunk) compiler. Unless it is absolutely necessary to land this PR with broken source compatibility story, we shouldn't change SwiftPM. |
A pragmatic solution would be to leave |
I expect swiftpm is not the only codebase out there that defined ==/hashValue for some of these types, and We don't need to remove anything; we just need to add |
@aciidb0mb3r I believe if you add |
@swift-ci please test source compatibility |
@swift-ci Please test source compatibility |
Does anyone have any pointers to get me started on adding AnyHashable for bridging dictionary and array types from Objective-C as stated in the failing test:
Any help would be greatly appreciated. :-) |
We'll need to implement |
I created a new branch for this work so that we can collaborate on getting the tests fixed and landing this soon. I'll squash and merge this PR to that branch, and I'll open a new PR to continue tracking development. |
PR #14527 is now open; development continues there. |
…d Array types. (swiftlang#14247) * Add conditional Hashable conformance to Optional, Dictionary, Array, ArraySlice and ContiguousArray * Modified hashValue implementations The hashValues are now calculated similar to the automatically synthesized values when conforming to Hashable. This entails using _combineHashValues as values of the collections are iterated - as well as calling _mixInt before returning the hash. * Added FIXMEs as suggested by Max Moiseev * Use checkHashable to check Hashable conformance * Use 2 space indentation * Hashing of Dictionary is now independent of traversal order * Added a test to proof failure of (previous) wrong implementation of Dictionary hashValue. Unfortunately it does not work. * Removed '_mixInt' from 'hashValue' implementation of Optional and Array types based on recommendations from lorentey * Another attempt at detecting bad hashing due to traversal order * Dictionary Hashable validation tests now detect bad hashing due to dependence on traversal order * Removed superfluous initial _mixInt call for Dictionary hashValue implementation. * Add more elements to dictionary in test to increase the number of possible permutations - making it more likely to detect order-dependent hashes * Added Hashable conformance to CollectionOfOne, EmptyCollection and Range types * Fix indirect referral to the only member of CollectionOfOne * Re-added Hashable conformance to Range after merge from master * Change hashValue based on comment from @lorentey * Remove tests for conditional Hashable conformance for Range types. This is left for a followup PR * Added tests for CollectionOfOne and EmptyCollection * Added conditional conformance fo Equatable and Hashable for DictionaryLiteral. Added tests too. * Added conditional Equatable and Hashable conformance to Slice * Use 'elementsEqual' for Slice equality operator * Fixed documentation comment and indentation * Fix DictionaryLiteral equality implementation * Revert "Fix DictionaryLiteral equality implementation" This reverts commit 7fc1510. * Fix DictionaryLiteral equality implementation * Use equalElements(:by:) to compare DictionaryLiteral elements * Added conditional conformance for Equatable and Hashable to AnyCollection * Revert "Use 'elementsEqual' for Slice equality operator" This reverts commit 0ba2278. * Revert "Added conditional Equatable and Hashable conformance to Slice" This reverts commit 84f9934. * Added conditional conformance for Equatable and Hashable for ClosedRange
This PR makes Optional, Dictionary, Array, ArraySlice and ContiguousArray conditionally conform to Hashable.
Resolves SR-NNNN.