-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SE-0143][stdlib] Conditionally conform stdlib types to Hashable #15382
Conversation
This is an updated version of #14527, currently blocked by swiftlang/swift-evolution#808 |
/cc @mortenbekditlevsen, @xwu |
@swift-ci test |
@swift-ci please smoke test compiler performance |
@swift-ci test source compatibility |
…d Array types. (swiftlang#14247) * Add conditional Hashable conformance to Optional, Dictionary, Array, ArraySlice and ContiguousArray * Modified hashValue implementations The hashValues are now calculated similar to the automatically synthesized values when conforming to Hashable. This entails using _combineHashValues as values of the collections are iterated - as well as calling _mixInt before returning the hash. * Added FIXMEs as suggested by Max Moiseev * Use checkHashable to check Hashable conformance * Use 2 space indentation * Hashing of Dictionary is now independent of traversal order * Added a test to proof failure of (previous) wrong implementation of Dictionary hashValue. Unfortunately it does not work. * Removed '_mixInt' from 'hashValue' implementation of Optional and Array types based on recommendations from lorentey * Another attempt at detecting bad hashing due to traversal order * Dictionary Hashable validation tests now detect bad hashing due to dependence on traversal order * Removed superfluous initial _mixInt call for Dictionary hashValue implementation. * Add more elements to dictionary in test to increase the number of possible permutations - making it more likely to detect order-dependent hashes * Added Hashable conformance to CollectionOfOne, EmptyCollection and Range types * Fix indirect referral to the only member of CollectionOfOne * Re-added Hashable conformance to Range after merge from master * Change hashValue based on comment from @lorentey * Remove tests for conditional Hashable conformance for Range types. This is left for a followup PR * Added tests for CollectionOfOne and EmptyCollection * Added conditional conformance fo Equatable and Hashable for DictionaryLiteral. Added tests too. * Added conditional Equatable and Hashable conformance to Slice * Use 'elementsEqual' for Slice equality operator * Fixed documentation comment and indentation * Fix DictionaryLiteral equality implementation * Revert "Fix DictionaryLiteral equality implementation" This reverts commit 7fc1510. * Fix DictionaryLiteral equality implementation * Use equalElements(:by:) to compare DictionaryLiteral elements * Added conditional conformance for Equatable and Hashable to AnyCollection * Revert "Use 'elementsEqual' for Slice equality operator" This reverts commit 0ba2278. * Revert "Added conditional Equatable and Hashable conformance to Slice" This reverts commit 84f9934. * Added conditional conformance for Equatable and Hashable for ClosedRange
…tions Now that Array and Dictionary conform to Hashable, we need to make sure that their bridged counterparts provide the same hash values when converted to AnyHashable.
This assumes these will land in Swift 4.1; the attributes need to be adjusted if that turns out not to be the case. It seems @available for protocol conformances is not yet functional. I added attributes for those anyway, marked with FIXME(conformance-availability). # Conflicts: # stdlib/public/core/ExistentialCollection.swift.gyb # stdlib/public/core/Mirror.swift
…texts yet Array and Dictionary are now conditionally Hashable, so the importer wants to use them when importing NSArray and NSDictionary types in hashable contexts. Unfortunately, this currently means that a type like NSSet<NSDictionary<NSString *, id> *> * gets imported as Set<Dictionary<String, Any>>, which is invalid — Dictionary.Value needs to be Hashable, too: Set<Dictionary<String, AnyHashable>> For now, work around this by explicitly turning NSArray and NSDictionary into AnyHashable when they are used as the first type parameter of NSSet or NSDictionary, ignoring Hashable conformance in this case. This reverts to the previous behavior.
When implementing a protocol requirement, additional doc comments are unnecessary unless there are additional semantics peculiar to the specific implementation. The manually propagated documentation had several problems, including copypasta and inconsistent terminology. Therefore, if `T.==` doesn't have any doc comments and the implementation of `T.hashValue` is unremarkable, remove the unnecessary doc comment.
This removes Hashable conformance for types that do not already implement Equatable: - CollectionOfOne - AnyCollection - DictionaryLiteral
Implement _hash(into:) rather than hashValue.
599b4bc
to
bb60059
Compare
@swift-ci test |
Build failed |
@swift-ci please test os x platform |
@swift-ci please test macOS platform |
@swift-ci please smoke test compiler performance |
FYI @parkera there is a slight tweak to |
Build comment file:Summary for master smoketestNo regressions above thresholds Debugdebug briefRegressed (0)
Improved (0)
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
debug detailedRegressed (0)
Improved (0)
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
Releaserelease briefRegressed (0)
Improved (0)
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
release detailedRegressed (0)
Improved (0)
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
|
// Must be @nonobjc to avoid infinite recursion during bridging | ||
@nonobjc | ||
public func _toCustomAnyHashable() -> AnyHashable? { | ||
return AnyHashable(self as! Dictionary<AnyHashable, AnyHashable>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct; the values might not be hashable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, don't forget NSSet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct; the values might not be hashable.
Oh, do you mean when they include _SwiftValue
s whose corresponding Swift types aren't Hashable
? Hm, that's a good point -- and I think it applies to NSArray
, too. (And maybe even NSDictionary
keys and NSSet
.)
I vaguely remember that this still ends up working, because we keep such values in their AnyObject
form, which is guaranteed to be hashable through NSObject
's -hash
and -isEqual:
. I'll take a closer look!
Also, don't forget NSSet?
NSSet
already has a similar implementation of _HasCustomAnyHashableRepresentation
elsewhere in the Foundation overlay:
extension NSSet : _HasCustomAnyHashableRepresentation {
// Must be @nonobjc to avoid infinite recursion during bridging
@nonobjc
public func _toCustomAnyHashable() -> AnyHashable? {
return AnyHashable(self as! Set<AnyHashable>)
}
}
(Set
has been Hashable
from the start -- it did not need to wait for conditional conformances with that, because its contents are always guaranteed to be hashable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So apparently the same issue applies to NSSet
elements and NSDictionary
keys -- and it means that there are NSSet
/NSDictionary
instances that can't be bridged to Swift. 🙊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrose-apple It seems this opened a big can of worms. _SwiftValue
instances for non-Hashable
values spectacularly break language assumptions on how bridging interacts with AnyHashable
not just in these new cases, but for existing ones as well.
I suggest to land this PR as is, and to track the _SwiftValue
vs AnyHashable
issue separately, because it reaches far outside the scope of this PR.
I suspect these _toCustomAnyHashable
implementations (including the pre-existing one for NSSet
) are actually fine -- the problem is that self as? Set<AnyHashable>
not only fails sometimes, but unexpectedly, it also crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm less interested in non-hashable keys than I am in non-hashable values, but maybe you're right that this isn't important in practice, since all NSObjects are Hashable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @michael-lehew in case he has any additional thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be very curious to follow on to the tracked _SwiftValue
vs AnyHashable
issue. I agree with @lorentey that it can be considered separately, but landing this PR really forces the issue, especially as more non-ObjC types (thinking model-layer) find their way into the value side of dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! My plan is to make sure everything that can be put in an NSSet
and NSDictionary
can be converted to AnyHashable
. This includes Swift types that aren't natively Hashable
, since their NSObject boxes do implement hashing. Let's move on to SR-7284 to fix the underlying issue.
Note that while the issue is serious, it is a subtle corner case that does not normally occur in applications -- and if it does, users have an easy source-level workaround. (By simply implementing Hashable
.) The additional wrinkles introduced by this PR require even more convoluted Swift->Objective-C->Swift bridging scenarios, and I believe they are even less likely to occur in practice.
public func _hash(into hasher: inout _Hasher) { | ||
for element in self { | ||
hasher.append(element) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit surprising to have the hash be O(n) and dependent on the contents. NSDictionary
and friends have to take great care about this (in fact our -hash
impl currently returns just -count
) because while we do rely on NSCopying
for our keys -- our values are allowed to mutate. In fact thats perhaps a very common use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking further, that might preclude being able to do this conditional conformance for Array and or Dictionary, unless there is a way to constrain mutability. Value types are fine obviously, but if you have reference keys, or reference values, you can't guarantee the conformance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Content-based hashing is very much intentional here; we consider this a feature, not a bug. The built-in hashing behavior is designed to produce robust hash values: if two instances aren't equal, they are expected to produce different hashes. (Note that String
hashing also takes time that's proportional to the length of the string; and it also performs Unicode normalization.)
The mutation issue is indeed important; however, Swift approaches it in a different way, by concentrating on first-class value types. The language itself makes it impossible to mutate value-typed keys in a Dictionary
, and the same mechanism protects a Dictionary
's values, if the dictionary is put in a Set
.
As a rule, Swift does not provide language/API-level protection against users mutating reference-typed components in a Dictionary
key (or Set
element) in such a way that changes the key's equality or hashing behavior. (Dictionary
and Set
doesn't even call -copyWithZone:
for keys that happen to implement NSCopying
.) This PR merely follows the existing design established for hashing collections.
Note that Dictionary
only conforms to Hashable
when both its Key
and Value
does; Hashable
conformance is not automatic for reference types, and random model objects will not get unexpectedly hashed after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually kind of with Michael on this one. Hashing Array shouldn't take linear time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(After all this I'm not sure it's worth it to make the collections Hashable at all. Optional, sure.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional observation: every successful lookup of a key in a hash table necessarily involves a successful equality check, which is O(n) for collection types like Array
, Set
and Dictionary
. Therefore, O(n) hashing does not change our expected asymptotic performance, but it does prevent the formation of long collision chains that would make the hash table itself do lookups in time that's linear in the size of the table itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something crucial, but I fail to get the JSON example; JSON only allows string keys. Is it possible to use deeply nested structures as dictionary keys in plists? Why would someone do that?
Putting a deeply nested value type with thousands of components into a Set
may or may not be slow -- if you actually need the value semantics offered by Swift's standard collection types, then I don't really see how we can be faster. Looking up a value in such a set necessarily involves a successful equality check, which will need to traverse the two structures anyway. Having another traversal for the hash value is merely an additional constant factor.
If we were to replace the hash value with the collection's count, we would make dictionary lookups quadratic -- equality checks will still be O(n), but we'd have to repeat them for each element in a collision chain, whose length can easily be proportional to the size of the hash table.
There's nothing wrong with using Set
s as Dictionary
keys, and there's nothing wrong with using Array
s as elements in a Set
. Sure, a hash table may not always be the most appropriate data structure if its elements are large collections -- just like an Array
is not always appropriate for use as a sorted collection.
Incidentally, we did touch upon this issue at the very end of the original swift-evolution thread for this feature:
https://forums.swift.org/t/let-optional-dictionary-and-array-conditionally-conform-to-hashable/9046/58
It may be a good idea to continue this discussion on that thread rather than in review comments -- not many people follow discussions in random PRs! 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point regarding keys. The common cases (outside of just a String/Int/UUID) I've seen there for folks is using a small dictionary of string-string pairs, or an array (ordered uuids).
Using just -count
is horrible, I agree. We want to do something better even for the Foundation types. My point is, if there is a way to make -hash
less expensive but still provide some uniqueness information that's an improvement.
Apologies for missing the SE thread, its so rare I ever get bandwidth to focus on Swift. :( I'll take a look there for context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, better protection against mutations in reference types and exploring performance implications of content-based hashing are both valid points that deserve discussion. We did not spend enough time on these on swift-evolution, so it's good we are at least discussing them now.
I really don't have a good answer to the mutations issue - this PR fits comfortably within the existing Swift model, but perhaps we should do more. Standalone Swift classes aren't Hashable by default, which alleviates the issue a little, but not by much. One thing Set and Dictionary may do is to at least detect invalid mutations sometime after they happen; for example, while they're rehashing elements. (Although it's unclear if a delayed error/trap would be helpful or just confusing.) Would some language level change help more? This is far beyond the scope of this PR, but it's something to ponder.
The performance concerns seem easier to tackle, because we can run benchmarks. For the small cases you mentioned, I’m absolutely positive that hashing the full contents is the only reasonable thing we can do - leaving even a mere handful of relevant bits out of the hash would open the door to runaway collision chains. (Beyond security/reliability concerns, this may also have an implication for storage overhead: I have plans to try increasing the maximum load factor from 75% to around 95% at some point, but this requires extremely robust hashing, as well as adding some extra bits to hash table entries.)
We may get away with sampling a random (but stable) fraction of Array elements when we need to hash large arrays, but given that we can't apply the same trick in ==, it’s not at all clear to me if that would make a difference that's worth the reliability costs. Similarly, Sets could maintain a running hash of their contents — the extra tradeoff there is that it would add overhead to set mutations for all sets, including the vast majority that will never be hashed. (The same applies to Dictionary.) In any case, we'll be able to tweak the implementations in this PR later; hash values are explicitly not stable across even executions of the same Swift program, and we can change the algorithm that generates them in future versions of the stdlib. (As long as we remove the inlinable attribute, which we will probably do as part of an ABI audit.)
Hashing n bits takes O(n) time; this is just how hashing works. Consequently, hashing a list of n elements is n times slower than hashing a single element. This cost may be acceptable for some applications, and prohibitive to others, but this can be said of every single operation offered by the stdlib. I don't see why we would single out Array as in need of constant-time hashing, while at the same time, we're perfectly fine with String and Data taking linear time.
- There are clear usecases for hashing at least small arrays/sets/dictionaries,
- Content-based hashing is not only the most robust choice, but it also aligns perfectly with Swift's emphasis on collections that are themselves value types, and how these collections already implement Equatable. (After all, hashValue is an approximation for one half of a ==...)
- Users can easily implement hash memoization and similar techniques to manually speed up hashing when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This discussion has pushed me to considering finally fixing the low numbered radar about collection -hash implementation being so poor, then NS.* and Swift will be consistent together. Thanks Karoy!
This is a spin-off of PR swiftlang#15382, which is currently stuck in review limbo. Implements part of SE-0143’s Hashable amendments. rdar://problem/38432751
I think we're now ready to land this. The review discussions turned out especially valuable -- we've uncovered several issues that we'll need to follow up on:
Thank you @mortenbekditlevsen, @xwu for the implementation effort, and @jrose-apple, @michael-lehew, @parkera and @airspeedswift for lending us your expertise for the review. 👍 |
Update: The benchmark chart below shows typical amortized per-element performance for the two extremes of hashing: hashing the full contents ("Full") vs just using a constant value ("Fast"), when working with 10-element integer arrays inside a The workload is (1) inserting n random 10-element integer arrays into a set, (2) looking them up in a random order, then (3) removing them in the same random order. As expected, "full" hashing delivers linear performance, while "fast" hashing is quadratic. (Every measurement shown has been divided by the element count; this is why "Full" looks constant, and "Fast" linear.) The hashing algorithm used here (SipHash-1-3) is quite expensive compared to an equality check between integers -- this explains the inverse gap between "Full" and "Fast" for the smaller sizes. (I could close the gap by selecting a simpler algorithm, but SipHash is what the stdlib uses at the moment, so it's interesting to see how it performs. This benchmark doesn't use resilient hashing -- enabling that would widen that gap even further.) This initial gap can be made arbitrarily large by increasing the size of the arrays -- both "Fast" and "Full" hashing slows down linearly as the arrays grow, but "Fast" does it much slower. I think this is fine for now; later on, we can optimize hashing by using simpler hash functions for small hash tables, and better, slower ones for larger ones:
We expect to change Swift's hashing APIs so that the hash function will be selected by the hashing collection rather than the keys, so this is doable. (You can already see a version of the potential new API in the |
@@ -106,6 +106,14 @@ extension Array : _ObjectiveCBridgeable { | |||
} | |||
} | |||
|
|||
extension NSArray : _HasCustomAnyHashableRepresentation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parkera I just saw this, does it need bringing to SCLF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we will need to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this! Here is a list of _HasCustomAnyHashableRepresentation
conformances in the Foundation overlay that are currently missing in open-source Foundation:
NSAffineTransform
NSArray
NSCalendar
NSCharacterSet
NSDate
NSDateComponents
NSDateInterval
NSDictionary
NSLocale
NSNotification
NSNumber
NSSet
NSString
NSTimeZone
NSURL
NSURLComponents
NSURLQueryItem
NSURLRequest
This PR added NSArray and NSDictionary to the list. The intention behind _HCAHR conformances is to have the AnyHashable representation of these classes hash the same way as the corresponding Swift value type. (I don't know if this is as much of a concern in swift-corelibs-foundation as it is in Cocoa, though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK that's quite a long list. @parkera how do you suggest we proceed with resolving this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's only really critical if/when bridging is enabled on Linux; otherwise, it's okay for, say, NSString and String to hash differently when put into an AnyHashable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parkera It looks like most of these implementations would easily translate over to corelibs. as
-conversions won't work without bridging, but it seems straightforward to replace them with explicit conversions.
I expect most of the effort would be spent on writing new tests to cover the new functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Part of the deal of contributing to the Foundation part of the overlay is that we also need those changes for Linux. We can't let it lag behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; I created a bug to track this effort: https://bugs.swift.org/browse/SR-7436
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lorentey - are you going to resolve this bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ianpartridge It turns out conforming to _HCAHR currently has no effect on platforms without the Objective-C runtime. I'll look into whether this can be lifted (as part of SR-7436). In the meantime, I don't think it's worth adding these conformances -- it'd be dead code.
Its conformance has been removed before swiftlang#15382 was merged.
As a followup to SE-0143, this adds conditional conformance to
Hashable
to the following types in the standard library:Optional
Array
,ContiguousArray
,ArraySlice
Dictionary
Range
,ClosedRange
This makes synthesized
Hashable
implementations (SE-0185) available for structs and enums that include fields of these types, freeing programmers from having to manually deal with hash combinators.Resolves SR-6910.