Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SE-0143][stdlib] Conditionally conform stdlib types to Hashable #15382

Merged
merged 16 commits into from
Mar 28, 2018

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Mar 20, 2018

As a followup to SE-0143, this adds conditional conformance to Hashable to the following types in the standard library:

  • Optional
  • Array, ContiguousArray, ArraySlice
  • Dictionary
  • Range, ClosedRange

This makes synthesized Hashable implementations (SE-0185) available for structs and enums that include fields of these types, freeing programmers from having to manually deal with hash combinators.

Resolves SR-6910.

@lorentey
Copy link
Member Author

lorentey commented Mar 20, 2018

This is an updated version of #14527, currently blocked by swiftlang/swift-evolution#808

@lorentey
Copy link
Member Author

/cc @mortenbekditlevsen, @xwu

@lorentey
Copy link
Member Author

@swift-ci test

@lorentey
Copy link
Member Author

@swift-ci please smoke test compiler performance

@lorentey
Copy link
Member Author

@swift-ci test source compatibility

@swift-ci
Copy link
Contributor

Build comment file:

Summary for master smoketest

No regressions above thresholds

Debug

debug brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
name old new delta delta_pct
LLVM.NumLLVMBytesOutput 46,905,801 46,906,292 491 0.0%
time.swift-driver.wall 79.3s 78.7s -615.7ms -0.78%

debug detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 113,826 113,826 0 0.0%
AST.NumLoadedModules 16,158 16,158 0 0.0%
AST.NumTotalClangImportedEntities 333,024 333,024 0 0.0%
AST.NumUsedConformances 9,439 9,439 0 0.0%
IRModule.NumIRBasicBlocks 135,500 135,500 0 0.0%
IRModule.NumIRFunctions 80,152 80,152 0 0.0%
IRModule.NumIRGlobals 74,212 74,212 0 0.0%
IRModule.NumIRInsts 1,490,570 1,490,570 0 0.0%
IRModule.NumIRValueSymbols 137,546 137,546 0 0.0%
LLVM.NumLLVMBytesOutput 46,905,801 46,906,292 491 0.0%
SILModule.NumSILGenFunctions 51,279 51,279 0 0.0%
SILModule.NumSILOptFunctions 91,858 91,858 0 0.0%
Sema.NumConformancesDeserialized 463,018 464,300 1,282 0.28%
Sema.NumConstraintScopes 1,040,689 1,040,689 0 0.0%
Sema.NumDeclsDeserialized 3,049,476 3,054,632 5,156 0.17%
Sema.NumDeclsValidated 184,368 184,368 0 0.0%
Sema.NumFunctionsTypechecked 64,396 64,396 0 0.0%
Sema.NumGenericSignatureBuilders 85,502 85,936 434 0.51%
Sema.NumLazyGenericEnvironments 605,871 607,157 1,286 0.21%
Sema.NumLazyGenericEnvironmentsLoaded 50,595 50,595 0 0.0%
Sema.NumLazyIterableDeclContexts 438,968 440,268 1,300 0.3%
Sema.NumTypesDeserialized 3,276,192 3,281,584 5,392 0.16%
Sema.NumTypesValidated 273,522 273,522 0 0.0%

Release

release brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
name old new delta delta_pct
LLVM.NumLLVMBytesOutput 38,806,636 38,806,632 -4 -0.0%
time.swift-driver.wall 174.2s 174.2s 4.0ms 0.0%

release detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 11,354 11,354 0 0.0%
AST.NumLoadedModules 462 462 0 0.0%
AST.NumTotalClangImportedEntities 31,515 31,515 0 0.0%
AST.NumUsedConformances 9,444 9,444 0 0.0%
IRModule.NumIRBasicBlocks 174,723 174,723 0 0.0%
IRModule.NumIRFunctions 61,154 61,154 0 0.0%
IRModule.NumIRGlobals 59,706 59,706 0 0.0%
IRModule.NumIRInsts 1,462,280 1,462,280 0 0.0%
IRModule.NumIRValueSymbols 109,164 109,164 0 0.0%
LLVM.NumLLVMBytesOutput 38,806,636 38,806,632 -4 -0.0%
SILModule.NumSILGenFunctions 26,022 26,022 0 0.0%
SILModule.NumSILOptFunctions 42,636 42,636 0 0.0%
Sema.NumConformancesDeserialized 156,353 156,462 109 0.07%
Sema.NumConstraintScopes 897,397 897,397 0 0.0%
Sema.NumDeclsDeserialized 258,170 258,656 486 0.19%
Sema.NumDeclsValidated 37,253 37,253 0 0.0%
Sema.NumFunctionsTypechecked 12,616 12,616 0 0.0%
Sema.NumGenericSignatureBuilders 7,664 7,697 33 0.43%
Sema.NumLazyGenericEnvironments 39,776 39,880 104 0.26%
Sema.NumLazyGenericEnvironmentsLoaded 5,092 5,092 0 0.0%
Sema.NumLazyIterableDeclContexts 24,619 24,728 109 0.44%
Sema.NumTypesDeserialized 314,760 315,208 448 0.14%
Sema.NumTypesValidated 44,453 44,453 0 0.0%

@lorentey lorentey added the swift evolution pending discussion Flag → feature: A feature that has a Swift evolution proposal currently in review label Mar 22, 2018
mortenbekditlevsen and others added 16 commits March 23, 2018 19:08
…d Array types. (swiftlang#14247)

* Add conditional Hashable conformance to Optional, Dictionary, Array, ArraySlice and ContiguousArray

* Modified hashValue implementations
The hashValues are now calculated similar to the automatically synthesized values when conforming to Hashable.
This entails using _combineHashValues as values of the collections are iterated - as well as calling _mixInt before returning the hash.

* Added FIXMEs as suggested by Max Moiseev

* Use checkHashable to check Hashable conformance

* Use 2 space indentation

* Hashing of Dictionary is now independent of traversal order

* Added a test to proof failure of (previous) wrong implementation of Dictionary hashValue. Unfortunately it does not work.

* Removed '_mixInt' from 'hashValue' implementation of Optional and Array types based on recommendations from lorentey

* Another attempt at detecting bad hashing due to traversal order

* Dictionary Hashable validation tests now detect bad hashing due to dependence on traversal order

* Removed superfluous initial _mixInt call for Dictionary hashValue implementation.

* Add more elements to dictionary in test to increase the number of possible permutations - making it more likely to detect order-dependent hashes

* Added Hashable conformance to CollectionOfOne, EmptyCollection and Range types

* Fix indirect referral to the only member of CollectionOfOne

* Re-added Hashable conformance to Range after merge from master

* Change hashValue based on comment from @lorentey

* Remove tests for conditional Hashable conformance for Range types. This is left for a followup PR

* Added tests for CollectionOfOne and EmptyCollection

* Added conditional conformance fo Equatable and Hashable for DictionaryLiteral. Added tests too.

* Added conditional Equatable and Hashable conformance to Slice

* Use 'elementsEqual' for Slice equality operator

* Fixed documentation comment and indentation

* Fix DictionaryLiteral equality implementation

* Revert "Fix DictionaryLiteral equality implementation"

This reverts commit 7fc1510.

* Fix DictionaryLiteral equality implementation

* Use equalElements(:by:) to compare DictionaryLiteral elements

* Added conditional conformance for Equatable and Hashable to AnyCollection

* Revert "Use 'elementsEqual' for Slice equality operator"

This reverts commit 0ba2278.

* Revert "Added conditional Equatable and Hashable conformance to Slice"

This reverts commit 84f9934.

* Added conditional conformance for Equatable and Hashable for ClosedRange
…tions

Now that Array and Dictionary conform to Hashable, we need to make sure that their bridged counterparts provide the same hash values when converted to AnyHashable.
This assumes these will land in Swift 4.1; the attributes need to be adjusted if that turns out not to be the case.

It seems @available for protocol conformances is not yet functional. I added attributes for those anyway, marked with FIXME(conformance-availability).

# Conflicts:
#	stdlib/public/core/ExistentialCollection.swift.gyb
#	stdlib/public/core/Mirror.swift
…texts yet

Array and Dictionary are now conditionally Hashable, so the importer wants to use them when importing NSArray and NSDictionary types in hashable contexts. Unfortunately, this currently means that a type like

    NSSet<NSDictionary<NSString *, id> *> *

gets imported as

    Set<Dictionary<String, Any>>,

which is invalid — Dictionary.Value needs to be Hashable, too:

    Set<Dictionary<String, AnyHashable>>

For now, work around this by explicitly turning NSArray and NSDictionary into AnyHashable when they are used as the first type parameter of NSSet or NSDictionary, ignoring Hashable conformance in this case. This reverts to the previous behavior.
When implementing a protocol requirement, additional doc comments are
unnecessary unless there are additional semantics peculiar to the specific
implementation.

The manually propagated documentation had several problems, including copypasta
and inconsistent terminology.

Therefore, if `T.==` doesn't have any doc comments and the implementation of
`T.hashValue` is unremarkable, remove the unnecessary doc comment.
This removes Hashable conformance for types that do not already implement Equatable:

- CollectionOfOne
- AnyCollection
- DictionaryLiteral
Implement _hash(into:) rather than hashValue.
@lorentey lorentey force-pushed the conditional-hashable2 branch from 599b4bc to bb60059 Compare March 23, 2018 19:24
@lorentey
Copy link
Member Author

@swift-ci test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - bb60059

@lorentey
Copy link
Member Author

@swift-ci please test os x platform

@airspeedswift
Copy link
Member

@swift-ci please test macOS platform

@airspeedswift airspeedswift self-requested a review March 26, 2018 21:15
@airspeedswift airspeedswift changed the title 🛑[stdlib] Conditionally conform stdlib types to Hashable [stdlib] Conditionally conform stdlib types to Hashable Mar 26, 2018
@airspeedswift
Copy link
Member

@swift-ci please smoke test compiler performance

@airspeedswift airspeedswift requested a review from parkera March 26, 2018 21:25
@airspeedswift
Copy link
Member

FYI @parkera there is a slight tweak to NSArray for importer purposes.

@lorentey lorentey added swift evolution approved Flag → feature: A feature that was approved through the Swift evolution process and removed swift evolution pending discussion Flag → feature: A feature that has a Swift evolution proposal currently in review labels Mar 26, 2018
@swift-ci
Copy link
Contributor

Build comment file:

Summary for master smoketest

No regressions above thresholds

Debug

debug brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
name old new delta delta_pct
LLVM.NumLLVMBytesOutput 47,056,296 47,056,910 614 0.0%
time.swift-driver.wall 79.0s 79.0s 28.2ms 0.04%

debug detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 113,826 113,826 0 0.0%
AST.NumLoadedModules 16,158 16,158 0 0.0%
AST.NumTotalClangImportedEntities 337,992 337,992 0 0.0%
AST.NumUsedConformances 9,443 9,443 0 0.0%
IRModule.NumIRBasicBlocks 135,979 135,979 0 0.0%
IRModule.NumIRFunctions 80,256 80,256 0 0.0%
IRModule.NumIRGlobals 74,295 74,295 0 0.0%
IRModule.NumIRInsts 1,501,778 1,501,778 0 0.0%
IRModule.NumIRValueSymbols 137,726 137,726 0 0.0%
LLVM.NumLLVMBytesOutput 47,056,296 47,056,910 614 0.0%
SILModule.NumSILGenFunctions 49,967 49,967 0 0.0%
SILModule.NumSILOptFunctions 90,409 90,409 0 0.0%
Sema.NumConformancesDeserialized 472,076 473,358 1,282 0.27%
Sema.NumConstraintScopes 980,790 980,790 0 0.0%
Sema.NumDeclsDeserialized 3,041,418 3,046,574 5,156 0.17%
Sema.NumDeclsValidated 162,677 162,677 0 0.0%
Sema.NumFunctionsTypechecked 64,396 64,396 0 0.0%
Sema.NumGenericSignatureBuilders 85,006 85,440 434 0.51%
Sema.NumLazyGenericEnvironments 574,090 575,380 1,290 0.22%
Sema.NumLazyGenericEnvironmentsLoaded 50,027 50,027 0 0.0%
Sema.NumLazyIterableDeclContexts 437,052 438,352 1,300 0.3%
Sema.NumTypesDeserialized 3,265,161 3,270,553 5,392 0.17%
Sema.NumTypesValidated 272,379 272,379 0 0.0%

Release

release brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
name old new delta delta_pct
LLVM.NumLLVMBytesOutput 38,466,338 38,466,338 0 0.0%
time.swift-driver.wall 171.1s 171.2s 114.7ms 0.07%

release detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 11,354 11,354 0 0.0%
AST.NumLoadedModules 462 462 0 0.0%
AST.NumTotalClangImportedEntities 32,031 32,031 0 0.0%
AST.NumUsedConformances 9,448 9,448 0 0.0%
IRModule.NumIRBasicBlocks 177,671 177,671 0 0.0%
IRModule.NumIRFunctions 61,695 61,695 0 0.0%
IRModule.NumIRGlobals 60,186 60,186 0 0.0%
IRModule.NumIRInsts 1,480,728 1,480,728 0 0.0%
IRModule.NumIRValueSymbols 110,185 110,185 0 0.0%
LLVM.NumLLVMBytesOutput 38,466,338 38,466,338 0 0.0%
SILModule.NumSILGenFunctions 25,070 25,070 0 0.0%
SILModule.NumSILOptFunctions 42,180 42,180 0 0.0%
Sema.NumConformancesDeserialized 152,043 152,152 109 0.07%
Sema.NumConstraintScopes 839,227 839,227 0 0.0%
Sema.NumDeclsDeserialized 255,520 256,006 486 0.19%
Sema.NumDeclsValidated 31,065 31,065 0 0.0%
Sema.NumFunctionsTypechecked 12,616 12,616 0 0.0%
Sema.NumGenericSignatureBuilders 7,610 7,643 33 0.43%
Sema.NumLazyGenericEnvironments 37,647 37,751 104 0.28%
Sema.NumLazyGenericEnvironmentsLoaded 5,033 5,033 0 0.0%
Sema.NumLazyIterableDeclContexts 24,467 24,576 109 0.45%
Sema.NumTypesDeserialized 312,243 312,691 448 0.14%
Sema.NumTypesValidated 43,872 43,872 0 0.0%

// Must be @nonobjc to avoid infinite recursion during bridging
@nonobjc
public func _toCustomAnyHashable() -> AnyHashable? {
return AnyHashable(self as! Dictionary<AnyHashable, AnyHashable>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct; the values might not be hashable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, don't forget NSSet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct; the values might not be hashable.

Oh, do you mean when they include _SwiftValues whose corresponding Swift types aren't Hashable? Hm, that's a good point -- and I think it applies to NSArray, too. (And maybe even NSDictionary keys and NSSet.)

I vaguely remember that this still ends up working, because we keep such values in their AnyObject form, which is guaranteed to be hashable through NSObject's -hash and -isEqual:. I'll take a closer look!

Also, don't forget NSSet?

NSSet already has a similar implementation of _HasCustomAnyHashableRepresentation elsewhere in the Foundation overlay:

extension NSSet : _HasCustomAnyHashableRepresentation {
  // Must be @nonobjc to avoid infinite recursion during bridging
  @nonobjc
  public func _toCustomAnyHashable() -> AnyHashable? {
    return AnyHashable(self as! Set<AnyHashable>)
  }
}

(Set has been Hashable from the start -- it did not need to wait for conditional conformances with that, because its contents are always guaranteed to be hashable.)

Copy link
Member Author

@lorentey lorentey Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So apparently the same issue applies to NSSet elements and NSDictionary keys -- and it means that there are NSSet/NSDictionary instances that can't be bridged to Swift. 🙊

https://bugs.swift.org/browse/SR-7284

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrose-apple It seems this opened a big can of worms. _SwiftValue instances for non-Hashable values spectacularly break language assumptions on how bridging interacts with AnyHashable not just in these new cases, but for existing ones as well.

I suggest to land this PR as is, and to track the _SwiftValue vs AnyHashable issue separately, because it reaches far outside the scope of this PR.

I suspect these _toCustomAnyHashable implementations (including the pre-existing one for NSSet) are actually fine -- the problem is that self as? Set<AnyHashable> not only fails sometimes, but unexpectedly, it also crashes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm less interested in non-hashable keys than I am in non-hashable values, but maybe you're right that this isn't important in practice, since all NSObjects are Hashable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @michael-lehew in case he has any additional thoughts here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be very curious to follow on to the tracked _SwiftValue vs AnyHashable issue. I agree with @lorentey that it can be considered separately, but landing this PR really forces the issue, especially as more non-ObjC types (thinking model-layer) find their way into the value side of dictionary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! My plan is to make sure everything that can be put in an NSSet and NSDictionary can be converted to AnyHashable. This includes Swift types that aren't natively Hashable, since their NSObject boxes do implement hashing. Let's move on to SR-7284 to fix the underlying issue.

Note that while the issue is serious, it is a subtle corner case that does not normally occur in applications -- and if it does, users have an easy source-level workaround. (By simply implementing Hashable.) The additional wrinkles introduced by this PR require even more convoluted Swift->Objective-C->Swift bridging scenarios, and I believe they are even less likely to occur in practice.

public func _hash(into hasher: inout _Hasher) {
for element in self {
hasher.append(element)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit surprising to have the hash be O(n) and dependent on the contents. NSDictionary and friends have to take great care about this (in fact our -hash impl currently returns just -count) because while we do rely on NSCopying for our keys -- our values are allowed to mutate. In fact thats perhaps a very common use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking further, that might preclude being able to do this conditional conformance for Array and or Dictionary, unless there is a way to constrain mutability. Value types are fine obviously, but if you have reference keys, or reference values, you can't guarantee the conformance.

Copy link
Member Author

@lorentey lorentey Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content-based hashing is very much intentional here; we consider this a feature, not a bug. The built-in hashing behavior is designed to produce robust hash values: if two instances aren't equal, they are expected to produce different hashes. (Note that String hashing also takes time that's proportional to the length of the string; and it also performs Unicode normalization.)

The mutation issue is indeed important; however, Swift approaches it in a different way, by concentrating on first-class value types. The language itself makes it impossible to mutate value-typed keys in a Dictionary, and the same mechanism protects a Dictionary's values, if the dictionary is put in a Set.

As a rule, Swift does not provide language/API-level protection against users mutating reference-typed components in a Dictionary key (or Set element) in such a way that changes the key's equality or hashing behavior. (Dictionary and Set doesn't even call -copyWithZone: for keys that happen to implement NSCopying.) This PR merely follows the existing design established for hashing collections.

Note that Dictionary only conforms to Hashable when both its Key and Value does; Hashable conformance is not automatic for reference types, and random model objects will not get unexpectedly hashed after this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually kind of with Michael on this one. Hashing Array shouldn't take linear time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(After all this I'm not sure it's worth it to make the collections Hashable at all. Optional, sure.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional observation: every successful lookup of a key in a hash table necessarily involves a successful equality check, which is O(n) for collection types like Array, Set and Dictionary. Therefore, O(n) hashing does not change our expected asymptotic performance, but it does prevent the formation of long collision chains that would make the hash table itself do lookups in time that's linear in the size of the table itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something crucial, but I fail to get the JSON example; JSON only allows string keys. Is it possible to use deeply nested structures as dictionary keys in plists? Why would someone do that?

Putting a deeply nested value type with thousands of components into a Set may or may not be slow -- if you actually need the value semantics offered by Swift's standard collection types, then I don't really see how we can be faster. Looking up a value in such a set necessarily involves a successful equality check, which will need to traverse the two structures anyway. Having another traversal for the hash value is merely an additional constant factor.

If we were to replace the hash value with the collection's count, we would make dictionary lookups quadratic -- equality checks will still be O(n), but we'd have to repeat them for each element in a collision chain, whose length can easily be proportional to the size of the hash table.

There's nothing wrong with using Sets as Dictionary keys, and there's nothing wrong with using Arrays as elements in a Set. Sure, a hash table may not always be the most appropriate data structure if its elements are large collections -- just like an Array is not always appropriate for use as a sorted collection.

Incidentally, we did touch upon this issue at the very end of the original swift-evolution thread for this feature:
https://forums.swift.org/t/let-optional-dictionary-and-array-conditionally-conform-to-hashable/9046/58

It may be a good idea to continue this discussion on that thread rather than in review comments -- not many people follow discussions in random PRs! 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point regarding keys. The common cases (outside of just a String/Int/UUID) I've seen there for folks is using a small dictionary of string-string pairs, or an array (ordered uuids).

Using just -count is horrible, I agree. We want to do something better even for the Foundation types. My point is, if there is a way to make -hash less expensive but still provide some uniqueness information that's an improvement.

Apologies for missing the SE thread, its so rare I ever get bandwidth to focus on Swift. :( I'll take a look there for context.

Copy link
Member Author

@lorentey lorentey Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, better protection against mutations in reference types and exploring performance implications of content-based hashing are both valid points that deserve discussion. We did not spend enough time on these on swift-evolution, so it's good we are at least discussing them now.

I really don't have a good answer to the mutations issue - this PR fits comfortably within the existing Swift model, but perhaps we should do more. Standalone Swift classes aren't Hashable by default, which alleviates the issue a little, but not by much. One thing Set and Dictionary may do is to at least detect invalid mutations sometime after they happen; for example, while they're rehashing elements. (Although it's unclear if a delayed error/trap would be helpful or just confusing.) Would some language level change help more? This is far beyond the scope of this PR, but it's something to ponder.

The performance concerns seem easier to tackle, because we can run benchmarks. For the small cases you mentioned, I’m absolutely positive that hashing the full contents is the only reasonable thing we can do - leaving even a mere handful of relevant bits out of the hash would open the door to runaway collision chains. (Beyond security/reliability concerns, this may also have an implication for storage overhead: I have plans to try increasing the maximum load factor from 75% to around 95% at some point, but this requires extremely robust hashing, as well as adding some extra bits to hash table entries.)

We may get away with sampling a random (but stable) fraction of Array elements when we need to hash large arrays, but given that we can't apply the same trick in ==, it’s not at all clear to me if that would make a difference that's worth the reliability costs. Similarly, Sets could maintain a running hash of their contents — the extra tradeoff there is that it would add overhead to set mutations for all sets, including the vast majority that will never be hashed. (The same applies to Dictionary.) In any case, we'll be able to tweak the implementations in this PR later; hash values are explicitly not stable across even executions of the same Swift program, and we can change the algorithm that generates them in future versions of the stdlib. (As long as we remove the inlinable attribute, which we will probably do as part of an ABI audit.)

Hashing n bits takes O(n) time; this is just how hashing works. Consequently, hashing a list of n elements is n times slower than hashing a single element. This cost may be acceptable for some applications, and prohibitive to others, but this can be said of every single operation offered by the stdlib. I don't see why we would single out Array as in need of constant-time hashing, while at the same time, we're perfectly fine with String and Data taking linear time.

  1. There are clear usecases for hashing at least small arrays/sets/dictionaries,
  2. Content-based hashing is not only the most robust choice, but it also aligns perfectly with Swift's emphasis on collections that are themselves value types, and how these collections already implement Equatable. (After all, hashValue is an approximation for one half of a ==...)
  3. Users can easily implement hash memoization and similar techniques to manually speed up hashing when necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This discussion has pushed me to considering finally fixing the low numbered radar about collection -hash implementation being so poor, then NS.* and Swift will be consistent together. Thanks Karoy!

lorentey added a commit to lorentey/swift that referenced this pull request Mar 28, 2018
This is a spin-off of PR swiftlang#15382, which is currently stuck in review limbo.

Implements part of SE-0143’s Hashable amendments.

rdar://problem/38432751
@lorentey
Copy link
Member Author

I think we're now ready to land this. The review discussions turned out especially valuable -- we've uncovered several issues that we'll need to follow up on:

  1. Bridging is currently somewhat broken for non-Hashable Swift types. SR-7284, rdar://problem/38922580
  2. We need to do a better job of protecting against hash-breaking mutations of reference types inside hash keys. rdar://problem/36437535
  3. I need to run benchmarks to verify my theory that "full" hashing will match or beat the performance of "fast" hashing for array keys. If these benchmarks uncover a surprise, which they often do, we may need to somehow optimize these implementations.

Thank you @mortenbekditlevsen, @xwu for the implementation effort, and @jrose-apple, @michael-lehew, @parkera and @airspeedswift for lending us your expertise for the review. 👍

@lorentey lorentey changed the title [stdlib] Conditionally conform stdlib types to Hashable [SE-0143][stdlib] Conditionally conform stdlib types to Hashable Mar 28, 2018
@lorentey lorentey merged commit e485c15 into swiftlang:master Mar 28, 2018
@lorentey lorentey deleted the conditional-hashable2 branch March 28, 2018 20:45
@lorentey
Copy link
Member Author

Update: The benchmark chart below shows typical amortized per-element performance for the two extremes of hashing: hashing the full contents ("Full") vs just using a constant value ("Fast"), when working with 10-element integer arrays inside a Set. (Note that the scales are logarithmic; the horizontal axis is the size of the Set.)

The workload is (1) inserting n random 10-element integer arrays into a set, (2) looking them up in a random order, then (3) removing them in the same random order.

arrayhashing - 2 tasks

As expected, "full" hashing delivers linear performance, while "fast" hashing is quadratic. (Every measurement shown has been divided by the element count; this is why "Full" looks constant, and "Fast" linear.) The hashing algorithm used here (SipHash-1-3) is quite expensive compared to an equality check between integers -- this explains the inverse gap between "Full" and "Fast" for the smaller sizes. (I could close the gap by selecting a simpler algorithm, but SipHash is what the stdlib uses at the moment, so it's interesting to see how it performs. This benchmark doesn't use resilient hashing -- enabling that would widen that gap even further.)

This initial gap can be made arbitrarily large by increasing the size of the arrays -- both "Fast" and "Full" hashing slows down linearly as the arrays grow, but "Fast" does it much slower. I think this is fine for now; later on, we can optimize hashing by using simpler hash functions for small hash tables, and better, slower ones for larger ones:

  • For the smallest sizes (say, a few dozen elements), we could completely disable hashing and treat the hash table as a flat unsorted array.
  • For mid-size tables (a few hundred elements or so), we could use full hashing with a fast but insecure algorithm
  • For anything larger than that, we would use an "industrial-strength" hasher like SipHash.

We expect to change Swift's hashing APIs so that the hash function will be selected by the hashing collection rather than the keys, so this is doable. (You can already see a version of the potential new API in the Hashable implementations in this PR.)

@@ -106,6 +106,14 @@ extension Array : _ObjectiveCBridgeable {
}
}

extension NSArray : _HasCustomAnyHashableRepresentation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parkera I just saw this, does it need bringing to SCLF?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we will need to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this! Here is a list of _HasCustomAnyHashableRepresentation conformances in the Foundation overlay that are currently missing in open-source Foundation:

NSAffineTransform
NSArray
NSCalendar
NSCharacterSet
NSDate
NSDateComponents
NSDateInterval
NSDictionary
NSLocale
NSNotification
NSNumber
NSSet
NSString
NSTimeZone
NSURL
NSURLComponents
NSURLQueryItem
NSURLRequest

This PR added NSArray and NSDictionary to the list. The intention behind _HCAHR conformances is to have the AnyHashable representation of these classes hash the same way as the corresponding Swift value type. (I don't know if this is as much of a concern in swift-corelibs-foundation as it is in Cocoa, though.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK that's quite a long list. @parkera how do you suggest we proceed with resolving this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's only really critical if/when bridging is enabled on Linux; otherwise, it's okay for, say, NSString and String to hash differently when put into an AnyHashable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parkera It looks like most of these implementations would easily translate over to corelibs. as-conversions won't work without bridging, but it seems straightforward to replace them with explicit conversions.

I expect most of the effort would be spent on writing new tests to cover the new functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Part of the deal of contributing to the Foundation part of the overlay is that we also need those changes for Linux. We can't let it lag behind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; I created a bug to track this effort: https://bugs.swift.org/browse/SR-7436

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lorentey - are you going to resolve this bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ianpartridge It turns out conforming to _HCAHR currently has no effect on platforms without the Objective-C runtime. I'll look into whether this can be lifted (as part of SR-7436). In the meantime, I don't think it's worth adding these conformances -- it'd be dead code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
swift evolution approved Flag → feature: A feature that was approved through the Swift evolution process
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants