-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggest the minimal type disambiguation when an overload doesn't have any unique types #1087
Suggest the minimal type disambiguation when an overload doesn't have any unique types #1087
Conversation
rdar://136207880
Also, ignore "sparse" nodes (without IDs)
This is impressive! I'll try to thumb through this today to give it a review.
TBH i'm fine with sneaking this in, if only because it adds the opportunity to clean up a little bit of the code i added in #928 😅 |
@swift-ci please test |
@swift-ci please test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work!
Thank you for the series of separate commits explaining your progress improving the performance. We could convert this into a sample app on Swift performance - you used so many different techniques here!
I left a few questions and my usual readability suggestions. The only real suggestion I have, which you alluded to in the PR description, is that maybe it's not worth adding Swift Algorithms just for a few extremely unusual scenarios (symbols with over 64 parameters or return types) that we won't hit very often. When we do hit them, maybe an old-fashioned disambiguation hash would be fine. Aside from avoiding the new dependency, if you remove that bit you could also avoid the complexity around the _IntSet
and _Combination
private protocols.
Sources/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+DisambiguatedPaths.swift
Outdated
Show resolved
Hide resolved
...ces/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+TypeSignatureDisambiguation.swift
Show resolved
Hide resolved
Sources/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+DisambiguatedPaths.swift
Outdated
Show resolved
Hide resolved
Sources/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+DisambiguatedPaths.swift
Outdated
Show resolved
Hide resolved
Sources/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+DisambiguatedPaths.swift
Outdated
Show resolved
Hide resolved
|
||
@inlinable | ||
subscript(row: Int, column: Int) -> Element { | ||
_read { yield storage[index(row: row, column: column)] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does _read
do here? And what about yield
? I'm not familiar with these. Are you handling multithreaded access to storage
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modify and read accessors is a feature that's been pitched since 2019. With a few other recently approved proposals I'm hoping that it finally gets approved this time around but until then we're using _read
and _modify
in a few key places to for performance.
Basically, the "read" and "modify" accessors and the yield
keyword enable values to be accessed and changed without requiring a copy by temporarily holding exclusive access to the value. For container types like dictionaries and arrays this can avoid surprises where the compiler needs to make a large copy when it can't guarantee unique access to a value.
This section of Ben Cohen's talk from the Functional Swift conference 5 years ago gives a good explanation of how _modify
works (_read
is very similar but without mutating).
let capacity = width * height | ||
storage = try .init(unsafeUninitializedCapacity: capacity) { buffer, initializedCount in | ||
var wrappedBuffer = UnsafeMutableTableBufferPointer(width: width, wrapping: buffer) | ||
try initializer(&wrappedBuffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that Swift allows you to call a closure like this, while executing an initializer, passing in a pointer to the storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swift knows that this closure doesn't "escape" the initializer—same as Array/init(unsafeUninitializedCapacity:initializingWith:)
which its calling. This initializer repeats the pattern from that initializer.
Since the closure's lifetime is limited by the scope of the initializer it's easier for the compiler to reason about what the closure is doing with the shared pointer.
About the only misuse that the caller could do with the closure is to hold on to the pointer after the initializer call. There's a accepted proposal to allow types like this pointer to be marked non-escapable (SE-0446) to make that misuse a compiler error as well as related proposal to introduce a safe access to contiguous memory (instead of Unsafe[Mutable][Buffer]Pointer
) (SE-0447)]
...ces/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+TypeSignatureDisambiguation.swift
Show resolved
Hide resolved
...ces/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+TypeSignatureDisambiguation.swift
Show resolved
Hide resolved
...ces/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+TypeSignatureDisambiguation.swift
Show resolved
Hide resolved
- Use plural for local variable with array value - Explicitly initialize optional local variable with `nil` - Add assert about passing empty type names - Explain what `nil` return value means in local code comment - Add comment indicating where `_TinySmallValueIntSet` is defined - Use + to join two string instead of string interpolation - Use named arguments for `makeDisambiguation` closure
…ames that disambiguate an overload
@swift-ci please test |
Great work 👏 - thanks for the tweaks, comments and explanations here in the PR! I'm going to bookmark this one :) |
…ack to hash disambiguation
For the extremely rare case of overloads with more than 64 parameters we only try disambiguation by a single parameter type name.
@swift-ci please test |
Bug/issue #, if applicable: rdar://136207880
Summary
This improves the suggested parameter type and return value type disambiguation to suggest the minimal amount of disambiguation when an overload doesn't have any unique types.
For example, consider these 3 overloads:
Currently, because the first overload is the only one with
String
as its first parameter type and the third overload is the one one withFloat
as its last parameter type, DocC suggests readable type signature disambiguation for those two overloads. However, because the second overload requires a combination of two type names to uniquely disambiguate it, DocC doesn't currently suggest a type signature for disambiguation and instead falls back to a hash for disambiguation:Note: DocC can unambiguously resolve
``doSomething(first:second:third:)-(String?,_,Double)``
as a link to the second overload but it doesn't make that suggestion when the link is ambiguous.With the enhancements in this PR, DocC will find the fewest and shortest type names that uniquely disambiguate each overload and suggest that:
Also, with the enhancements in this PR, DocC is able to suggest type signature disambiguation that combines parameter types and return types.
For example, consider these 3 overloads (where the previous third parameter is a return value instead):
The first overload is the only with a
String
type for its first parameter and the third overload is the only with aFloat
return value but the second overload doesn't have either unique parameter types (same as the third overload) or return types (same as the first overload). In this case DocC suggests a combination of parameter types and return types (-(String?,_)->Double
) to uniquely disambiguate the second overload:Details
The first commit of this PR implemented the general support for suggesting combinations of type names as disambiguation and the second commit of this PR extended that support to combinations of parameter types and return value types. All the other changes in this PR is to ensure that this code is very fast.
Performance
To measure the performance of this code, I extracted an example overload group of 8 symbols with 6 parameters from a project and computed the suggested disambiguation for an ambiguous link to that symbol 50k times. This is a rather high number to measure since it's very unlikely that a project would have 50k ambiguous links.
However, because
docc process-catalog emit-generated-curation
computes the suggested disambiguation for every symbol in a project, it's possible that this code would run a few thousands of times. (Running 50k times would require that the target has 50k distinct overload groups which would require at least 100k symbols (if each group is only two symbols) or many hundreds of thousands of symbols if each overload groups contains more symbols).Initially, calling
PathHierarchy.DisambiguationContainer.disambiguatedValues(...)
for these 8 symbols 50k times took just short of 3 seconds.The initial bottleneck was my naive code for computing the sequence of type name combinations to try. Using the optimized combinations code in Swift Algorithms brought this down to about 2 seconds.
After that, most of the time was spent accessing and modifying
Set<Int>
to track which overloads' type names occur in other overloads and to track which type names to try to use for disambiguation. I realized that the numbers these sets were tracking were always in the range0 ..< numberOfTypeNames
.My intuition is that most functions/initializers/subscripts/etc. only have a handful of parameters and that the number of symbols quickly goes down as the number of parameters does up. Tangential data seems to support this. For example, the number of symbols with different number of parameters (greater than 0) in the Swift standard library looks like this with a logarithmic Y-axis:
(The 8, 16, 32, and 64-parameter outliers are each a single symbol (the element-wise initializers for SIMD8, SIMD16, SIMD32, and SIMD64) meaning that there's no other symbols with the same number of parameters to disambiguate from, ignoring that the symbols would also need to have the same names to need disambiguation)
The same distribution—also with a logarithmic Y-axis—for Foundation looks like this:
For an unnamed C library with lots of overloads it looks like this:
And for two large unnamed frameworks the distributions look like this:
This anecdotal data seems to support the idea that an optimization specific to overloads with "few" parameters would be applicable in the common case.
Since the common optimized case only needs to consider the numbers
0 ..< 64
, I implemented a specialized set-algebra type that stores each number in aUInt64
bit set, which brought down the time to about 1.1 second.Now, most of the time was spent computing the sequence of combinations to try. By leveraging the fact that all possible combinations of the specialized bit-set type can be generated by looping over a range of integers and sorting them by
.nonzeroBitCount
, a specialized combinations generator could bring down the time to about 850 ms.The next noteworthy bottleneck was creating and accessing the nested
[[IntSet]]
to determine if a combination of type names is ambiguous or not. I added a specializedTable<Element>
type that stores its elements in a single contiguous storage, which brought down the time to about 600 ms.This was already 5 times faster than the original implementation, and I should have probably stopped here. However, through a series of many smaller improvements I brought the time down to about 350 ms.
At this point, only 25% of the time spent is computing the minimal disambiguation and the remaining 75% is in higher level code that:
This is where I decided to stop 😄
Despite all the low level optimizations I still find the code fairly easy to read and follow. We'll see how I feel about that in a year 🤞.
Dependencies
This adds Swift Algorithms as a package dependency.
However, it is only used in the very rare case when a group of overloaded symbols have more than 64 parameters or when there are more than 64 different symbols in an overload group. If we prefer to not add a new dependency we could remove it and use a different implementation for large overload groups with large numbers of parameters.
For example, we could consider only checking those 64+ parameter overloads for disambiguation involving either one or two different parameters. Such a combinations sequence is not too hard to implement and the only impact would be that the very very very rare case would fallback to suggesting hash disambiguation when we could have found a type disambiguation if we checked more combinations.
We could also consider not even trying to find type disambiguation for 64+ parameters because arguably,
-(_,_,_,_,_,_,_,_,_,_,_,_,_,String,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,Int,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,)
has its own readability problemsTesting
doSomething(first:second:third:)
.doSomething(first:second:)
.Checklist
Make sure you check off the following items. If they cannot be completed, provide a reason.
./bin/test
script and it succeeded[ ] Updated documentation if necessary.I'll add documentation separately. This is tracked by rdar://136207820