Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggest the minimal type disambiguation when an overload doesn't have any unique types #1087

Merged

Conversation

d-ronnqvist
Copy link
Contributor

@d-ronnqvist d-ronnqvist commented Nov 6, 2024

Bug/issue #, if applicable: rdar://136207880

Summary

This improves the suggested parameter type and return value type disambiguation to suggest the minimal amount of disambiguation when an overload doesn't have any unique types.

For example, consider these 3 overloads:

func doSomething(first: String,  second: Int, third: Double) {}
func doSomething(first: String?, second: Int, third: Double) {}
func doSomething(first: String?, second: Int, third: Float)  {}

Currently, because the first overload is the only one with String as its first parameter type and the third overload is the one one with Float as its last parameter type, DocC suggests readable type signature disambiguation for those two overloads. However, because the second overload requires a combination of two type names to uniquely disambiguate it, DocC doesn't currently suggest a type signature for disambiguation and instead falls back to a hash for disambiguation:

Screenshot 2024-11-06 at 13 46 54

Note: DocC can unambiguously resolve ``doSomething(first:second:third:)-(String?,_,Double)`` as a link to the second overload but it doesn't make that suggestion when the link is ambiguous.

With the enhancements in this PR, DocC will find the fewest and shortest type names that uniquely disambiguate each overload and suggest that:

Screenshot 2024-11-06 at 13 53 56

Also, with the enhancements in this PR, DocC is able to suggest type signature disambiguation that combines parameter types and return types.

For example, consider these 3 overloads (where the previous third parameter is a return value instead):

func doSomething(first: String,  second: Int) -> Double { 0.0 }
func doSomething(first: String?, second: Int) -> Double { 0.0 }
func doSomething(first: String?, second: Int) -> Float  { 0.0 }

The first overload is the only with a String type for its first parameter and the third overload is the only with a Float return value but the second overload doesn't have either unique parameter types (same as the third overload) or return types (same as the first overload). In this case DocC suggests a combination of parameter types and return types (-(String?,_)->Double) to uniquely disambiguate the second overload:

Screenshot 2024-11-06 at 13 58 57

Details

The first commit of this PR implemented the general support for suggesting combinations of type names as disambiguation and the second commit of this PR extended that support to combinations of parameter types and return value types. All the other changes in this PR is to ensure that this code is very fast.

Performance

To measure the performance of this code, I extracted an example overload group of 8 symbols with 6 parameters from a project and computed the suggested disambiguation for an ambiguous link to that symbol 50k times. This is a rather high number to measure since it's very unlikely that a project would have 50k ambiguous links.

However, because docc process-catalog emit-generated-curation computes the suggested disambiguation for every symbol in a project, it's possible that this code would run a few thousands of times. (Running 50k times would require that the target has 50k distinct overload groups which would require at least 100k symbols (if each group is only two symbols) or many hundreds of thousands of symbols if each overload groups contains more symbols).

Initially, calling PathHierarchy.DisambiguationContainer.disambiguatedValues(...) for these 8 symbols 50k times took just short of 3 seconds.

The initial bottleneck was my naive code for computing the sequence of type name combinations to try. Using the optimized combinations code in Swift Algorithms brought this down to about 2 seconds.

After that, most of the time was spent accessing and modifying Set<Int> to track which overloads' type names occur in other overloads and to track which type names to try to use for disambiguation. I realized that the numbers these sets were tracking were always in the range 0 ..< numberOfTypeNames.

My intuition is that most functions/initializers/subscripts/etc. only have a handful of parameters and that the number of symbols quickly goes down as the number of parameters does up. Tangential data seems to support this. For example, the number of symbols with different number of parameters (greater than 0) in the Swift standard library looks like this with a logarithmic Y-axis:

Screenshot 2024-11-06 at 15 13 15

(The 8, 16, 32, and 64-parameter outliers are each a single symbol (the element-wise initializers for SIMD8, SIMD16, SIMD32, and SIMD64) meaning that there's no other symbols with the same number of parameters to disambiguate from, ignoring that the symbols would also need to have the same names to need disambiguation)

The same distribution—also with a logarithmic Y-axis—for Foundation looks like this:

Screenshot 2024-11-06 at 15 15 20

For an unnamed C library with lots of overloads it looks like this:

Screenshot 2024-11-06 at 15 17 13

And for two large unnamed frameworks the distributions look like this:

Screenshot 2024-11-06 at 15 18 31 Screenshot 2024-11-06 at 15 21 22

This anecdotal data seems to support the idea that an optimization specific to overloads with "few" parameters would be applicable in the common case.

Since the common optimized case only needs to consider the numbers 0 ..< 64, I implemented a specialized set-algebra type that stores each number in a UInt64 bit set, which brought down the time to about 1.1 second.

Now, most of the time was spent computing the sequence of combinations to try. By leveraging the fact that all possible combinations of the specialized bit-set type can be generated by looping over a range of integers and sorting them by .nonzeroBitCount, a specialized combinations generator could bring down the time to about 850 ms.

The next noteworthy bottleneck was creating and accessing the nested [[IntSet]] to determine if a combination of type names is ambiguous or not. I added a specialized Table<Element> type that stores its elements in a single contiguous storage, which brought down the time to about 600 ms.

This was already 5 times faster than the original implementation, and I should have probably stopped here. However, through a series of many smaller improvements I brought the time down to about 350 ms.

At this point, only 25% of the time spent is computing the minimal disambiguation and the remaining 75% is in higher level code that:

  • tracks which elements have already computed disambiguation so that we don't suggest type disambiguation for an element that can be disambiguated by its symbol kind
  • attempts to disambiguate by symbol kind
  • groups elements by number of parameters so that the low level code can compute the minimal disambiguation for each group.

This is where I decided to stop 😄

Despite all the low level optimizations I still find the code fairly easy to read and follow. We'll see how I feel about that in a year 🤞.

Dependencies

This adds Swift Algorithms as a package dependency.

However, it is only used in the very rare case when a group of overloaded symbols have more than 64 parameters or when there are more than 64 different symbols in an overload group. If we prefer to not add a new dependency we could remove it and use a different implementation for large overload groups with large numbers of parameters.

For example, we could consider only checking those 64+ parameter overloads for disambiguation involving either one or two different parameters. Such a combinations sequence is not too hard to implement and the only impact would be that the very very very rare case would fallback to suggesting hash disambiguation when we could have found a type disambiguation if we checked more combinations.

We could also consider not even trying to find type disambiguation for 64+ parameters because arguably, -(_,_,_,_,_,_,_,_,_,_,_,_,_,String,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,Int,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,) has its own readability problems

Testing

  1. Define 3 functions (either top-level or within a class or struct) like these
func doSomething(first: String,  second: Int, third: Double) {}
func doSomething(first: String?, second: Int, third: Double) {}
func doSomething(first: String?, second: Int, third: Float)  {}
  1. From some other symbol, write a link to doSomething(first:second:third:).
  • The warning about the ambiguous link should suggest parameter disambiguation for all 3 overloads.
    Screenshot 2024-11-06 at 13 53 56
  1. Define 3 new functions (either top-level or within a class or struct) like these
func doSomething(first: String,  second: Int) -> Double { 0.0 }
func doSomething(first: String?, second: Int) -> Double { 0.0 }
func doSomething(first: String?, second: Int) -> Float  { 0.0 } 
  1. From some other symbol, write a link to doSomething(first:second:).
  • The warning about the ambiguous link should suggest parameter disambiguation, return value disambiguation, or combined parameter and return value disambiguation for all 3 overloads.
    Screenshot 2024-11-06 at 13 58 57

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • [ ] Updated documentation if necessary. I'll add documentation separately. This is tracked by rdar://136207820

@QuietMisdreavus
Copy link
Contributor

This is impressive! I'll try to thumb through this today to give it a review.

This adds Swift Algorithms as a package dependency.

TBH i'm fine with sneaking this in, if only because it adds the opportunity to clean up a little bit of the code i added in #928 😅

@d-ronnqvist d-ronnqvist changed the title Suggest the minimal type disambiguation when an overload doesn't any unique types. Suggest the minimal type disambiguation when an overload doesn't have any unique types Nov 6, 2024
@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!

Thank you for the series of separate commits explaining your progress improving the performance. We could convert this into a sample app on Swift performance - you used so many different techniques here!

I left a few questions and my usual readability suggestions. The only real suggestion I have, which you alluded to in the PR description, is that maybe it's not worth adding Swift Algorithms just for a few extremely unusual scenarios (symbols with over 64 parameters or return types) that we won't hit very often. When we do hit them, maybe an old-fashioned disambiguation hash would be fine. Aside from avoiding the new dependency, if you remove that bit you could also avoid the complexity around the _IntSet and _Combination private protocols.


@inlinable
subscript(row: Int, column: Int) -> Element {
_read { yield storage[index(row: row, column: column)] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does _read do here? And what about yield ? I'm not familiar with these. Are you handling multithreaded access to storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modify and read accessors is a feature that's been pitched since 2019. With a few other recently approved proposals I'm hoping that it finally gets approved this time around but until then we're using _read and _modify in a few key places to for performance.

Basically, the "read" and "modify" accessors and the yield keyword enable values to be accessed and changed without requiring a copy by temporarily holding exclusive access to the value. For container types like dictionaries and arrays this can avoid surprises where the compiler needs to make a large copy when it can't guarantee unique access to a value.

This section of Ben Cohen's talk from the Functional Swift conference 5 years ago gives a good explanation of how _modify works (_read is very similar but without mutating).

let capacity = width * height
storage = try .init(unsafeUninitializedCapacity: capacity) { buffer, initializedCount in
var wrappedBuffer = UnsafeMutableTableBufferPointer(width: width, wrapping: buffer)
try initializer(&wrappedBuffer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that Swift allows you to call a closure like this, while executing an initializer, passing in a pointer to the storage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swift knows that this closure doesn't "escape" the initializer—same as Array/init(unsafeUninitializedCapacity:initializingWith:) which its calling. This initializer repeats the pattern from that initializer.

Since the closure's lifetime is limited by the scope of the initializer it's easier for the compiler to reason about what the closure is doing with the shared pointer.

About the only misuse that the caller could do with the closure is to hold on to the pointer after the initializer call. There's a accepted proposal to allow types like this pointer to be marked non-escapable (SE-0446) to make that misuse a compiler error as well as related proposal to introduce a safe access to contiguous memory (instead of Unsafe[Mutable][Buffer]Pointer) (SE-0447)]

- Use plural for local variable with array value
- Explicitly initialize optional local variable with `nil`
- Add assert about passing empty type names
- Explain what `nil` return value means in local code comment
- Add comment indicating where `_TinySmallValueIntSet` is defined
- Use + to join two string instead of string interpolation
- Use named arguments for `makeDisambiguation` closure
@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@patshaughnessy
Copy link
Contributor

Great work 👏 - thanks for the tweaks, comments and explanations here in the PR! I'm going to bookmark this one :)

@d-ronnqvist
Copy link
Contributor Author

@swift-ci please test

@d-ronnqvist d-ronnqvist merged commit 3cfeeb6 into swiftlang:main Nov 13, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants