Suggest the minimal type disambiguation when an overload doesn't have any unique types #1087

d-ronnqvist · 2024-11-06T16:07:17Z

Bug/issue #, if applicable: rdar://136207880

Summary

This improves the suggested parameter type and return value type disambiguation to suggest the minimal amount of disambiguation when an overload doesn't have any unique types.

For example, consider these 3 overloads:

func doSomething(first: String,  second: Int, third: Double) {}
func doSomething(first: String?, second: Int, third: Double) {}
func doSomething(first: String?, second: Int, third: Float)  {}

Currently, because the first overload is the only one with String as its first parameter type and the third overload is the only one with Float as its last parameter type, DocC suggests readable type signature disambiguation for those two overloads. However, because the second overload requires a combination of two type names to uniquely disambiguate it, DocC doesn't currently suggest a type signature for disambiguation and instead falls back to a hash for disambiguation:

Note: DocC can unambiguously resolve ``doSomething(first:second:third:)-(String?,_,Double)`` as a link to the second overload but it doesn't make that suggestion when the link is ambiguous.

With the enhancements in this PR, DocC will find the fewest and shortest type names that uniquely disambiguate each overload and suggest that:

Also, with the enhancements in this PR, DocC is able to suggest type signature disambiguation that combines parameter types and return types.

For example, consider these 3 overloads (where the previous third parameter is a return value instead):

func doSomething(first: String,  second: Int) -> Double { 0.0 }
func doSomething(first: String?, second: Int) -> Double { 0.0 }
func doSomething(first: String?, second: Int) -> Float  { 0.0 }

The first overload is the only one with a String type for its first parameter and the third overload is the only one with a Float return value, but the second overload has neither unique parameter types (they are the same as the third overload's) nor a unique return type (it is the same as the first overload's). In this case DocC suggests a combination of parameter types and return types (-(String?,_)->Double) to uniquely disambiguate the second overload:
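As a rough illustration of the three-overload example above, the following sketch tries one parameter type, then the return type, then one parameter type together with the return type. The `Overload` type and `suggestedSuffix(for:in:)` function are hypothetical names made up for this example, the order in which the three forms are tried is an assumption, and the return-only `->Float` spelling is assumed by analogy with the combined form. It also assumes that every overload has the same number of parameters.

```swift
// Hypothetical model of an overload's type signature (not a DocC type).
struct Overload {
    var parameters: [String]
    var returnType: String
}

/// Hypothetical helper: suggests a type-signature suffix for one overload,
/// or `nil` when none of the three simple forms below is unique.
func suggestedSuffix(for index: Int, in overloads: [Overload]) -> String? {
    let target = overloads[index]
    let others = overloads.indices.filter { $0 != index }.map { overloads[$0] }

    // Formats "(String?,_)": every parameter except `column` becomes "_".
    func parameterList(keeping column: Int) -> String {
        let names = target.parameters.indices.map { $0 == column ? target.parameters[$0] : "_" }
        return "(" + names.joined(separator: ",") + ")"
    }

    // 1. A single parameter type that no other overload has in that position.
    for column in target.parameters.indices
    where !others.contains(where: { $0.parameters[column] == target.parameters[column] }) {
        return "-" + parameterList(keeping: column)
    }
    // 2. A return type that no other overload has.
    if !others.contains(where: { $0.returnType == target.returnType }) {
        return "->" + target.returnType
    }
    // 3. One parameter type combined with the return type.
    for column in target.parameters.indices
    where !others.contains(where: {
        $0.parameters[column] == target.parameters[column] && $0.returnType == target.returnType
    }) {
        return "-" + parameterList(keeping: column) + "->" + target.returnType
    }
    return nil
}

let overloads = [
    Overload(parameters: ["String",  "Int"], returnType: "Double"),
    Overload(parameters: ["String?", "Int"], returnType: "Double"),
    Overload(parameters: ["String?", "Int"], returnType: "Float"),
]
let suffixes = overloads.indices.map { suggestedSuffix(for: $0, in: overloads) }
```

For the second overload this yields the same -(String?,_)->Double form that the description mentions.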

Details

The first commit of this PR implemented the general support for suggesting combinations of type names as disambiguation and the second commit of this PR extended that support to combinations of parameter types and return value types. All the other changes in this PR are to ensure that this code is very fast.
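To make "the fewest and shortest type names" concrete, here is a deliberately naive brute-force sketch. It is not the implementation in this PR: `minimalDisambiguation(for:)` and `signature(_:keeping:)` are hypothetical helpers, every overload is assumed to have the same number of type names, and the subset enumeration is exponential, which is exactly the cost the later optimizations address.

```swift
/// Hypothetical helper (not DocC's API): for each overload, returns the fewest column
/// indices whose type names no other overload shares, or `nil` if none exist.
func minimalDisambiguation(for overloads: [[String]]) -> [[Int]?] {
    guard let columnCount = overloads.first?.count else { return [] }
    precondition(overloads.allSatisfy { $0.count == columnCount }, "Overloads need equally many types")
    precondition(columnCount <= 16, "Brute force is only sensible for a handful of columns")

    // Build every non-empty subset of 0..<columnCount as an ascending index list.
    var subsets: [[Int]] = [[]]
    for column in 0..<columnCount {
        subsets += subsets.map { $0 + [column] }
    }
    subsets.removeFirst() // drop the empty subset

    return overloads.indices.map { row -> [Int]? in
        // Keep the subsets where no other overload matches `row` in every chosen column.
        let unique = subsets.filter { columns in
            !overloads.indices.contains { other in
                other != row && columns.allSatisfy { overloads[other][$0] == overloads[row][$0] }
            }
        }
        // Prefer fewer type names, then fewer UTF-8 code units in total.
        return unique.min { lhs, rhs in
            let lhsLength = lhs.reduce(0) { $0 + overloads[row][$1].utf8.count }
            let rhsLength = rhs.reduce(0) { $0 + overloads[row][$1].utf8.count }
            return (lhs.count, lhsLength) < (rhs.count, rhsLength)
        }
    }
}

/// Formats the chosen columns as "-(String?,_,Double)"; other columns become "_".
func signature(_ typeNames: [String], keeping columns: [Int]) -> String {
    let parts = typeNames.indices.map { columns.contains($0) ? typeNames[$0] : "_" }
    return "-(" + parts.joined(separator: ",") + ")"
}

let overloads = [
    ["String",  "Int", "Double"],
    ["String?", "Int", "Double"],
    ["String?", "Int", "Float"],
]
let suggestions: [String] = zip(overloads, minimalDisambiguation(for: overloads)).map { typeNames, columns in
    columns.map { signature(typeNames, keeping: $0) } ?? "(fall back to a hash)"
}
```

For the three parameter-only overloads from the summary, the second overload needs two type names, which matches the -(String?,_,Double) link form quoted in the note.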

Performance

To measure the performance of this code, I extracted an example overload group of 8 symbols with 6 parameters from a project and computed the suggested disambiguation for an ambiguous link to that symbol 50k times. This is a rather high number to measure since it's very unlikely that a project would have 50k ambiguous links.

However, because docc process-catalog emit-generated-curation computes the suggested disambiguation for every symbol in a project, it's possible that this code would run a few thousand times. (Running it 50k times would require the target to have 50k distinct overload groups, which means at least 100k symbols if each group has only two symbols, or many hundreds of thousands of symbols if each overload group contains more symbols.)

Initially, calling PathHierarchy.DisambiguationContainer.disambiguatedValues(...) for these 8 symbols 50k times took just short of 3 seconds.

The initial bottleneck was my naive code for computing the sequence of type name combinations to try. Using the optimized combinations code in Swift Algorithms brought this down to about 2 seconds.

After that, most of the time was spent accessing and modifying Set<Int> to track which overloads' type names occur in other overloads and to track which type names to try to use for disambiguation. I realized that the numbers these sets were tracking were always in the range 0 ..< numberOfTypeNames.

My intuition is that most functions/initializers/subscripts/etc. only have a handful of parameters and that the number of symbols quickly goes down as the number of parameters goes up. Tangential data seems to support this. For example, the number of symbols with different numbers of parameters (greater than 0) in the Swift standard library looks like this with a logarithmic Y-axis:

(The 8, 16, 32, and 64-parameter outliers are each a single symbol (the element-wise initializers for SIMD8, SIMD16, SIMD32, and SIMD64), meaning that there are no other symbols with the same number of parameters to disambiguate from, ignoring that the symbols would also need to have the same names to need disambiguation.)

The same distribution—also with a logarithmic Y-axis—for Foundation looks like this:

For an unnamed C library with lots of overloads it looks like this:

And for two large unnamed frameworks the distributions look like this:

This anecdotal data seems to support the idea that an optimization specific to overloads with "few" parameters would be applicable in the common case.

Since the common optimized case only needs to consider the numbers 0 ..< 64, I implemented a specialized set-algebra type that stores each number in a UInt64 bit set, which brought down the time to about 1.1 seconds.
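A minimal sketch of the bit-set idea follows, assuming nothing about the PR's actual type beyond what is described above. `TinyIntSet` is a made-up name for this example and implements only a few set operations; it stores each value in 0 ..< 64 as one bit of a UInt64, so membership tests, intersections, and superset checks are single integer operations.

```swift
/// Hypothetical sketch of a set of integers in 0..<64 stored as the bits of one UInt64.
struct TinyIntSet: Equatable {
    private(set) var bits: UInt64 = 0

    init() {}
    private init(bits: UInt64) { self.bits = bits }
    init<S: Sequence>(_ values: S) where S.Element == Int {
        for value in values { insert(value) }
    }

    /// The single-bit mask for `value`; traps for values outside 0..<64.
    private static func mask(for value: Int) -> UInt64 {
        precondition((0..<64).contains(value), "Only values in 0..<64 are supported")
        return 1 << UInt64(value)
    }

    var count: Int { bits.nonzeroBitCount }

    func contains(_ value: Int) -> Bool { bits & Self.mask(for: value) != 0 }

    /// Inserts `value` and returns `true` if it wasn't already in the set.
    @discardableResult
    mutating func insert(_ value: Int) -> Bool {
        let wasMissing = !contains(value)
        bits |= Self.mask(for: value)
        return wasMissing
    }

    func intersection(_ other: TinyIntSet) -> TinyIntSet { TinyIntSet(bits: bits & other.bits) }
    func isSuperset(of other: TinyIntSet) -> Bool { bits & other.bits == other.bits }
}

extension TinyIntSet: Sequence {
    /// Iterates the values in ascending order by repeatedly taking the lowest set bit.
    func makeIterator() -> AnyIterator<Int> {
        var remaining = bits
        return AnyIterator {
            guard remaining != 0 else { return nil }
            let value = remaining.trailingZeroBitCount
            remaining &= remaining - 1 // clear the lowest set bit
            return value
        }
    }
}

var seen = TinyIntSet([0, 2, 5])
seen.insert(63)
let other = TinyIntSet([2, 5])
```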

Now, most of the time was spent computing the sequence of combinations to try. By leveraging the fact that all possible combinations of the specialized bit-set type can be generated by looping over a range of integers and sorting them by .nonzeroBitCount, a specialized combinations generator could bring down the time to about 850 ms.
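The trick described above can be sketched as follows. This is an illustration rather than the PR's generator: `combinationsSortedBySize(ofFirst:)` and `indices(in:)` are hypothetical names, and the sketch eagerly builds the whole array, so it caps the element count at an arbitrary small number. Each integer in 1 ... 2^n - 1 is read as a bit pattern that selects a subset of 0 ..< n, and sorting by .nonzeroBitCount puts the smallest combinations first.

```swift
/// Hypothetical helper: all non-empty subsets of 0..<count as bit patterns,
/// ordered by how many elements they contain (ties broken by numeric value).
func combinationsSortedBySize(ofFirst count: Int) -> [UInt64] {
    precondition((1...20).contains(count), "This eager sketch creates 2^count - 1 values")
    let upperBound: UInt64 = (1 << UInt64(count)) - 1
    return (1...upperBound).sorted {
        ($0.nonzeroBitCount, $0) < ($1.nonzeroBitCount, $1)
    }
}

/// Decodes a bit pattern back into the indices it selects.
func indices(in combination: UInt64) -> [Int] {
    (0..<64).filter { combination & (1 << UInt64($0)) != 0 }
}

let combinations = combinationsSortedBySize(ofFirst: 3)
let decoded = combinations.map(indices(in:))
```

Because the single-element patterns come first, a caller that stops at the first unique combination naturally finds one with the fewest type names.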

The next noteworthy bottleneck was creating and accessing the nested [[IntSet]] to determine if a combination of type names is ambiguous or not. I added a specialized Table<Element> type that stores its elements in a single contiguous storage, which brought down the time to about 600 ms.
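A simplified sketch of a flat, row-major table is shown here. It borrows the `Table<Element>` name and the subscript shape from the diff quoted later in this conversation, but the initializer and the bounds checks are assumptions for this example. Storing all rows in one array avoids one level of indirection and one allocation per row compared to a nested array.

```swift
/// Sketch of a fixed-size 2D table backed by one contiguous array (row-major order).
struct Table<Element> {
    let width: Int
    let height: Int
    private var storage: [Element]

    // Assumed convenience initializer for this example.
    init(width: Int, height: Int, repeating value: Element) {
        precondition(width >= 0 && height >= 0, "Dimensions can't be negative")
        self.width = width
        self.height = height
        storage = Array(repeating: value, count: width * height)
    }

    /// Maps a (row, column) pair to its position in the flat storage.
    private func index(row: Int, column: Int) -> Int {
        precondition((0..<height).contains(row) && (0..<width).contains(column), "Index out of range")
        return row * width + column
    }

    // Subscript parameters have no argument labels by default, so this is called as `table[0, 1]`.
    subscript(row: Int, column: Int) -> Element {
        get { storage[index(row: row, column: column)] }
        set { storage[index(row: row, column: column)] = newValue }
    }
}

var table = Table(width: 3, height: 2, repeating: "")
table[0, 0] = "String"
table[1, 2] = "Float"
```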

This was already 5 times faster than the original implementation, and I should have probably stopped here. However, through a series of many smaller improvements I brought the time down to about 350 ms.

At this point, only 25% of the time spent is computing the minimal disambiguation and the remaining 75% is in higher level code that:

- tracks which elements have already computed disambiguation so that we don't suggest type disambiguation for an element that can be disambiguated by its symbol kind
- attempts to disambiguate by symbol kind
- groups elements by number of parameters so that the low level code can compute the minimal disambiguation for each group.
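The grouping step in the last item can be pictured with the standard library's Dictionary(grouping:by:). The data below is made up for illustration and the real code works on path hierarchy elements rather than plain string arrays.

```swift
// Made-up parameter type lists for overloads that share a name.
let signatures: [[String]] = [
    ["String", "Int"],
    ["String?", "Int"],
    ["Int"],
    ["String", "Int", "Double"],
]

// Overloads with different parameter counts never need to be compared column by column,
// so each group can be handed to the low-level code on its own.
let groupsByParameterCount = Dictionary(grouping: signatures, by: { $0.count })

for parameterCount in groupsByParameterCount.keys.sorted() {
    print(parameterCount, groupsByParameterCount[parameterCount]!)
}
```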

This is where I decided to stop 😄

Despite all the low level optimizations I still find the code fairly easy to read and follow. We'll see how I feel about that in a year 🤞.

Dependencies

This adds Swift Algorithms as a package dependency.

However, it is only used in the very rare case when a group of overloaded symbols has more than 64 parameters or when there are more than 64 different symbols in an overload group. If we prefer not to add a new dependency, we could remove it and use a different implementation for large overload groups with large numbers of parameters.

For example, we could consider only checking those 64+ parameter overloads for disambiguation involving either one or two different parameters. Such a combinations sequence is not too hard to implement and the only impact would be that the very very very rare case would fall back to suggesting hash disambiguation when we could have found a type disambiguation if we checked more combinations.
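Such a "one or two parameters" sequence could look roughly like this. The function name is hypothetical and the sketch returns an array for simplicity, where a lazy sequence would avoid the allocation.

```swift
/// Hypothetical helper: every single index in 0..<count, followed by every pair of indices.
func singleAndPairCombinations(of count: Int) -> [[Int]] {
    precondition(count >= 0, "The number of parameters can't be negative")
    let singles = (0..<count).map { [$0] }
    var pairs: [[Int]] = []
    for first in 0..<count {
        for second in (first + 1)..<count {
            pairs.append([first, second])
        }
    }
    return singles + pairs
}

let small = singleAndPairCombinations(of: 3)
let large = singleAndPairCombinations(of: 70)
```

For 70 parameters this is 70 single indices plus 2415 pairs, which stays small compared to the number of all possible subsets.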

We could also consider not even trying to find type disambiguation for 64+ parameters because arguably, -(_,_,_,_,_,_,_,_,_,_,_,_,_,String,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,Int,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,) has its own readability problems

Testing

Define 3 functions (either top-level or within a class or struct) like these

func doSomething(first: String,  second: Int, third: Double) {}
func doSomething(first: String?, second: Int, third: Double) {}
func doSomething(first: String?, second: Int, third: Float)  {}

From some other symbol, write a link to doSomething(first:second:third:).

The warning about the ambiguous link should suggest parameter disambiguation for all 3 overloads.

Define 3 new functions (either top-level or within a class or struct) like these

func doSomething(first: String,  second: Int) -> Double { 0.0 }
func doSomething(first: String?, second: Int) -> Double { 0.0 }
func doSomething(first: String?, second: Int) -> Float  { 0.0 }

From some other symbol, write a link to doSomething(first:second:).

The warning about the ambiguous link should suggest parameter disambiguation, return value disambiguation, or combined parameter and return value disambiguation for all 3 overloads.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

Added tests
Ran the ./bin/test script and it succeeded
~~[ ] Updated documentation if necessary.~~ I'll add documentation separately. This is tracked by rdar://136207820


QuietMisdreavus · 2024-11-06T16:57:19Z

This is impressive! I'll try to thumb through this today to give it a review.

> This adds Swift Algorithms as a package dependency.

TBH i'm fine with sneaking this in, if only because it adds the opportunity to clean up a little bit of the code i added in #928 😅

d-ronnqvist · 2024-11-09T12:03:49Z

@swift-ci please test

d-ronnqvist · 2024-11-11T10:36:09Z

@swift-ci please test

patshaughnessy

Amazing work!

Thank you for the series of separate commits explaining your progress improving the performance. We could convert this into a sample app on Swift performance - you used so many different techniques here!

I left a few questions and my usual readability suggestions. The only real suggestion I have, which you alluded to in the PR description, is that maybe it's not worth adding Swift Algorithms just for a few extremely unusual scenarios (symbols with over 64 parameters or return types) that we won't hit very often. When we do hit them, maybe an old-fashioned disambiguation hash would be fine. Aside from avoiding the new dependency, if you remove that bit you could also avoid the complexity around the _IntSet and _Combination private protocols.


patshaughnessy · 2024-11-12T00:16:08Z

...ces/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+TypeSignatureDisambiguation.swift

+
+    @inlinable
+    subscript(row: Int, column: Int) -> Element {
+        _read { yield storage[index(row: row, column: column)] }


What does _read do here? And what about yield? I'm not familiar with these. Are you handling multithreaded access to storage?

d-ronnqvist · 2024-11-12

Modify and read accessors are a feature that's been pitched since 2019. With a few other recently approved proposals I'm hoping that it finally gets approved this time around, but until then we're using _read and _modify in a few key places for performance.

Basically, the "read" and "modify" accessors and the yield keyword enable values to be accessed and changed without requiring a copy by temporarily holding exclusive access to the value. For container types like dictionaries and arrays this can avoid surprises where the compiler needs to make a large copy when it can't guarantee unique access to a value.

This section of Ben Cohen's talk from the Functional Swift conference 5 years ago gives a good explanation of how _modify works (_read is very similar but without mutating).
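As a small illustration of the accessors discussed in this thread, the sketch below wraps a nested array and forwards element access with the underscored _read and _modify accessors. `Rows` is a made-up type for this example, and since these accessors are not an official language feature, their spelling and behavior could change.

```swift
/// Made-up wrapper type that forwards access to its rows without copying them.
struct Rows {
    private var storage: [[Int]]

    init(_ rows: [[Int]]) { storage = rows }

    subscript(row: Int) -> [Int] {
        // Lends the stored row to the caller for the duration of the access.
        _read { yield storage[row] }
        // Lends the stored row for in-place mutation instead of a get, copy, and set.
        _modify { yield &storage[row] }
    }
}

var rows = Rows([[1, 2, 3], [4, 5]])
rows[0].append(4) // mutates the stored row in place through `_modify`
```

With a plain get and set pair, the same append would first copy the row out of the storage, mutate the copy, and then write it back.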

patshaughnessy · 2024-11-12T00:19:48Z

...ces/SwiftDocC/Infrastructure/Link Resolution/PathHierarchy+TypeSignatureDisambiguation.swift

+        let capacity = width * height
+        storage = try .init(unsafeUninitializedCapacity: capacity) { buffer, initializedCount in
+            var wrappedBuffer = UnsafeMutableTableBufferPointer(width: width, wrapping: buffer)
+            try initializer(&wrappedBuffer)


I'm surprised that Swift allows you to call a closure like this, while executing an initializer, passing in a pointer to the storage.

d-ronnqvist · 2024-11-12

Swift knows that this closure doesn't "escape" the initializer, same as Array/init(unsafeUninitializedCapacity:initializingWith:), which it's calling. This initializer repeats the pattern from that initializer.

Since the closure's lifetime is limited by the scope of the initializer it's easier for the compiler to reason about what the closure is doing with the shared pointer.
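For reference, this is roughly what the underlying standard library pattern looks like when used directly. `makeRow(count:element:)` is a hypothetical wrapper for this example. Both closures are non-escaping, so the buffer pointer is only valid inside the initializer call.

```swift
/// Hypothetical helper that fills a new array in place, without first filling it with placeholder values.
func makeRow(count: Int, element: (Int) -> Int) -> [Int] {
    return Array(unsafeUninitializedCapacity: count) { buffer, initializedCount in
        for index in 0..<count {
            // `initialize(to:)` writes into uninitialized memory without reading an old value.
            buffer.baseAddress!.advanced(by: index).initialize(to: element(index))
            // Keep the count in sync so that the array knows which elements are initialized.
            initializedCount = index + 1
        }
    }
}

let squares = makeRow(count: 4) { $0 * $0 }
let empty = makeRow(count: 0) { $0 }
```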

About the only way the caller could misuse the closure is to hold on to the pointer after the initializer call. There's an accepted proposal to allow types like this pointer to be marked non-escapable (SE-0446), which would make that misuse a compiler error, as well as a related proposal to introduce safe access to contiguous memory (instead of Unsafe[Mutable][Buffer]Pointer) (SE-0447).


- Use plural for local variable with array value
- Explicitly initialize optional local variable with `nil`
- Add assert about passing empty type names
- Explain what `nil` return value means in local code comment
- Add comment indicating where `_TinySmallValueIntSet` is defined
- Use + to join two string instead of string interpolation
- Use named arguments for `makeDisambiguation` closure


d-ronnqvist · 2024-11-12T14:41:03Z

@swift-ci please test

patshaughnessy · 2024-11-12T15:06:32Z

Great work 👏 - thanks for the tweaks, comments and explanations here in the PR! I'm going to bookmark this one :)


d-ronnqvist · 2024-11-13T13:51:56Z

@swift-ci please test

d-ronnqvist added 19 commits November 5, 2024 10:49

Suggested only the minimal type disambiguation

6109d28

rdar://136207880

Support disambiguating using a mix of parameter types and return types

4d2d482

Skip checking columns that are common for all overloads

0b629a3

Use Swift Algorithms package for combinations

7684ae0

Use specialized Set implementation for few overloads and with types

d9864ca

Allow each Int Set to specialize its creation of combinations

b010bb9

Avoid mapping combinations for large sizes to Set<Int>

bf66819

Avoid reallocations when generating "tiny int set" combinations

2c2d7ef

Avoid indexing into a nested array

4ce6917

Speed up _TinySmallValueIntSet iteration

efa6fcc

Avoid accessing a Set twice to check if a value exist and remove it

596cd90

Avoid temporary allocation when creating set of remaining node IDs

d48ac32

Also, ignore "sparse" nodes (without IDs)

Avoid reallocating the collisions list

b41fff4

Use a custom _TinySmallValueIntSet.isSuperset(of:) implementation

07dad86

Use Table<String> instead of indexing into [[String]]

bc5663f

Avoid recomputing the type name combinations to check

e6f60c8

Compare the type name lengths by number of UTF8 code units

16e15d0

Update code comments, variable names, and internal documentation

b962d1c

Avoid recomputing type name overlap

e29fc0f

d-ronnqvist changed the title ~~Suggest the minimal type disambiguation when an overload doesn't any unique types.~~ Suggest the minimal type disambiguation when an overload doesn't have any unique types Nov 6, 2024

Merge branch 'main' into suggest-minimal-type-disambiguation

a2337ad

d-ronnqvist added 2 commits November 11, 2024 10:41

Fix Swift 5.9 compatibility

16a3eef

Initialize each Table element. Linux requires this.

770694e

patshaughnessy approved these changes Nov 12, 2024

d-ronnqvist added 3 commits November 12, 2024 14:14

Add detailed comment with example about how to find the fewest type n…

e6a5d93

…ames that disambiguate an overload

Merge branch 'main' into suggest-minimal-type-disambiguation

ee94641

d-ronnqvist mentioned this pull request Nov 12, 2024

[DNM] Test toolchain build with new package dependency swiftlang/swift#77556

Closed

d-ronnqvist requested a review from franklinsch November 12, 2024 15:38

d-ronnqvist added 6 commits November 12, 2024 18:06

Don't use swift-algorithm as a _local_ dependency in Swift.org CI

9170fa2

Add additional test for 70 parameter type disambiguation

639f131

Add additional test that overloads with all the same parameters fallb…

00da0a1

…ack to hash disambiguation

Remove Swift Algorithms dependency.

178ef42

For the extremely rare case of overloads with more than 64 parameters we only try disambiguation by a single parameter type name.

Merge branch 'main' into suggest-minimal-type-disambiguation

9c98702

Only try mixed type disambiguation when symbol has both parameters an…

06b4a69

…d return value

franklinsch approved these changes Nov 13, 2024

d-ronnqvist merged commit 3cfeeb6 into swiftlang:main Nov 13, 2024
2 checks passed

d-ronnqvist mentioned this pull request Nov 13, 2024

Document how to disambiguate symbol links using type signature information #1095

Merged

2 tasks
