Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highlight declaration differences in overloaded symbol groups #928

Merged
merged 25 commits into from
Jun 27, 2024

Conversation

QuietMisdreavus
Copy link
Contributor

@QuietMisdreavus QuietMisdreavus commented May 23, 2024

Bug/issue #, if applicable: rdar://116409531

Summary

This PR introduces a new field to declaration tokens in Render JSON: highlight. This field indicates that the token is distinct among a set of overloads and should be indicated with a highlight or other style. It also performs a difference comparison of overloaded symbols' declarations so that the fields are populated for Swift-DocC-Render.

When combined with the related Swift-DocC-Render PR (linked below), the rendered page appears like this:

image

Because the differencing works directly on the declaration fragments received from the symbol graph, this PR also takes some care to break up .text tokens to prevent false-positive differences and create a cleaner output. For example, the below overloads correctly only highlight the <S> in the second overload, even though it originally contains a text token of >(. Also visible in this example is some proactive whitespace-trimming from highlighted sections; there is a unique text token containing a single space before the where clause in the second overload, but it's not highlighted to reduce clutter:

image

Performance

This PR uses the standard library's CollectionDifference API to generate the Longest Common Subsequence of the overload declarations. The diff algorithm can have a heavy time complexity, especially when run repeatedly like here. The standard library's implementation is well-optimized, but there is still a necessary performance hit to calculate the diffs. This following is a benchmark run on a large framework with many overloaded symbols:

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Metric                                   │ Change          │ main                 │ current              │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Duration for 'bundle-registration'       │ +1.913%¹,²      │ 10.351 sec           │ 10.549 sec           │
│ Duration for 'convert-total-time'        │ +3.857%³,⁴      │ 15.477 sec           │ 16.074 sec           │
│ Duration for 'documentation-processing'  │ +7.959%⁵,⁶      │ 5.003 sec            │ 5.401 sec            │
│ Duration for 'finalize-navigation-index' │ no change⁷      │ 0.046 sec            │ 0.046 sec            │
│ Peak memory footprint                    │ no change⁸      │ 811 MB               │ 835.4 MB             │
│ Data subdirectory size                   │ +2.028%⁹        │ 127.4 MB             │ 130 MB               │
│ Index subdirectory size                  │ -0.02829%¹⁰     │ 1.1 MB               │ 1.1 MB               │
│ Total DocC archive size                  │ +1.092%¹¹       │ 236.6 MB             │ 239.2 MB             │
│ Topic Anchor Checksum                    │ no change       │ 53074b40863239b7a00d │ 53074b40863239b7a00d │
│ Topic Graph Checksum                     │ change          │ 5bdfe323b77baec3d5cd │ f4af046ddef9eab49ba8 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Dependencies

swiftlang/swift-docc-render#847 is required to render the tokens with a highlight.

Testing

Steps:

  1. Build the Swift-DocC-Render PR and link the resulting dist directory into the DOCC_HTML_DIR environment variable.
  2. `swift run docc preview --enable-experimental-overloaded-symbol-presentation 'Tests/SwiftDocCTests/Test Bundles/OverloadedSymbols.docc'
  3. Navigate to the OverloadedEnum/firstTestMemberName(_:) overload group and expand the declaration list.
  4. Ensure that the declarations correctly highlight the parameter type in each overloaded declaration.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

  • Added tests
  • Ran the ./bin/test script and it succeeded
  • [ n/a ] Updated documentation if necessary

@QuietMisdreavus
Copy link
Contributor Author

@swift-ci Please test

Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Just some nitpicks about readability. And also I agree the recursive token data structure seems very counterintuitive.

// Pre-process the declarations by splitting text fragments apart to increase legibility
let mainDeclaration = declaration.declarationFragments.flatMap(preProcessFragment(_:))
let processedOverloadDeclarations = overloadDeclarations.map({ ($0.declaration.flatMap(preProcessFragment(_:)), $0.reference) })
let preProcessedDeclarations = [mainDeclaration] + processedOverloadDeclarations.map(\.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use names like mainFragments, processedOverloadFragments and preProcessedFragments since (I think) you have mapped from DeclarationFragments to Fragments just above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the distinction is minimal. Even SymbolKit doesn't really distinguish between the concept of a "declaration" and "a list of declaration fragments"; the convenience Symbol.declarationFragments accessor directly returns the list instead of the wrapper object. It also avoids having to pluralize a plural, since processedOverloadDeclarations and preProcessedDeclarations are lists of lists. To make them reference "fragments" instead, it would have to be something like processedOverloadFragmentLists or the like, which i consider more awkward.

/// Translate a whole ``SymbolGraph`` declaration to a ``DeclarationRenderSection``
/// declaration and highlight any tokens that aren't shared with a sequence of common tokens.
func translateDeclaration(
_ declaration: [SymbolGraph.Symbol.DeclarationFragments.Fragment],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar, this could be func translateDeclaration(_ fragments:, commonFragments:)

@mportiz08
Copy link
Contributor

I'm also concerned about the possibility of other renderers that consume DocC Render JSON that might not anticipate the concept of recursive tokens.

FWIW, other renderers would already have to deal with recursion for basic existing things like formatting inline text (you can have any combination of bold/emphasis/codeVoice or nested lists, etc) with our existing tree-like concept of block/inline content which carries over from the markdown. I wouldn't actually anticipate that there would be any other renderer that would handle experimental features under feature flags like this in any case though.

Copy link
Contributor

@d-ronnqvist d-ronnqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly feel that we shouldn't redefine tokens to be a hierarchical structure. Not only because it breaks existing code in DocC and in clients but because it's a surprising and unconventional way to model an already well defined concept.

@d-ronnqvist
Copy link
Contributor

I'm also concerned about the possibility of other renderers that consume DocC Render JSON that might not anticipate the concept of recursive tokens.

FWIW, other renderers would already have to deal with recursion for basic existing things like formatting inline text (you can have any combination of bold/emphasis/codeVoice or nested lists, etc) with our existing tree-like concept of block/inline content which carries over from the markdown. I wouldn't actually anticipate that there would be any other renderer that would handle experimental features under feature flags like this in any case though.

That is different because "Text" in the RenderNode spec isn't recursive, it's a simple object with "text" and "type" properties (very similar to "DeclarationToken"):

"Text": {
    "type": "object",
    "required": [
        "type",
        "text"
    ],
    "properties": {
        "type": {
            "type": "string",
            "enum": ["text"]
        },
        "text": {
            "type": "string"
        }
    }
},

Just like "DeclarationToken", I would be opposed to making "Text" in the RenderNode spec recursive.

Instead, the way that we have modeled emphasis, strong, codeVoice, etc. in the RenderNode spec is by defining many dedicated types—some of which are recursive and some of which are not—and wrapping them all in a "RenderInlineContent" type:

"RenderInlineContent": {
    "oneOf": [
        {
            "$ref": "#/components/schemas/Text"
        },
        {
            "$ref": "#/components/schemas/Emphasis"
        },
        {
            "$ref": "#/components/schemas/Strong"
        },
        {
            "$ref": "#/components/schemas/CodeVoice"
        },
        ...
        }
    ]
}

We could introduce this type of structure for declarations as well—like what @vera suggested with enum RenderToken {} (which I feel would be better named "DeclarationInlineContent" for consistency and to distance it from the "token" concept)—but to use that structure we would need to make breaking changes to the various "tokens" properties in the RenderNode spec:

 "tokens": {
    "type": "array",
    "items": {
-       "$ref": "#/components/schemas/DeclarationToken"
+       "$ref": "#/components/schemas/DeclarationInlineContent"
    }
}

As far as I know, we don't have a process yet for how to handle breaking change to the RenderNode spec.

If the Swift code won't mirror the Render Node spec, I don't feel like this structure makes sense for the Swift code.

@mportiz08
Copy link
Contributor

I've created a draft renderer PR for both proposed Render JSON schemas that have been discussed in this PR:

Please let me know which approach ends up being decided for this PR, and I'll close the corresponding renderer PR that isn't needed and remove draft status from the other.

@QuietMisdreavus
Copy link
Contributor Author

@swift-ci Please test

@QuietMisdreavus QuietMisdreavus marked this pull request as ready for review June 5, 2024 16:43
@QuietMisdreavus
Copy link
Contributor Author

@swift-ci Please test

@QuietMisdreavus
Copy link
Contributor Author

@d-ronnqvist @patshaughnessy I've updated the PR to remove the recursive token type and refactor the code to clean it up a bit. I've also added some tests, so i've removed the draft status from the PR.

Copy link
Contributor

@patshaughnessy patshaughnessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for moving the code around and extracting the methods. The new initializers make things much easier to read 👏

@QuietMisdreavus
Copy link
Contributor Author

@swift-ci Please test

@@ -151,4 +152,114 @@ class DeclarationsRenderSectionTests: XCTestCase {
XCTAssertEqual(declarationsSection.declarations.count, 2)
XCTAssert(declarationsSection.declarations.allSatisfy({ $0.platforms == [.iOS, .macOS] }))
}

func testHighlightDiff() throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to add more—and more complicated—tests for this. The code works really well but it'd be good to have more comprehensive testing so that we can be confident that we don't break it in the future.

This is just one test with two overloads func myFunc(param: Int) and func myFunc<S>(param: S) where S : StringProtocol but there's a lot more possible syntax that would be worth testing, especially for text tokens since there's logic to split those for better diffident.

Here are the variations that I could think of for a single parameter type. We definitely don't need to add tests for all of them but I think we should have something that's an array, something with a dictionary, something with a tuple, something with an optional, something with a closure type, something with a variadic, and something with a generic argument.

public func doSomething(with: Int)              {}
public func doSomething(with: Int?)             {}
public func doSomething(with: [Int]?)           {}
public func doSomething(with: [Int?])           {}
public func doSomething(with: (Int, Int))       {}
public func doSomething(with: (Int?, Int))      {}
public func doSomething(with: (Int, Int?))      {}
public func doSomething(with: (Int, Int)?)      {}
public func doSomething(with: [Int: Int])       {}
public func doSomething(with: [Int: Int]?)      {}
public func doSomething(with: [Int?: Int])      {}
public func doSomething(with: [Int: Int?])      {}
public func doSomething(with: Int...)           {}
public func doSomething(with: Int?...)          {}
public func doSomething(with: Set<Int>)         {}
public func doSomething(with: Set<Int?>)        {}
public func doSomething(with: Set<Int>?)        {}
public func doSomething(with: (Int) -> ())      {}
public func doSomething(with: (Int) -> Int)     {}
public func doSomething(with: (Int?) -> Int)    {}
public func doSomething(with: (Int) -> Int?)    {}
public func doSomething(with: ((Int) -> Int)?)  {}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(by the way, the diff in this test is not that "fancy". I can come think with much more complex differences between overloads)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code seems to trip up with tuples and closures, because the differencing eagerly gloms onto the first closing parenthesis it finds and treats that as a common token with the closing parenthesis for the argument list in other overloads...

image

It also makes the whitespace trimming fall apart for where clauses, since now the argument list parenthesis is considered a different token:

image

I can write a test with these symbols, but the highlighting here is a bit unfortunate. However, to fix it "properly" would require introducing the complete symbol information into the differencing algorithm somehow, so that the entire argument type could be considered a distinct "token" and the correct parenthesis (and comma, in case of multiple arguments) could be counted as a common token. If we decide that we want to improve this, i'd like to defer that to after this PR lands so that we can get a "good-enough" implementation landed that we can iterate on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The highlighting is great. I don't think the few cases where it could be slightly better should hold back this PR.

If anything we could add a comment in the tests for the highlights that could be slightly better as extra information for anyone who wants to iterate on this in the future.

In other words: the implementation looks great and we don't need to add many new tests but I think we should add a handful (maybe one with a tuple, one with a closure, and one with an array/dictionary and then fit an optional and a generic value in one of those 3 cases). How does that sound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated my tests to test the following overload groups:

public func overload1(param: Int) {}
public func overload1(param: Int?) {}
public func overload1(param: [Int]) {}
public func overload1(param: [Int]?) {}
public func overload1(param: Set<Int>) {}
public func overload1(param: [Int: Int]) {}

public func overload2(p1: Int, p2: Int) {}
public func overload2(p1: (Int, Int), p2: Int) {}
public func overload2(p1: Int, p2: (Int, Int)) {}
public func overload2(p1: (Int) -> (), p2: Int) {}
public func overload2(p1: (Int) -> Int, p2: Int) {}
public func overload2(p1: (Int) -> Int?, p2: Int) {}
public func overload2(p1: ((Int) -> Int)?, p2: Int) {}

public func overload3(_ p: [Int: Int]) {}
public func overload3<T: Hashable>(_ p: [T: T]) {}
public func overload3<K: Hashable, V>(_ p: [K: V]) {}

I feel like this both tests the features we want (type decorators getting highlighted, whitespace trimmed off highlighted tokens, splitting >( tokens) and also adds the known edge case issues (parentheses and commas throwing off the diff).

I also added a convenience wrapper in the test code so that i could more concisely test that certain spans of declarations were being highlighted as expected. I'm not 100% sure that this is completely useful, but it helped when rewriting the tests to work with six or seven overloads at a time. 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some out-of-band discussion, i've rewritten the test wrapper to render plain-text comparison strings instead of using the string fragments i originally used.

@d-ronnqvist d-ronnqvist self-requested a review June 10, 2024 08:46
@QuietMisdreavus
Copy link
Contributor Author

@swift-ci Please test

@QuietMisdreavus
Copy link
Contributor Author

@swift-ci Please test

Copy link
Contributor

@d-ronnqvist d-ronnqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

@QuietMisdreavus
Copy link
Contributor Author

@swift-ci Please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants