Improve performance of JSONPointer encoding #1099

d-ronnqvist · 2024-11-19T13:25:38Z

Bug/issue #, if applicable:

Summary

This improves the performance of JSONPointer encoding which is a hot spot when building documentation for mixed Swift and Objective-C projects.

On an example mixed Swift and Objective-C project with 10k+ symbols, these changes improves the total documentation convert time by almost 1%:

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Metric                                   │ Change          │ main                 │ current              │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Duration for 'convert-total-time'        │ -0,9475 %¹      │ 14,88 sec            │ 14,739 sec           │
│ Duration for 'documentation-processing'  │ -1,533 %²       │ 6,859 sec            │ 6,754 sec            │
│ Duration for 'finalize-navigation-index' │ -1,141 %³       │ 0,293 sec            │ 0,29 sec             │
│ Peak memory footprint                    │ no change⁴,⁵    │ 743,2 MB             │ 745,9 MB             │
│ Data subdirectory size                   │ no change⁶      │ 248,6 MB             │ 248,6 MB             │
│ Index subdirectory size                  │ no change⁷      │ 5,3 MB               │ 5,3 MB               │
│ Total DocC archive size                  │ no change⁸      │ 288,5 MB             │ 288,6 MB             │
│ Topic Anchor Checksum                    │ no change       │ 081bb0f890c7fa3fba23 │ 081bb0f890c7fa3fba23 │
│ Topic Graph Checksum                     │ no change       │ 2683bd73f5612a553927 │ 2683bd73f5612a553927 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

 1: There's a statistically significant difference between the two benchmark measurements.
    The values are different enough that the most probable explanation is that they're random samples from two different data sets.
    t-statistic : +3,226          degrees of freedom : 28       95% confidence critical value : +2,042

 2: There's a statistically significant difference between the two benchmark measurements.
    The values are different enough that the most probable explanation is that they're random samples from two different data sets.
    t-statistic : +3,576          degrees of freedom : 28       95% confidence critical value : +2,042

 3: There's a statistically significant difference between the two benchmark measurements.
    The values are different enough that the most probable explanation is that they're random samples from two different data sets.
    t-statistic : +2,606          degrees of freedom : 28       95% confidence critical value : +2,042

 4: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : -0,707          degrees of freedom : 28       95% confidence critical value : +2,042

 5: A human should check that the measurements for this metric look like random samples around a certain value to ensure the validity of this result.


 6: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : -1,569          degrees of freedom : 28       95% confidence critical value : +2,042

 7: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : -0,005          degrees of freedom : 28       95% confidence critical value : +2,042

 8: No statistically significant difference.
    The values are similar enough that the most probable explanation is that they're random samples from the same data set.
    t-statistic : -1,566          degrees of freedom : 28       95% confidence critical value : +2,042

Dependencies

None.

Testing

Nothing in particular. This isn't a user-facing change.

Checklist

Make sure you check off the following items. If they cannot be completed, provide a reason.

~~[ ] Added tests~~
Ran the ./bin/test script and it succeeded
~~[ ] Updated documentation if necessary~~

patshaughnessy

Seems like a nice performance improvement - thanks for the benchmarks!

I left a few coding questions. Also, should we entirely remove the EscapedCharacters enum? Leaving that in the code might lead to a maintenance issue: If we ever decide to escape other characters someday we might simply add them to the enumeration, not realizing the optimized code here doesn't use that any more. If we leave EscapedCharacters in the code, we should at least add a comment explaining that it's not used.

patshaughnessy · 2024-11-19T21:20:31Z

Sources/SwiftDocC/Model/Rendering/Variants/JSONPointer.swift

+
+        var string: [UTF8.CodeUnit] = []
+        string.reserveCapacity(
+            pathComponents.reduce(0) { acc, component in


Would writing an explicit for-loop be faster than calling reduce with a closure?

It shouldn't be and I couldn't measure any difference.

reduce(_:) is an accumulating for-loop and the compiler shouldn't have any problems inlining and optimizing it the same as if I wrote the loop inline myself.

reduce(_:) is an accumulating for-loop and the compiler shouldn't have any problems inlining and optimizing it the same as if I wrote the loop inline myself.

Note that this is only true because the accumulating value is a primitive (Int).

Using reduce(_:) over collections result in repeated copies that makes it accidentally O(n²) instead of O(n). Using reduce(into:) instead for accumulating into collections solves this.

patshaughnessy · 2024-11-19T21:24:08Z

Sources/SwiftDocC/Model/Rendering/Variants/JSONPointer.swift

+
+        for component in pathComponents {
+            // The leading slash and component separator
+            string.append(forwardSlash)


I assume Swift's implementation of append is very fast? Is there any faster way of constructing a string like this, aside from using unsafe pointers with C code?

Yes, append is very fast. All it does is:

checking that the buffer is uniquely referenced (which the compiler should be able to know that it is since this is a locally scoped variable and hopefully optimize away completely)

checking that there's sufficient reserved capacity in the buffer (which is basically an integer comparison)

initializing a new element at the right offset into the underlying buffer.

The next step would be dropping down to unsafe buffers ourselves but then we'd be responsible for validating the capacity (uniqueness isn't a concern since it's a locally scoped variable).

Since the replacements we're doing here are longer, the escaped string may be longer than the original string which means validating the capacity is a real concern so we would need to do that ourselves. Because of this we're not gaining anything by using lower level API for this. This means that in practice this code is likely to do one reallocation when it escaped the string. We could bake in a little bit of extra capacity (for example 8 or 16 elements) to avoid this.

Now that I said that I had to measure reserving capacity for 16 extra characters. 😄

The current changes in this PR is a ~580% improvement compared to the original code in a micro benchmark that only measures encoding lots and lots of JSONPointer values.

Reserving an extra 16 characters to avoid the reallocation when the escaped string grows beyond the length of the original string, is a ~5% improvement over the current changes in this PR which means that it's a ~600% improvement over the original code.

Since it's only 1 more line to make that change I'd say it's worth it.

patshaughnessy · 2024-11-19T21:27:41Z

Sources/SwiftDocC/Model/Rendering/Variants/JSONPointer.swift

+            string.reserveCapacity($0.utf8.count)
+
+            var remaining = $0.utf8[...]
+            while let char = remaining.popFirst() {


Why couldn't you use the simpler for char in component.utf8 style loop you have above in escaped? Why do you need to use popFirst - I'm afraid mutating the string like this might be slower than simply iterating over the UTF8 values.

Because I need to iterate over pairs of characters to identify "~0" and "~1".

At first I though I could iterate over the character's pairwise but when I tried it I realized that it meant that "A~0B" would iterate over (A, ~), (~, 0), (0, B) which that made the loop more complicated because I needed to keep state from the (~, 0) iteration so that the (0, B) iteration wouldn't append the 0.

d-ronnqvist · 2024-11-21T11:07:17Z

@swift-ci please test

d-ronnqvist · 2024-11-21T11:13:27Z

Also, should we entirely remove the EscapedCharacters enum? Leaving that in the code might lead to a maintenance issue: If we ever decide to escape other characters someday we might simply add them to the enumeration, not realizing the optimized code here doesn't use that any more. If we leave EscapedCharacters in the code, we should at least add a comment explaining that it's not used.

It's public API so we'd need to deprecate it and phase it out. There was never any public API to escape / unescape a JSONPointer so there wasn't really a reason for EscapedCharacters to be public API to begin with, other than as information about what characters need escaping and what their replacements are. I don't know how useful it is for a client of DocC to have access to that information I didn't feel like taking it away in this PR.

* Improve performance of JSONPointer encoding & decoding * Reserve some extra capacity in temporary storage to avoid reallocations

* Improve performance of JSONPointer encoding & decoding * Reserve some extra capacity in temporary storage to avoid reallocations rdar://141047138

Improve performance of JSONPointer encoding & decoding

28a48ac

patshaughnessy approved these changes Nov 19, 2024

View reviewed changes

d-ronnqvist added 2 commits November 21, 2024 12:04

Merge branch 'main' into json-pointer-encode-performance

757fc76

Reserve some extra capacity in temporary storage to avoid reallocations

78773fe

d-ronnqvist merged commit e0ce789 into swiftlang:main Nov 21, 2024
2 checks passed

d-ronnqvist deleted the json-pointer-encode-performance branch November 21, 2024 11:14

d-ronnqvist mentioned this pull request Dec 6, 2024

[6.1] Improve performance of JSONPointer encoding #1119

Merged

d-ronnqvist added a commit that referenced this pull request Dec 9, 2024

Improve performance of JSONPointer encoding (#1099) (#1119)

cfaf3ca

* Improve performance of JSONPointer encoding & decoding * Reserve some extra capacity in temporary storage to avoid reallocations rdar://141047138

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of JSONPointer encoding #1099

Improve performance of JSONPointer encoding #1099

d-ronnqvist commented Nov 19, 2024

patshaughnessy left a comment

patshaughnessy Nov 19, 2024

d-ronnqvist Nov 21, 2024 •

edited

Loading

d-ronnqvist Nov 21, 2024

patshaughnessy Nov 19, 2024

d-ronnqvist Nov 21, 2024

d-ronnqvist Nov 21, 2024

patshaughnessy Nov 19, 2024

d-ronnqvist Nov 21, 2024

d-ronnqvist commented Nov 21, 2024

d-ronnqvist commented Nov 21, 2024

Improve performance of JSONPointer encoding #1099

Improve performance of JSONPointer encoding #1099

Conversation

d-ronnqvist commented Nov 19, 2024

Summary

Dependencies

Testing

Checklist

patshaughnessy left a comment

Choose a reason for hiding this comment

patshaughnessy Nov 19, 2024

Choose a reason for hiding this comment

d-ronnqvist Nov 21, 2024 • edited Loading

Choose a reason for hiding this comment

d-ronnqvist Nov 21, 2024

Choose a reason for hiding this comment

patshaughnessy Nov 19, 2024

Choose a reason for hiding this comment

d-ronnqvist Nov 21, 2024

Choose a reason for hiding this comment

d-ronnqvist Nov 21, 2024

Choose a reason for hiding this comment

patshaughnessy Nov 19, 2024

Choose a reason for hiding this comment

d-ronnqvist Nov 21, 2024

Choose a reason for hiding this comment

d-ronnqvist commented Nov 21, 2024

d-ronnqvist commented Nov 21, 2024

d-ronnqvist Nov 21, 2024 •

edited

Loading