Improve performance of JSONPointer encoding #1099
Seems like a nice performance improvement - thanks for the benchmarks!
I left a few coding questions. Also, should we entirely remove the EscapedCharacters enum? Leaving that in the code might lead to a maintenance issue: if we ever decide to escape other characters someday, we might simply add them to the enumeration, not realizing that the optimized code here doesn't use it any more. If we leave EscapedCharacters in the code, we should at least add a comment explaining that it's not used.
var string: [UTF8.CodeUnit] = []
string.reserveCapacity(
    pathComponents.reduce(0) { acc, component in
Would writing an explicit for-loop be faster than calling reduce with a closure?
It shouldn't be, and I couldn't measure any difference.
reduce(_:_:) is an accumulating for-loop, and the compiler shouldn't have any problems inlining and optimizing it the same as if I wrote the loop inline myself.
"reduce(_:_:) is an accumulating for-loop and the compiler shouldn't have any problems inlining and optimizing it the same as if I wrote the loop inline myself."
Note that this is only true because the accumulating value is a primitive (Int).
Using reduce(_:_:) to accumulate into a collection results in repeated copies, which makes it accidentally O(n²) instead of O(n). Using reduce(into:_:) instead when accumulating into collections solves this.
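To make the distinction above concrete, here is a minimal sketch. It is not code from this PR; the `components` array and the variable names are made up for illustration. It contrasts an `Int` accumulator with accumulating into an array using both forms of `reduce`.

```swift
// Hypothetical input, only for illustration.
let components = ["path", "to", "value"]

// Accumulating an Int: each step just returns a new integer, so this is
// effectively a plain loop that sums "1 separator + UTF-8 length".
let totalLength = components.reduce(0) { acc, component in
    acc + 1 + component.utf8.count
}

// Accumulating an array with reduce(_:_:): every step builds a new array
// from the previous one, so the elements are copied again and again.
let copied = components.reduce([UInt8]()) { acc, component in
    acc + Array(component.utf8)
}

// Accumulating an array with reduce(into:_:): the accumulator is passed
// inout and mutated in place, so elements are appended without recopying.
let inPlace = components.reduce(into: [UInt8]()) { acc, component in
    acc.append(contentsOf: component.utf8)
}
```

Both array versions produce the same bytes; only the amount of copying differs.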
for component in pathComponents {
    // The leading slash and component separator
    string.append(forwardSlash)
I assume Swift's implementation of append is very fast? Is there any faster way of constructing a string like this, aside from using unsafe pointers with C code?
Yes, append is very fast. All it does is:
- check that the buffer is uniquely referenced (the compiler should be able to prove this, since this is a locally scoped variable, and hopefully optimize the check away completely)
- check that there's sufficient reserved capacity in the buffer (which is basically an integer comparison)
- initialize a new element at the right offset into the underlying buffer.
The next step would be dropping down to unsafe buffers ourselves, but then we'd be responsible for validating the capacity (uniqueness isn't a concern since it's a locally scoped variable).
Since the replacements we're doing here are longer than the characters they replace, the escaped string may be longer than the original string, so validating the capacity is a real concern and we would need to do it ourselves. Because of this, we're not gaining anything by using lower-level API here. In practice, this code is likely to do one reallocation when it escapes a string. We could bake in a little bit of extra capacity (for example 8 or 16 elements) to avoid this.
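As a rough illustration of the approach described above, the following sketch builds the encoded pointer in a `[UTF8.CodeUnit]` buffer with `reserveCapacity` and `append`. It is an assumption-laden sketch and not the PR's actual code: the function name `encodeComponents` and the `slack` parameter are hypothetical, and the escaping rules ("~" becomes "~0", "/" becomes "~1") are taken from the JSON Pointer specification (RFC 6901).

```swift
// Hypothetical helper, not the PR's implementation.
// `slack` is the extra capacity reserved up front so that a few escaped
// characters don't force the buffer to reallocate.
func encodeComponents(_ pathComponents: [String], slack: Int = 16) -> String {
    let forwardSlash = UInt8(ascii: "/")
    let tilde = UInt8(ascii: "~")

    var bytes: [UTF8.CodeUnit] = []
    bytes.reserveCapacity(
        // One separator per component plus its unescaped UTF-8 length.
        pathComponents.reduce(slack) { acc, component in
            acc + 1 + component.utf8.count
        }
    )

    for component in pathComponents {
        // The leading slash and component separator.
        bytes.append(forwardSlash)
        for unit in component.utf8 {
            switch unit {
            case tilde:
                // "~" is escaped as "~0".
                bytes.append(tilde)
                bytes.append(UInt8(ascii: "0"))
            case forwardSlash:
                // "/" is escaped as "~1".
                bytes.append(tilde)
                bytes.append(UInt8(ascii: "1"))
            default:
                bytes.append(unit)
            }
        }
    }
    return String(decoding: bytes, as: UTF8.self)
}
```

Each escaped character adds one byte beyond the unescaped length, so with the default `slack` the buffer should only need to grow when more than 16 characters are escaped.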
Now that I've said that, I had to measure reserving capacity for 16 extra characters. 😄
The current changes in this PR are a ~580% improvement compared to the original code in a micro benchmark that only measures encoding lots and lots of JSONPointer values.
Reserving an extra 16 characters, to avoid the reallocation when the escaped string grows beyond the length of the original string, is a ~5% improvement over the current changes in this PR, which means that it's a ~600% improvement over the original code.
Since it's only 1 more line to make that change, I'd say it's worth it.
string.reserveCapacity($0.utf8.count)

var remaining = $0.utf8[...]
while let char = remaining.popFirst() {
Why couldn't you use the simpler for char in component.utf8 style loop you have above in escaped? Why do you need to use popFirst? I'm afraid mutating the string like this might be slower than simply iterating over the UTF8 values.
Because I need to iterate over pairs of characters to identify "~0" and "~1".
At first I thought I could iterate over the characters pairwise, but when I tried it I realized that "A~0B" would iterate over (A, ~), (~, 0), (0, B). That made the loop more complicated, because I needed to keep state from the (~, 0) iteration so that the (0, B) iteration wouldn't append the 0.
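The loop shape discussed above can be sketched as follows. This is a hedged illustration rather than the PR's code: `decodeComponent` is a hypothetical name, and the unescaping rules ("~0" becomes "~", "~1" becomes "/") are assumed from RFC 6901. Popping from a `Substring.UTF8View` lets the loop consume the character after a "~" immediately, so no state has to be carried between iterations.

```swift
// Hypothetical helper, not the PR's implementation.
func decodeComponent(_ escaped: String) -> String {
    let tilde = UInt8(ascii: "~")
    let forwardSlash = UInt8(ascii: "/")
    let zero = UInt8(ascii: "0")
    let one = UInt8(ascii: "1")

    var output: [UTF8.CodeUnit] = []
    // Unescaping never makes the string longer.
    output.reserveCapacity(escaped.utf8.count)

    // A slice of the UTF-8 view; popFirst() only advances the slice's
    // start index, it doesn't mutate or copy the original string.
    var remaining = escaped.utf8[...]
    while let unit = remaining.popFirst() {
        // Anything other than "~" followed by another byte is copied as-is.
        guard unit == tilde, let next = remaining.first else {
            output.append(unit)
            continue
        }
        switch next {
        case zero:
            output.append(tilde)
            remaining.removeFirst()  // consume the "0"
        case one:
            output.append(forwardSlash)
            remaining.removeFirst()  // consume the "1"
        default:
            // A "~" that doesn't start an escape sequence is kept unchanged.
            output.append(unit)
        }
    }
    return String(decoding: output, as: UTF8.self)
}
```

For "A~0B", the "~" iteration consumes the "0" itself, so the following iteration starts at "B".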
@swift-ci please test
It's public API so we'd need to deprecate it and phase it out. There was never any public API to escape / unescape a |
* Improve performance of JSONPointer encoding & decoding
* Reserve some extra capacity in temporary storage to avoid reallocations
Bug/issue #, if applicable:
Summary
This improves the performance of JSONPointer encoding which is a hot spot when building documentation for mixed Swift and Objective-C projects.
On an example mixed Swift and Objective-C project with 10k+ symbols, these changes improve the total documentation convert time by almost 1%.
Dependencies
None.
Testing
Nothing in particular. This isn't a user-facing change.
Checklist
Make sure you check off the following items. If they cannot be completed, provide a reason.
[ ] Added tests
Ran the ./bin/test script and it succeeded
[ ] Updated documentation if necessary