Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of JSONPointer encoding #1099

Merged
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 70 additions & 28 deletions Sources/SwiftDocC/Model/Rendering/Variants/JSONPointer.swift
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*
This source file is part of the Swift.org open source project

Copyright (c) 2021 Apple Inc. and the Swift project authors
Copyright (c) 2021-2024 Apple Inc. and the Swift project authors
Licensed under Apache License v2.0 with Runtime Library Exception

See https://swift.org/LICENSE.txt for license information
Expand All @@ -20,7 +20,7 @@ public struct JSONPointer: Codable, CustomStringConvertible, Equatable {
public var pathComponents: [String]

public var description: String {
"/\(pathComponents.map(Self.escape).joined(separator: "/"))"
Self.escaped(pathComponents)
}

/// Creates a JSON Pointer given its path components.
Expand Down Expand Up @@ -87,36 +87,78 @@ public struct JSONPointer: Codable, CustomStringConvertible, Equatable {
public init(from decoder: Decoder) throws {
let container = try decoder.singleValueContainer()
let stringValue = try container.decode(String.self)
self.pathComponents = stringValue.removingLeadingSlash.components(separatedBy: "/").map(Self.unescape)
self.pathComponents = Self.unescaped(stringValue)
}

/// Escapes a path component of a JSON pointer.
static func escape(_ pointerPathComponents: String) -> String {
applyEscaping(pointerPathComponents, shouldUnescape: false)
}

/// Unescaped a path component of a JSON pointer.
static func unescape(_ pointerPathComponents: String) -> String {
applyEscaping(pointerPathComponents, shouldUnescape: true)
private static func escaped(_ pathComponents: [String]) -> String {
// This code is called quite frequently for mixed language content.
// Optimizing it has a measurable impact on the total documentation build time.

var string: [UTF8.CodeUnit] = []
string.reserveCapacity(
pathComponents.reduce(0) { acc, component in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would writing an explicit for-loop be faster than calling reduce with a closure?

Copy link
Contributor Author

@d-ronnqvist d-ronnqvist Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be and I couldn't measure any difference.

reduce(_:) is an accumulating for-loop and the compiler shouldn't have any problems inlining and optimizing it the same as if I wrote the loop inline myself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce(_:) is an accumulating for-loop and the compiler shouldn't have any problems inlining and optimizing it the same as if I wrote the loop inline myself.

Note that this is only true because the accumulating value is a primitive (Int).

Using reduce(_:) over collections result in repeated copies that makes it accidentally O(n²) instead of O(n). Using reduce(into:) instead for accumulating into collections solves this.

acc + 1 /* the "/" separator */ + component.utf8.count
}
)

for component in pathComponents {
// The leading slash and component separator
string.append(forwardSlash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume Swift's implementation of append is very fast? Is there any faster way of constructing a string like this, aside from using unsafe pointers with C code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, append is very fast. All it does is:

  • checking that the buffer is uniquely referenced (which the compiler should be able to know that it is since this is a locally scoped variable and hopefully optimize away completely)
  • checking that there's sufficient reserved capacity in the buffer (which is basically an integer comparison)
  • initializing a new element at the right offset into the underlying buffer.

The next step would be dropping down to unsafe buffers ourselves but then we'd be responsible for validating the capacity (uniqueness isn't a concern since it's a locally scoped variable).

Since the replacements we're doing here are longer, the escaped string may be longer than the original string which means validating the capacity is a real concern so we would need to do that ourselves. Because of this we're not gaining anything by using lower level API for this. This means that in practice this code is likely to do one reallocation when it escaped the string. We could bake in a little bit of extra capacity (for example 8 or 16 elements) to avoid this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I said that I had to measure reserving capacity for 16 extra characters. 😄

The current changes in this PR is a ~580% improvement compared to the original code in a micro benchmark that only measures encoding lots and lots of JSONPointer values.

Reserving an extra 16 characters to avoid the reallocation when the escaped string grows beyond the length of the original string, is a ~5% improvement over the current changes in this PR which means that it's a ~600% improvement over the original code.

Since it's only 1 more line to make that change I'd say it's worth it.


// The escaped component
for char in component.utf8 {
switch char {
case tilde:
string.append(contentsOf: escapedTilde)
case forwardSlash:
string.append(contentsOf: escapedForwardSlash)
default:
string.append(char)
}
}
}

return String(decoding: string, as: UTF8.self)
}

/// Applies an escaping operation to the path component of a JSON pointer.
/// - Parameters:
/// - pointerPathComponent: The path component to escape.
/// - shouldUnescape: Whether this function should unescape or escape the path component.
/// - Returns: The escaped value if `shouldUnescape` is false, otherwise the escaped value.
private static func applyEscaping(_ pointerPathComponent: String, shouldUnescape: Bool) -> String {
EscapedCharacters.allCases
.reduce(pointerPathComponent) { partialResult, characterThatNeedsEscaping in
partialResult
.replacingOccurrences(
of: characterThatNeedsEscaping[
keyPath: shouldUnescape ? \EscapedCharacters.escaped : \EscapedCharacters.rawValue
],
with: characterThatNeedsEscaping[
keyPath: shouldUnescape ? \EscapedCharacters.rawValue : \EscapedCharacters.escaped
]
)
private static func unescaped(_ escapedRawString: String) -> [String] {
escapedRawString.removingLeadingSlash.components(separatedBy: "/").map {
// This code is called quite frequently for mixed language content.
// Optimizing it has a measurable impact on the total documentation build time.

var string: [UTF8.CodeUnit] = []
string.reserveCapacity($0.utf8.count)

var remaining = $0.utf8[...]
while let char = remaining.popFirst() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why couldn't you use the simpler for char in component.utf8 style loop you have above in escaped? Why do you need to use popFirst - I'm afraid mutating the string like this might be slower than simply iterating over the UTF8 values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I need to iterate over pairs of characters to identify "~0" and "~1".

At first I though I could iterate over the character's pairwise but when I tried it I realized that it meant that "A~0B" would iterate over (A, ~), (~, 0), (0, B) which that made the loop more complicated because I needed to keep state from the (~, 0) iteration so that the (0, B) iteration wouldn't append the 0.

guard char == tilde, let escapedCharacterIndicator = remaining.popFirst() else {
string.append(char)
continue
}

// Check the character
switch escapedCharacterIndicator {
case zero:
string.append(tilde)
case one:
string.append(forwardSlash)
default:
// This string isn't an escaped JSON Pointer. Return it as-is.
return $0
}
}

return String(decoding: string, as: UTF8.self)
}
}
}

// A few UInt8 raw values for various UTF-8 characters that this implementation frequently checks for

private let tilde = UTF8.CodeUnit(ascii: "~")
private let forwardSlash = UTF8.CodeUnit(ascii: "/")
private let zero = UTF8.CodeUnit(ascii: "0")
private let one = UTF8.CodeUnit(ascii: "1")

private let escapedTilde = [tilde, zero]
private let escapedForwardSlash = [tilde, one]