Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More string format removals #15316

Merged
merged 3 commits into from
Mar 27, 2023

Conversation

pettyjamesm
Copy link
Member

Description

Follow up from #15258 to replace other usages of String.format in non-exceptional code paths where simple string concatenations are equally clear (but more performant). These changes are not expected to be nearly as significant in terms of performance as the linked previous PR.

Additional context and related issues

No additional context should be necessary.

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Dec 6, 2022
@pettyjamesm pettyjamesm requested a review from findepi December 6, 2022 19:01
@pettyjamesm pettyjamesm marked this pull request as ready for review December 6, 2022 19:01
@findepi findepi requested review from skrzypo987 and lukasz-stec and removed request for findepi December 6, 2022 22:48
Copy link
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

if (root == null) {
throw new NullPointerException(optimizer.getClass().getName() + " returned a null plan");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this is significant enough to warrant making the code more verbose and harder to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm sure that the performance cost isn't massive, this is called inside of the loop for each optimizer. On option would be to switch from requireNonNull(root, format("%s ...")) to requireNonNull(root, () -> format("%s ...")) but that would still be a capturing lambda creation per loop iteration even if it avoids the String.format overhead, which is why I changed it over to an explicit null check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but there are only a handful of optimizers. Also, they do way more complicated things, so this is going to be a drop in the bucket.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does lambda capture matter here? I suspect this would be optimized away by inlining.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't matter as long as the JIT does actually inline the requireNonNull and the lambda, but if either one of those doesn't happen then you'll get an extra allocation inside of the loop (which would still be better than calling String.format inside of each loop iteration, but is avoidable by using an explicit null check).

"@%s: %s",
Integer.toHexString(identityHashCode(node)),
node);
return "@" + Integer.toHexString(identityHashCode(node)) + ": " + node;
Copy link
Member

@martint martint Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here. This shouldn't matter from a performance perspective.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this toString() implementation seems to only be present for more helpful debugging, I'm fine with changing it back here and will rebase without this change.

However, at a high level- I think the general philosophy should be to avoid using String.format in normal (ie: non-exceptional, not just for debugging) code paths and to prefer string concatenation instead- especially for simple things like format("%s=%s", ...). JDK 9+ javac implementations generate more efficient, specialized to the call-site concatenations even compared to StringBuilder, and even small inefficiencies can add up to a non-trivial overall performance overhead.

@pettyjamesm pettyjamesm force-pushed the more-string-format-removals branch 2 times, most recently from 99a850b to 444d601 Compare December 12, 2022 21:16
@@ -58,7 +57,7 @@ public BytecodeNode generateExpression(BytecodeGeneratorContext generator)
RowExpression term = terms.get(i);
block.append(generator.generate(term));

IfStatement ifWasNull = new IfStatement(format("if term %s wasNull...", i))
IfStatement ifWasNull = new IfStatement("if term " + i + " wasNull...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These make the code less readable, IMO. Also, the code generation is not such a frequent operation that performance of concatenation vs String.format is a concern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree about concatenation being less readable, but that's a matter of taste so for the sake of argument, let's assume that format is slightly more readable. Even so, I don't agree with the idea that we should prefer the less efficient implementation (in non-exceptional, non-debugging code paths) by default. I'm arguing the opposite: even when the codepath is not especially "hot", we should prefer the more efficient of the two approaches when the readability difference is small to try to avoid accumulating thousands of small hits to performance overall.

If you still disagree, then I'm happy to remove either/both of the two places you left PR feedback comments about (and any others you want to flag)- I just wanted to make the case that we should reconsider the practice of adding these avoidable performance costs and prefer concatenation over formatting in the general.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sympathetic to the "accumulating thousands of small hits to performance" argument, but the question is if this has any measurable impact. Our general philosophy is to prefer readability, unless the performance impact matters. This is a slippery slope.

In this specific case, I'm not sure it's less readable. But this one seems worse:

- .setDescription(format("cursor.get%s(%d)", type, field))
+ .setDescription("cursor.get" + type + "(" + field + ")")

Hopefully String Templates will allow both readability and performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum - Fair point on the other change, I've pushed a new revision that removes that particular change in CursorProcessorCompiler.

if (root == null) {
throw new NullPointerException(optimizer.getClass().getName() + " returned a null plan");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but there are only a handful of optimizers. Also, they do way more complicated things, so this is going to be a drop in the bucket.

@pettyjamesm pettyjamesm force-pushed the more-string-format-removals branch from 444d601 to e9fbf00 Compare December 19, 2022 20:51
@pettyjamesm pettyjamesm force-pushed the more-string-format-removals branch from e9fbf00 to 126213f Compare January 3, 2023 18:04
@pettyjamesm pettyjamesm force-pushed the more-string-format-removals branch 2 times, most recently from 4656dec to 5f3af9c Compare January 21, 2023 19:38
@pettyjamesm
Copy link
Member Author

@martint, I've rebased the revisions and the tests are still green. I think there's still the open comment item from your earlier review- can you take a look again to close this PR out?

@pettyjamesm pettyjamesm force-pushed the more-string-format-removals branch from 5f3af9c to b1bc63d Compare March 6, 2023 22:56
Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this, but will let @martint take another look

if (root == null) {
throw new NullPointerException(optimizer.getClass().getName() + " returned a null plan");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does lambda capture matter here? I suspect this would be optimized away by inlining.

@@ -58,7 +57,7 @@ public BytecodeNode generateExpression(BytecodeGeneratorContext generator)
RowExpression term = terms.get(i);
block.append(generator.generate(term));

IfStatement ifWasNull = new IfStatement(format("if term %s wasNull...", i))
IfStatement ifWasNull = new IfStatement("if term " + i + " wasNull...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sympathetic to the "accumulating thousands of small hits to performance" argument, but the question is if this has any measurable impact. Our general philosophy is to prefer readability, unless the performance impact matters. This is a slippery slope.

In this specific case, I'm not sure it's less readable. But this one seems worse:

- .setDescription(format("cursor.get%s(%d)", type, field))
+ .setDescription("cursor.get" + type + "(" + field + ")")

Hopefully String Templates will allow both readability and performance.

@pettyjamesm pettyjamesm force-pushed the more-string-format-removals branch from b1bc63d to e171697 Compare March 7, 2023 15:29
@pettyjamesm
Copy link
Member Author

Test failure is unrelated and caused by #16406

Avoids an eager and unnecessary String.format call by letting
checkArgument perform the required formatting only when the check
fails.
Also replaces unnecessary usages of Guava's Joiner in favor of
Collectors.joining where appropriate.
Replaces simple String.format usages in non-exceptional code paths
with simple string concatenations where applicable.
@pettyjamesm pettyjamesm force-pushed the more-string-format-removals branch from e171697 to 6f269f4 Compare March 8, 2023 21:55
@martint martint merged commit 4f8985c into trinodb:master Mar 27, 2023
@pettyjamesm pettyjamesm deleted the more-string-format-removals branch March 27, 2023 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants