Avoid String.format usages in hot paths #15258

pettyjamesm · 2022-11-30T18:19:01Z

Description

Replaces usages of String.format in code paths expected to be hot, to improve performance. In particular, time / timestamp related JSON serializers and casts to varchar improve dramatically: timestamp(3) casting throughput improves 2x, time(3) improves more than 30x, and time with time zone(3) improves ~40x (benchmark results: jmh.morethan.io).
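For illustration only, here is a minimal sketch of the kind of replacement described above. It is not the code from this PR: the class and method names (`TimeFormatSketch`, `formatWithStringFormat`, `formatWithBuilder`, `appendTwoDigits`) are hypothetical, and it assumes every field is in the range 0..99.

```java
import java.util.Locale;

// Hypothetical sketch, not taken from the PR: contrasts String.format
// with hand-written zero padding for an "HH:mm:ss" value.
final class TimeFormatSketch
{
    private TimeFormatSketch() {}

    // Baseline: the format string is interpreted on every call.
    static String formatWithStringFormat(int hour, int minute, int second)
    {
        return String.format(Locale.ROOT, "%02d:%02d:%02d", hour, minute, second);
    }

    // Alternative: append the zero-padded digits directly.
    static String formatWithBuilder(int hour, int minute, int second)
    {
        StringBuilder builder = new StringBuilder(8);
        appendTwoDigits(builder, hour);
        builder.append(':');
        appendTwoDigits(builder, minute);
        builder.append(':');
        appendTwoDigits(builder, second);
        return builder.toString();
    }

    // Assumes 0 <= value <= 99 (an assumption of this sketch).
    private static void appendTwoDigits(StringBuilder builder, int value)
    {
        builder.append((char) ('0' + value / 10));
        builder.append((char) ('0' + value % 10));
    }
}
```

Both methods produce the same text for in-range inputs; the second simply avoids interpreting a format pattern per value.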

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

core/trino-spi/src/main/java/io/trino/spi/type/TimeZoneKey.java

martint · 2022-12-01T05:44:58Z

...o-main/src/test/java/io/trino/operator/scalar/timestamp/BenchmarkCastTimestampToVarchar.java

+@Fork(1)
+@Warmup(iterations = 5, time = 1, timeUnit = SECONDS)
+@Measurement(iterations = 5, time = 1, timeUnit = SECONDS)
+public class BenchmarkCastTimestampToVarchar


Why not benchmark the cast methods directly instead of going through the expression compiler and processor machinery?

I thought about it, but I initially did it this way to make sure I was actually changing the functions that are used during casting operations. Once I was sure, it turned out to be easier to keep it this way, since I could test all of the different types with a single benchmark instead of specializing benchmark methods for each type, as well as for the short/long variants of timestamps, which dispatch to different methods based on precision.

martint · 2022-12-01T05:48:57Z

core/trino-spi/src/main/java/io/trino/spi/type/SqlTimeWithTimeZone.java

+            for (int index = builder.length() - 1; index > 8; index--) {
+                long temp = scaledFraction / 10;
+                int digit = (int) (scaledFraction - (temp * 10));
+                scaledFraction = temp;
+                builder.setCharAt(index, (char) ('0' + digit));


Didn't we add similar code somewhere else recently? We may want to consider putting it somewhere we can reuse it, since it's not particularly trivial.

We did indeed. That change was to SqlTimestamp in the SPI, and it improved performance just for the JSON serialization path. I think a few problems have caused this code to proliferate:

SqlTimestamp and SqlTimestampWithTimeZone are SPI objects that seem to only exist in order to serialize values as JSON. They have similar, but slightly different logic from one another (the time zone part)

Casts for timestamp types (with and without time zone) are currently implemented in trino-main and format to strings (even though casting to varchar needs a UTF8 slice), because the formatting logic is shared between long and short timestamp types, both with and without time zones, for conversions and in parts of the JSON type handling paths. So it’s a 2x2 grid of timestamp types, each targeting at least 2 output representations, and none of it is accessible from the SPI. Meanwhile, the SPI can’t host the implementation because it depends on the packing / unpacking implementations in main.

Time with and without time zone is similar in Trino main, except that it was already producing a slice directly (presumably for performance, to avoid the intermediate string, and because the format is simple enough to implement by hand), which means it also can’t live in the SPI because of the dependency on slice.
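As a rough illustration of the digit-filling loop quoted in the diff hunk above, the following standalone sketch writes fraction digits from the last position backwards. The helper name `appendFraction`, its parameters, and the class `FractionDigitsSketch` are hypothetical and not part of Trino; the sketch assumes the fraction is non-negative and already scaled to the requested precision.

```java
// Hypothetical sketch of the backwards digit fill; not the PR's actual code.
final class FractionDigitsSketch
{
    private FractionDigitsSketch() {}

    // timePrefix: an already formatted value such as "HH:mm:ss"
    // fraction:   non-negative value with at most `precision` decimal digits (assumed)
    // precision:  number of fraction digits to emit
    static String appendFraction(String timePrefix, long fraction, int precision)
    {
        if (precision <= 0) {
            return timePrefix;
        }
        StringBuilder builder = new StringBuilder(timePrefix.length() + 1 + precision);
        builder.append(timePrefix).append('.');
        int firstDigitIndex = builder.length();
        // Reserve the digit positions; they are overwritten below.
        builder.setLength(firstDigitIndex + precision);

        long remaining = fraction;
        for (int index = builder.length() - 1; index >= firstDigitIndex; index--) {
            long quotient = remaining / 10;
            // Same as remaining % 10, computed from the quotient already in hand.
            int digit = (int) (remaining - (quotient * 10));
            remaining = quotient;
            builder.setCharAt(index, (char) ('0' + digit));
        }
        return builder.toString();
    }
}
```

With an 8-character prefix such as "12:34:56", the first digit position is index 9, which corresponds to the `index > 8` bound in the quoted loop. Filling from the right means leading zeros fall out naturally once `remaining` reaches zero.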

cla-bot bot added the cla-signed label Nov 30, 2022

pettyjamesm requested a review from martint November 30, 2022 18:21

github-actions bot added the tests:hive label Nov 30, 2022

pettyjamesm force-pushed the reduce-string-format-usages branch 2 times, most recently from 95aa4e3 to 6644105 Compare November 30, 2022 22:22

pettyjamesm marked this pull request as ready for review December 1, 2022 02:49

martint reviewed Dec 1, 2022

pettyjamesm added 10 commits December 5, 2022 10:30

Replace String.format with HexFormat in ColorType

bf08609

Use string concatenation in ViewReaderUtil

a2f6999

Avoid String.format in TimeZoneKey class init

f026136

Add timestamp cast benchmark

8cbeea0

Avoid String.format in SqlTimeWithTimeZone.toString()

99fe13c

Avoid String.format in SqlTime#toString()

74e6764

Reuse unpacked offset in TimeWithTimeZoneToTimestampWithTimeZoneCast

8130eed

Avoid String.format in TimeWithTimeZoneToVarcharCast

22ed0da

Avoid String.format in DateTimes formatting

1de43f6

Avoid String.format in TimeOperators

773fbea

pettyjamesm force-pushed the reduce-string-format-usages branch from 6644105 to 773fbea Compare December 5, 2022 15:30

pettyjamesm mentioned this pull request Dec 6, 2022

More string format removals #15316

Merged

martint merged commit 7339adc into trinodb:master Dec 6, 2022

pettyjamesm deleted the reduce-string-format-usages branch December 6, 2022 17:26

github-actions bot added this to the 404 milestone Dec 6, 2022

pettyjamesm added the performance label Dec 7, 2022

colebow mentioned this pull request Dec 14, 2022

Add Trino 405 release notes #15139

Merged

wendigo mentioned this pull request Aug 30, 2023

Multi-join pushdown appends recursively _<number> to column names so that they exceed maximum alias length #18642

Closed
