Optimize InstantDeserializer addInColonToOffsetIfMissing() #336

Merged
merged 2 commits into from
Dec 17, 2024

Contributor

@schlosna schlosna commented Dec 16, 2024

When Jackson deserializes timestamps using DateTimeFormatter.ISO_OFFSET_DATE_TIME or DateTimeFormatter.ISO_ZONED_DATE_TIME, a lot of time is spent in InstantDeserializer::addInColonToOffsetIfMissing allocating a regex matcher and matching it against a possible timezone offset, even when the input timestamp is already in valid ISO 8601 format with an explicit zone of Z or a colon-separated offset.
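To make the idea concrete, here is a minimal sketch of the kind of fast path described above: cheap character checks first, with the regex used only as a fallback. The class name, method name, and pattern are assumptions made for illustration; this is not the code merged in this PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Jackson's actual implementation.
final class OffsetColonSketch {

    // Assumed pattern: a sign followed by exactly four digits, located at the
    // end of the text or directly before a "[zone]" suffix.
    private static final Pattern COLONLESS_OFFSET = Pattern.compile("[+-][0-9]{4}(?=\\[|$)");

    static String addColonIfMissing(String text) {
        int len = text.length();
        // Fast path 1: explicit UTC designator, nothing to fix.
        if (len > 0 && text.charAt(len - 1) == 'Z') {
            return text;
        }
        // Fast path 2: text already ends in a colon-separated offset such as "+01:00".
        if (len >= 6 && text.charAt(len - 3) == ':') {
            char sign = text.charAt(len - 6);
            if (sign == '+' || sign == '-') {
                return text;
            }
        }
        // Slow path: only now allocate a Matcher and look for an offset like "+0100".
        Matcher matcher = COLONLESS_OFFSET.matcher(text);
        if (matcher.find()) {
            int split = matcher.start() + 3; // position between hours and minutes
            return text.substring(0, split) + ':' + text.substring(split);
        }
        return text;
    }
}
```

In this sketch, inputs ending in Z or in a colon-separated offset return the original String instance without touching the regex, while inputs with a zone suffix such as [Europe/Paris] still take the regex path.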

Similar to #266

# 2021 MacBookPro M1 Pro
# JMH version: 1.37
# VM version: JDK 21.0.5, OpenJDK 64-Bit Server VM, 21.0.5+11-LTS

Before (2.18.2)

Benchmark                                    Mode  Cnt     Score    Error  Units
InstantDeserializerBenchmark.offsetDateTime  avgt    5   942.358 ± 21.485  ns/op
InstantDeserializerBenchmark.zonedDateTime   avgt    5  1025.040 ± 37.269  ns/op

After (2.19.0-SNAPSHOT)

Benchmark                                    Mode  Cnt    Score     Error  Units
InstantDeserializerBenchmark.offsetDateTime  avgt    5  705.542 ±  20.482  ns/op
InstantDeserializerBenchmark.zonedDateTime   avgt    5  850.149 ± 219.331  ns/op
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.json.JsonMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 3, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 3, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
@SuppressWarnings({"designforextension", "NullAway", "CheckStyle"})
public class DateTimeDeserializerBenchmark {

    private static final ObjectMapper mapper = JsonMapper.builder()
            .defaultLocale(Locale.ENGLISH)
            .addModule(new JavaTimeModule())
            .build();

    private static final List<String> timestamps = timestamps();
    private static final int EXPECTED_TIMESTAMPS = 515;

    @Benchmark
    @OperationsPerInvocation(EXPECTED_TIMESTAMPS)
    public void zonedDateTime(Blackhole blackhole) throws Exception {
        for (String string : timestamps) {
            blackhole.consume(mapper.readValue(string, ZonedDateTime.class));
        }
    }

    @Benchmark
    @OperationsPerInvocation(EXPECTED_TIMESTAMPS)
    public void offsetDateTime(Blackhole blackhole) throws Exception {
        for (String string : timestamps) {
            blackhole.consume(mapper.readValue(string, OffsetDateTime.class));
        }
    }

    public static List<String> timestamps() {
        return Stream.of(
                        DateTimeFormatter.ISO_DATE_TIME,
                        DateTimeFormatter.ISO_INSTANT,
                        DateTimeFormatter.ISO_OFFSET_DATE_TIME,
                        DateTimeFormatter.ISO_ZONED_DATE_TIME)
                .flatMap(f -> IntStream.rangeClosed(-18, 18)
                        .mapToObj(h -> Stream.of(
                                f.format(OffsetDateTime.now(ZoneOffset.ofHours(h))),
                                f.format(OffsetDateTime.now(
                                        ZoneOffset.ofHoursMinutes(h, Math.abs(h) == 18 ? 0 : h < 0 ? -30 : 30)))))
                        .flatMap(Function.identity()))
                .flatMap(ts -> {
                    int lastColon = ts.lastIndexOf(':');
                    if (lastColon == -1 || lastColon != ts.length() - 3) {
                        return Stream.of(ts);
                    }
                    return Stream.of(
                            ts, new StringBuilder(ts).deleteCharAt(lastColon).toString());
                })
                .map(ts -> '"' + ts + '"')
                .toList();
    }

    public static void main(String[] _args) throws Exception {
        new Runner(new OptionsBuilder()
                        .include(DateTimeDeserializerBenchmark.class.getSimpleName())
                        .build())
                .run();
    }
}
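The input generation in timestamps() above derives a colon-less variant from each timestamp whose last colon sits three characters from the end. The following sketch isolates that step with a fixed instant so the effect is easy to see; the class and method names are hypothetical, and the fixed date is an arbitrary choice.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class; mirrors the colon-stripping lambda in timestamps().
final class ColonlessVariantSketch {

    // Removes the colon from a trailing "+HH:MM" / "-HH:MM" offset, if present.
    static String stripOffsetColon(String ts) {
        int lastColon = ts.lastIndexOf(':');
        if (lastColon == -1 || lastColon != ts.length() - 3) {
            return ts; // e.g. timestamps ending in "Z" are left untouched
        }
        return new StringBuilder(ts).deleteCharAt(lastColon).toString();
    }

    public static void main(String[] args) {
        // Fixed instant (arbitrary) instead of OffsetDateTime.now(), for repeatability.
        OffsetDateTime fixed =
                OffsetDateTime.of(2024, 12, 16, 13, 57, 30, 0, ZoneOffset.ofHoursMinutes(5, 30));
        String iso = DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(fixed);
        System.out.println(iso);                   // 2024-12-16T13:57:30+05:30
        System.out.println(stripOffsetColon(iso)); // 2024-12-16T13:57:30+0530
    }
}
```

Both variants end up in the benchmark input, so the deserializer is exercised on offsets that already contain a colon as well as on ones that need it added.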

@schlosna schlosna changed the title from "Ds/colon offset" to "Optimize InstantDeserializer addInColonToOffsetIfMissing" Dec 16, 2024
@schlosna schlosna marked this pull request as ready for review December 16, 2024 13:57
Member

@JooHyukKim JooHyukKim left a comment

@schlosna Super interesting findings! 👍🏼👍🏼 Just out of curiosity, may I ask how much of a performance improvement this change makes in your use case/production? The performance test you shared (thank you) seems to show about a 20% improvement, but I am wondering how much, or just how, it helps in production.

Thank you in advance!

Comment on lines +815 to +816
@Test
public void OffsetDateTime_with_offset_can_be_deserialized() throws Exception {
Member

@JooHyukKim JooHyukKim Dec 16, 2024

It seems we could move this test and the zonedDateTime one below into a separate test class such as Xxx336Test.java, since they share a purpose and style, but idk, that might be overkill for now.

Contributor Author

Sure, I can consolidate these into a separate test class

Member

In this case, I think keeping them along with existing test makes sense: this is not new functionality but optimizing (and adding test coverage). So let's not create per-issue test classes.

@JooHyukKim
Member

JooHyukKim commented Dec 16, 2024

Also, for a second I thought about changing addInColonToOffsetIfMissing to protected for internal use and then adding new (currently non-existent)

  • ZonedDateTimeDeserializer
  • OffsetDateTimeDeserializer

classes that override it for their own needs. This is also potentially overkill (at least at this point).

@schlosna
Contributor Author

> @schlosna Super interesting findings! 👍🏼👍🏼 Just out of curiosity, may I ask how much of a performance improvement this change makes in your use case/production? The performance test you shared (thank you) seems to show about a 20% improvement, but I am wondering how much, or just how, it helps in production.
>
> Thank you in advance!

Thanks for the quick review.

I have seen profiles pointing at the regex matcher allocations and method profiles pointing at addInColonToOffsetIfMissing for a number of production systems that heavily use Jackson and OffsetDateTime for serialization/deserialization. I will try to spin up a more realistic JMH benchmark workload.

@cowtowncoder cowtowncoder added the cla-received Marker to denote that there is a CLA for pr label Dec 17, 2024
@cowtowncoder
Member

While more performance results can be useful and interesting, I think I am satisfied with the included benchmarks. True, the end-to-end effect will be more limited, but this seems like a safe change given the test coverage.

So I will go ahead and merge -- 2.19(.0) makes sense: while this looks safe enough, the changes are not trivial, so I prefer inclusion in a minor version (over a patch).

@cowtowncoder cowtowncoder merged commit 29aa2b8 into FasterXML:2.19 Dec 17, 2024
4 checks passed
@cowtowncoder cowtowncoder changed the title from "Optimize InstantDeserializer addInColonToOffsetIfMissing" to "Optimize InstantDeserializer addInColonToOffsetIfMissing()" Dec 17, 2024
@cowtowncoder cowtowncoder modified the milestones: 2.19., 2.19.0 Dec 17, 2024
cowtowncoder added a commit that referenced this pull request Dec 17, 2024
@schlosna schlosna deleted the ds/colon-offset branch December 17, 2024 02:16
@schlosna
Contributor Author

Thanks @cowtowncoder
