Make `ByteSourceJsonBootstrapper` use `StringReader` for < 8KiB byte[] inputs #1081

schlosna · 2023-08-10T22:06:00Z

Per @carterkozak in #1079 (comment), this is an alternative draft implementation to address #593 (and #995 (comment)) when deserializing from a byte[]: the InputStreamReader code path triggers an 8KiB HeapByteBuffer allocation for StreamDecoder regardless of the input byte array's length. This allocation significantly penalizes smaller byte[] sources.

The approach here converts byte[] inputs smaller than 8KiB to a String and processes them via StringReader. This should avoid the unnecessary 8KiB heap byte buffer allocation and leverage OpenJDK's continued charset decoding improvements (e.g. https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html ).
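For readers skimming the thread, here is a minimal standalone sketch of the idea described above, using only JDK classes. It is not the jackson-core code: the class name `SmallInputReaderSketch`, the methods `readerFor` and `readAll`, and the constant `STRING_READER_LIMIT` are hypothetical names chosen for illustration, and the charset is passed in explicitly rather than detected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Illustrative only; not the actual ByteSourceJsonBootstrapper implementation. */
final class SmallInputReaderSketch {
    // Hypothetical constant; 8192 mirrors the 8KiB limit discussed in this PR.
    static final int STRING_READER_LIMIT = 8192;

    /** Picks a Reader for buf[start, end) based on the input length. */
    static Reader readerFor(byte[] buf, int start, int end, Charset charset) {
        int length = end - start;
        if (length <= STRING_READER_LIMIT) {
            // Small input: decode eagerly to a String, so no InputStreamReader
            // (and no StreamDecoder byte buffer) is created.
            return new StringReader(new String(buf, start, length, charset));
        }
        // Large input: keep the streaming path.
        return new InputStreamReader(new ByteArrayInputStream(buf, start, length), charset);
    }

    /** Drains a Reader into a String; wraps IOException for brevity. */
    static String readAll(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both paths should produce the same characters; only the allocation profile differs, which is what the benchmarks referenced in this PR measure.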

Initial benchmarks from FasterXML/jackson-benchmarks#9 show StringReader providing performance equivalent to a ByteArrayInputStream source in the worst case, and anywhere from ~2x to ~10x speedup in the best case.

pjfanning · 2023-08-10T22:49:19Z

src/main/java/com/fasterxml/jackson/core/json/ByteSourceJsonBootstrapper.java

@@ -230,6 +230,12 @@ public Reader constructReader() throws IOException
                InputStream in = _in;

                if (in == null) {
+                    int length = _inputEnd - _inputPtr;
+                    if (length >= 0 && length <= 8192) {


minor nit: It may not be worth checking length >= 0. It has a small perf impact, and the length is very unlikely to be negative and will just fail at the ByteArrayInputStream stage anyway. The length == 0 case will probably be handled more tidily by the StringReader route than by the ByteArrayInputStream route.

I flipped the condition to short-circuit for large inputs first: `if (length <= STRING_READER_BYTE_ARRAY_LENGTH_LIMIT && length >= 0)`.

I'd prefer to keep the length >= 0 check, as the cost of the array length check here should be negligible, and it ensures the ByteArrayInputStream path continues handling any negative array length.

After thinking it through overnight, I removed the length >= 0 condition.
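To make the edge cases in this thread concrete, the following sketch assumes the final condition is just `length <= STRING_READER_BYTE_ARRAY_LENGTH_LIMIT` and shows what the JDK's `String(byte[], int, int, Charset)` constructor does for zero and negative lengths. The class `LengthCheckSketch` and the method `describe` are hypothetical, and UTF-16BE is an arbitrary choice of charset for the illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only; reports which path a given window of the buffer would take. */
final class LengthCheckSketch {
    // Constant name taken from the review comment; the value is the 8KiB limit.
    static final int STRING_READER_BYTE_ARRAY_LENGTH_LIMIT = 8192;

    static String describe(byte[] buf, int inputPtr, int inputEnd) {
        int length = inputEnd - inputPtr;
        if (length > STRING_READER_BYTE_ARRAY_LENGTH_LIMIT) {
            return "stream path";
        }
        try {
            // With no explicit length >= 0 check, a negative length reaches the
            // String constructor, which throws an IndexOutOfBoundsException.
            Reader r = new StringReader(
                    new String(buf, inputPtr, length, StandardCharsets.UTF_16BE));
            // A zero-length input decodes to "" and reports end of input at once.
            return r.read() == -1 ? "string path, empty" : "string path, non-empty";
        } catch (IndexOutOfBoundsException e) {
            return "string path, rejected";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a negative length is rejected by the `String` constructor itself, so the single comparison against the limit is enough to route every input.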

cowtowncoder · 2023-08-18T23:46:13Z

Ok, I assume this has been benchmarked, and since it only affects what I assume is a minority use case (non-UTF-8 input that comes as byte input), I am ok merging it.
(But obviously it is a use case that matters to some, otherwise it wouldn't have been contributed :) ).

Thank you @schlosna !

Optimize to avoid allocation of heap ByteBuffer via InputStreamReader. Remove after upgrade to Jackson 2.16. see: FasterXML/jackson-core#1081 and FasterXML/jackson-benchmarks#9

Now that AtlasDB has upgraded to Jackson 2.16.1, remove performance workaround that landed upstream in Jackson 2.16.0. See FasterXML/jackson-core#1081

Now that AtlasDB has upgraded to Jackson 2.16.1, remove performance workaround that landed upstream in Jackson 2.16.0. See FasterXML/jackson-core#1081 Removes changes from #6750

Since FasterXML/jackson-core#1081 this optimization no longer makes sense, as it's applied internally by Jackson if needed.

ByteSourceJsonBootstrapper uses StringReader for < 8KiB byte[] inputs

053fd09

This was referenced Aug 10, 2023

Add StringReader input benchmark FasterXML/jackson-benchmarks#9

Merged

ByteSourceJsonBootstrapper uses CharBufferReader for byte[] inputs #1079

Closed

pjfanning reviewed Aug 10, 2023

Clarify 8KiB byte array length limit

f525c86

schlosna marked this pull request as ready for review August 11, 2023 02:09

Skip greater than zero check

a515382

cowtowncoder changed the title ~~ByteSourceJsonBootstrapper uses StringReader for < 8KiB byte[] inputs~~ Make ByteSourceJsonBootstrapper use StringReader for < 8KiB byte[] inputs Aug 18, 2023

cowtowncoder added the 2.16 Issue planned (at earliest) for 2.16 label Aug 18, 2023

cowtowncoder merged commit 4fd8c85 into FasterXML:2.16 Aug 18, 2023

cowtowncoder added a commit that referenced this pull request Aug 18, 2023

Update release notes wrt #1081; minor comment clean up

e33efb3

schlosna deleted the ds/593-StringReader branch August 28, 2023 12:43

schlosna mentioned this pull request Sep 14, 2023

Optimize InternalSchemaMetadataPayloadCodec serDe palantir/atlasdb#6739

Merged

schlosna mentioned this pull request Sep 22, 2023

Optimize JacksonPersister#hydrateFromBytes palantir/atlasdb#6750

Merged

Dith3r mentioned this pull request Jun 10, 2024

Improve performance of json functions trinodb/trino#22348

Merged

schlosna added a commit to palantir/atlasdb that referenced this pull request Jun 27, 2024

Remove Jackson byte[] workaround

acf3d58

schlosna mentioned this pull request Jun 27, 2024

Remove Jackson byte[] workaround palantir/atlasdb#7168

Merged

wendigo mentioned this pull request Aug 5, 2024

Remove redundant copy in JsonUtil trinodb/trino#22941

Closed

wendigo added a commit to trinodb/trino that referenced this pull request Aug 5, 2024

Remove no longer needed optimization

e401847

wendigo mentioned this pull request Aug 5, 2024

Remove no longer needed optimization trinodb/trino#22942

Merged

wendigo added a commit to trinodb/trino that referenced this pull request Aug 6, 2024

Remove no longer needed optimization

89cd867
