Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ByteSourceJsonBootstrapper use StringReader for < 8KiB byte[] inputs #1081

Merged
merged 3 commits into from
Aug 18, 2023

Conversation

schlosna
Copy link
Contributor

From @carterkozak #1079 (comment) this is an alternative implementation draft to address #593 (and #995 (comment) ) when deserializing from a byte[] as the InputStreamReader code path triggers an 8KiB HeapByteBuffer allocation for StreamDecoder regardless of input byte array length. This allocation significantly penalizes smaller byte[] sources.

image

The approach here converts byte[] inputs smaller thank 8KiB to a String and processed via StringReader. This should avoid unnecessary 8KiB heap byte buffer allocation and leverage OpenJDK's continued charset decoding improvements (e.g. https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html ).

Initial benchmarks from FasterXML/jackson-benchmarks#9 show StringReader providing performance equivalent to ByteArrayInputStream source in worst case, and anywhere from ~2x to ~10x speedup in best case.

@@ -230,6 +230,12 @@ public Reader constructReader() throws IOException
InputStream in = _in;

if (in == null) {
int length = _inputEnd - _inputPtr;
if (length >= 0 && length <= 8192) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: It may not be worth checking length >= 0. It has a small perf impact and the length is very unlikely to negative and will just fail at the ByteArrayInputStream stage anyway. The length == 0 case will probably be more tidily handled with the StringReader than the ByteArrayInputStream route.

Copy link
Contributor Author

@schlosna schlosna Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I flipped the condition to short circuit for large inputs first if (length <= STRING_READER_BYTE_ARRAY_LENGTH_LIMIT && length >= 0).

I'd prefer to keep the length >= 0 as the array length check here should be negligible, and ensures the ByteInputStream continues handling any negative array length path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking it through over night, I removed the length >= 0 condition.

@schlosna schlosna marked this pull request as ready for review August 11, 2023 02:09
@cowtowncoder cowtowncoder changed the title ByteSourceJsonBootstrapper uses StringReader for < 8KiB byte[] inputs Make ByteSourceJsonBootstrapper use StringReader for < 8KiB byte[] inputs Aug 18, 2023
@cowtowncoder cowtowncoder added the 2.16 Issue planned (at earliest) for 2.16 label Aug 18, 2023
@cowtowncoder
Copy link
Member

Ok I assume this has been benchmarked and since it only affects what I assume is minority use case (non-UTF-8 input that comes as byte input), I am ok merging it.
(but obviously is a use case that matters to some otherwise wouldn't be contributed :) ).

Thank you @schlosna !

@cowtowncoder cowtowncoder merged commit 4fd8c85 into FasterXML:2.16 Aug 18, 2023
@schlosna schlosna deleted the ds/593-StringReader branch August 28, 2023 12:43
schlosna added a commit to palantir/atlasdb that referenced this pull request Sep 22, 2023
Optimize to avoid allocation of heap ByteBuffer via InputStreamReader.
Remove after upgrade to Jackson 2.16.

see: FasterXML/jackson-core#1081 and
FasterXML/jackson-benchmarks#9
bulldozer-bot bot pushed a commit to palantir/atlasdb that referenced this pull request Sep 25, 2023
Optimize to avoid allocation of heap ByteBuffer via InputStreamReader. Remove after upgrade to Jackson 2.16.

see: FasterXML/jackson-core#1081 and FasterXML/jackson-benchmarks#9
schlosna added a commit to palantir/atlasdb that referenced this pull request Jun 27, 2024
Now that AtlasDB has upgraded to Jackson 2.16.1, remove performance
workaround that landed upstream in Jackson 2.16.0. See
FasterXML/jackson-core#1081
bulldozer-bot bot pushed a commit to palantir/atlasdb that referenced this pull request Jul 1, 2024
Now that AtlasDB has upgraded to Jackson 2.16.1, remove performance workaround that landed upstream in Jackson 2.16.0. See FasterXML/jackson-core#1081

Removes changes from #6750
wendigo added a commit to trinodb/trino that referenced this pull request Aug 5, 2024
Since FasterXML/jackson-core#1081 this optimization
does no longer make sense as it's applied internally by Jackson if needed.
wendigo added a commit to trinodb/trino that referenced this pull request Aug 6, 2024
Since FasterXML/jackson-core#1081 this optimization
does no longer make sense as it's applied internally by Jackson if needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.16 Issue planned (at earliest) for 2.16
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants