Added ability to buffered read huge strings in custom KSerializers #2012
Hi, I think we can try to fit this into the 1.5.0 release window. Can you please 1) rebase the PR on the current dev branch and 2) update the API dumps?
formats/json/commonMain/src/kotlinx/serialization/json/internal/StreamingJsonDecoder.kt
Do I need to update the API dumps if jvmApiCheck passes?
If the check task is successful, then the API dumps won't change on updating.
But
You're right, sorry, my bad. Fixed.
I think it is much better to use an InputStream (decodeFromStream) and define a custom serializer for the Base64 element. In the serializer you can use or create a temporary, limited-size ByteArrayBuffer / BufferedInputStream, read bytes windowed/chunked by the buffer size, decode them, and save them to a byte array or a temp file (output stream). When finished, create the String from the ByteArray or the file. Don't use ...From/ToString, because it reads everything at once and creates a large String object: you needed a 6 GB buffer three times over (for the full read, for the conversion, and for the object). When you read in stream mode, you can read by bytes and reuse one 64 KB temp buffer. Also look
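The buffer-reuse idea in the comment above can be sketched as follows. This is a minimal JVM-only illustration, not code from the PR: `decodeBase64Stream` is a hypothetical helper name, and it assumes the Base64 text has already been isolated as an `InputStream` (it does no JSON parsing). It relies on `java.util.Base64.Decoder.wrap`, which decodes lazily as bytes are read.

```kotlin
import java.io.InputStream
import java.io.OutputStream
import java.util.Base64

// Hypothetical helper (not part of kotlinx.serialization): decodes Base64 text
// from `input` into `output` through one reusable buffer, so neither the encoded
// nor the decoded payload is ever held in memory as a whole.
fun decodeBase64Stream(
    input: InputStream,
    output: OutputStream,
    bufferSize: Int = 64 * 1024 // the 64 KB temp buffer mentioned in the comment
): Long {
    val decoded = Base64.getDecoder().wrap(input) // decodes on the fly while reading
    val buffer = ByteArray(bufferSize)
    var total = 0L
    while (true) {
        val read = decoded.read(buffer)
        if (read < 0) break // end of stream
        output.write(buffer, 0, read)
        total += read
    }
    return total // number of decoded bytes written
}
```

Calling it with a `FileOutputStream` as `output` would cover the "save to a temp file" variant; memory use stays bounded by `bufferSize` regardless of payload size.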
@fred01 :kotlinx-serialization-json:jvmApiCheck as well
Yes, I looked closely at this PR before starting this one. I got some inspiration from it.
You likely need to rebase the PR on the dev branch, as some optimization changes have been made to the lexer.
core/commonMain/src/kotlinx/serialization/encoding/ChunkedDecoder.kt
formats/json-tests/jvmTest/src/kotlinx/serialization/json/JsonChunkedDecoderTest.kt
formats/json/commonMain/src/kotlinx/serialization/json/internal/StreamingJsonDecoder.kt
core/commonMain/src/kotlinx/serialization/encoding/ChunkedDecoder.kt
var char = source[currentPosition] // Avoid two range checks visible in the profiler
while (char != STRING) {
    if (++currentPosition >= source.length) {
        // end of chunk
It seems that you're not handling string escape sequences. Is that intentional?
Yes and no. At first it was intentional, because I meant to consume a Base64 string, which can't contain a double quote. But later I preferred a generic approach, and it seems I should now handle double quotes as well. Will fix.
Added support for escaping
Also, please note: to handle escaping properly, I was forced to move the actual decoding method from ReaderJsonLexer to AbstractJsonLexer. Otherwise I would have needed to un-private a large number of AbstractJsonLexer methods.
Sorry, I don't quite understand. Can you elaborate on that please?
Previously, I placed my method in ReaderJsonLexer to use the highest possible hierarchy level. But to handle escaping properly I was forced to move the handling method to AbstractJsonLexer, one level down. Is that OK?
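To illustrate why escaping complicates chunked reading, here is a small standalone sketch. It is not the lexer code from this PR: `ChunkedStringScanner` and its members are hypothetical names, and `\uXXXX` escapes are deliberately left out. The point it shows is that a backslash may be the last character of one chunk, so the "escape pending" state has to survive across chunk boundaries before the closing quote can be recognized.

```kotlin
// Hypothetical scanner (not the library's lexer): consumes the body of a JSON
// string that arrives in chunks, decodes simple escapes, and reports where the
// closing quote was found.
class ChunkedStringScanner(private val onText: (String) -> Unit) {
    private var pendingEscape = false // true when a backslash ended the previous chunk

    var finished = false
        private set

    /** Feeds one chunk; returns the index just after the closing quote, or -1 if the string continues. */
    fun feed(chunk: String): Int {
        val out = StringBuilder()
        var i = 0
        while (i < chunk.length) {
            val c = chunk[i]
            when {
                pendingEscape -> {
                    out.append(unescape(c))
                    pendingEscape = false
                }
                c == '\\' -> pendingEscape = true
                c == '"' -> {
                    // An unescaped quote terminates the string.
                    finished = true
                    if (out.isNotEmpty()) onText(out.toString())
                    return i + 1
                }
                else -> out.append(c)
            }
            i++
        }
        if (out.isNotEmpty()) onText(out.toString())
        return -1
    }

    private fun unescape(c: Char): Char = when (c) {
        'n' -> '\n'
        't' -> '\t'
        'r' -> '\r'
        'b' -> '\b'
        'f' -> '\u000C'
        '"', '\\', '/' -> c
        else -> throw IllegalArgumentException("Escape '\\$c' is not handled in this sketch")
    }
}
```

Without the `pendingEscape` flag, a chunk ending in a backslash followed by a chunk starting with a quote would be misread as the end of the string.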
formats/json/commonMain/src/kotlinx/serialization/json/internal/lexer/JsonLexer.kt
formats/json/commonMain/src/kotlinx/serialization/json/internal/StreamingJsonDecoder.kt
formats/json/commonMain/src/kotlinx/serialization/json/internal/lexer/StringJsonLexer.kt
- Support escaping
- Support lenient mode
- KDoc fixes
- Formatting
- Added sample usage to method's KDoc
core/commonMain/src/kotlinx/serialization/encoding/ChunkedDecoder.kt
data class ClassWithLargeStringDataField(val largeStringField: LargeStringData)

object LargeStringSerializer : KSerializer<LargeStringData> {
Given that the interface and implementation you've provided are located in commonMain, i.e. multiplatform, they should be tested in commonTest, too, at least the part where that is possible. IIRC Base64 is not MPP, so you can move LargeStringSerializer and its test to commonTest and leave LargeBase64StringSerializer here.
Done. BTW, I have a simple native Kotlin Base64 implementation, inspired by Okio: https://android.googlesource.com/platform/external/okhttp/+/a2cab72aa5ff730ba2ae987b45398faafffeb505/okio/okio/src/main/java/okio/Base64.java (Apache license). To be honest, it's just converted from Java and slightly corrected. Is it worthwhile (and allowed) to use it here? In that case we can move this test completely to the common part.
I'm not sure if it's worth it, as testing can be done without it
formats/json-tests/jvmTest/src/kotlinx/serialization/json/JsonChunkedDecoderTest.kt
formats/json-tests/jvmTest/src/kotlinx/serialization/json/JsonChunkedDecoderTest.kt
@@ -307,6 +306,54 @@ internal abstract class AbstractJsonLexer {
     */
    abstract fun consumeKeyString(): String

    private fun insideString(isLenient:Boolean, char:Char):Boolean = if (isLenient) { charToTokenClass(char) == TC_OTHER } else { char != STRING }
formatting: spaces
still formatting
formats/json/commonMain/src/kotlinx/serialization/json/internal/lexer/AbstractJsonLexer.kt
…l/lexer/AbstractJsonLexer.kt Co-authored-by: Leonid Startsev <[email protected]>
…der.kt Co-authored-by: Leonid Startsev <[email protected]>
…der.kt Co-authored-by: Leonid Startsev <[email protected]>
…der.kt Co-authored-by: Leonid Startsev <[email protected]>
- Slightly modified KDoc documentation and example
- Moved non-Base64 part of test to json commonTest
- Avoid code duplication in plain string test
I think it's OK after fixing minor comments
core/commonMain/src/kotlinx/serialization/encoding/ChunkedDecoder.kt
Although, please check the failing tests on non-JVM platforms.
- Slight KDoc modifications
- Fixed tests for non-JVM platforms
Great job, thank you!
#1987 Added a method that allows large string handling (for example, decoding Base64 binary) to be performed at the user level
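As a rough illustration of what user-level handling of a chunked Base64 string might involve, consider the sketch below. It is not the KDoc sample from this PR: `Base64ChunkConsumer` is a hypothetical name, it uses the JVM-only `java.util.Base64`, and it assumes the decoder hands the string over as a sequence of `String` chunks of arbitrary length. Because Base64 decodes in groups of four characters, any trailing one to three characters of a chunk are carried over to the next one.

```kotlin
import java.io.OutputStream
import java.util.Base64

// Hypothetical consumer (names are illustrative): accepts Base64 text in
// arbitrary-size chunks and writes the decoded bytes to `sink`.
class Base64ChunkConsumer(private val sink: OutputStream) {
    private val decoder = Base64.getDecoder()
    private var remainder = "" // 0..3 characters left over from the previous chunk

    fun consume(chunk: String) {
        val data = remainder + chunk
        // Only whole 4-character groups can be decoded independently.
        val usable = data.length - data.length % 4
        if (usable > 0) sink.write(decoder.decode(data.substring(0, usable)))
        remainder = data.substring(usable)
    }

    /** Call once after the last chunk to flush an unpadded final group, if any. */
    fun finish() {
        if (remainder.isNotEmpty()) sink.write(decoder.decode(remainder))
        remainder = ""
    }
}
```

A custom serializer could pass `consume` as the chunk callback and call `finish()` afterwards, so the full string is never materialized; only the current chunk and a short remainder are in memory at any time.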