Introduce CharArray caching for InputStream decoding #2100
Conversation
JacksonComparisonBenchmark.kotlinFromStream: 260 ops/ms -> 460 ops/ms (NB: it's ASCII-only)
formats/json/commonMain/src/kotlinx/serialization/json/internal/JsonStreams.kt
* By default, a 16k buffer is allocated for char-related operations, and it may create non-trivial GC pressure when decoding small objects
* The solution is simple -- pool these char arrays
* The estimated performance improvement is tens of percent, reaching 70% on our ASCII benchmarks

Fixes #1893
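The pooling approach described above can be sketched as follows. This is a minimal illustration assuming 16K-char buffers and the ~2 MB retention cap mentioned elsewhere in the PR; the object name and structure are illustrative, not the actual internal implementation:

```kotlin
// Minimal sketch of a CharArray pool: reuse fixed-size buffers instead of
// allocating a fresh 16K array per decode call, capping total retained chars.
object CharArrayPoolSketch {
    private const val BUFFER_SIZE = 16 * 1024   // assumed per-buffer size
    private const val MAX_CHARS = 1024 * 1024   // retention cap (~2 MB of chars)

    private val arrays = ArrayDeque<CharArray>()
    private var charsTotal = 0

    // Reuse a pooled buffer when available, otherwise allocate a new one.
    fun take(): CharArray = synchronized(this) {
        arrays.removeLastOrNull()?.also { charsTotal -= it.size }
    } ?: CharArray(BUFFER_SIZE)

    // Return a buffer to the pool unless the retention cap would be exceeded;
    // over-cap buffers are simply dropped for the GC to collect.
    fun release(array: CharArray): Unit = synchronized(this) {
        if (charsTotal + array.size <= MAX_CHARS) {
            charsTotal += array.size
            arrays.addLast(array)
        }
    }
}
```

A caller that pairs `take()` with `release()` then amortizes the allocation across decode calls, which is where the GC-pressure win in the benchmark comes from.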
Commits 191bc20 to 890c0e1
@@ -13,20 +17,37 @@ internal object CharArrayPool {
        System.getProperty("kotlinx.serialization.json.pool.size").toIntOrNull()
    }.getOrNull() ?: 1024 * 1024 // 2 MB seems to be a reasonable constraint, (1M of chars)

-    fun take(): CharArray {
+    protected fun take(size: Int): CharArray {
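As a side note, the property-override pattern visible in the hunk can be exercised in isolation. The property name and fallback constant are copied from the diff; wrapping the function name is an assumption:

```kotlin
// Sketch of the configurable cap seen in the hunk: read an optional system
// property, falling back to 1M chars (~2 MB) when it is absent or malformed.
// An unset property makes toIntOrNull() throw an NPE (platform type), which
// runCatching deliberately swallows, yielding the fallback.
fun poolCapacity(): Int = runCatching {
    System.getProperty("kotlinx.serialization.json.pool.size").toIntOrNull()
}.getOrNull() ?: 1024 * 1024
```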
I think it's better to take the size in the class constructor rather than in the function call. In fact, this is already the case for each instance of the pool right now.
Another reason is the potential for an error in a call sequence like this:

val a = take(100)
release(a)
val b = take(100)
release(b)
val c = take(200)
c[101] // !!!

The API does not indicate in any way that you should not do this. If you add the size to the constructor, you can also add a check of the array's size inside releaseImpl.
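The hazard in that call sequence can be reproduced with a hypothetical size-agnostic pool; this is purely illustrative and is NOT the PR's implementation:

```kotlin
// Hypothetical pool that ignores the requested size when reusing buffers,
// demonstrating the bug described in the comment above.
class SizeAgnosticPool {
    private val arrays = ArrayDeque<CharArray>()
    fun take(size: Int): CharArray = arrays.removeLastOrNull() ?: CharArray(size)
    fun release(array: CharArray) { arrays.addLast(array) }
}

val pool = SizeAgnosticPool()
val a = pool.take(100)
pool.release(a)
val c = pool.take(200)  // silently returns the pooled 100-element array
// c[101] would now throw ArrayIndexOutOfBoundsException
```

Tying the size to the pool's constructor (one pool per size) makes this mix-up impossible by construction.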
This is a good remark for any general-purpose API, but in this situation we are dealing with well-tailored internal pools for one specific purpose. Also, it would force us to introduce a bit more bytecode: a class with a constructor, an outer class to store the instance, etc.
In the bytecode, only a new field of type int will be added (the parameter moves from the take method to the init method), and in the child class only the value of the parameter will be passed to the child constructor.
I think this is a good tradeoff, so that if someone refactors the code after a while, they don't introduce an unobvious bug.
Given that this is a protected method, I don't see that it is necessary for now, but the overall idea is a valid concern.
The method is indeed protected and used as a base for the implementations.
None of the implementations is prone to the potential bug described above, and I prefer to avoid overcomplicating ~50 lines of fully internal code that has a single purpose.
@@ -47,6 +51,7 @@ internal fun <T> Json.decodeToSequenceByReader(
    deserializer: DeserializationStrategy<T>,
    format: DecodeSequenceMode = DecodeSequenceMode.AUTO_DETECT
): Sequence<T> {
+    // Note: no explicit release, as the sequence are lazy and thrown away in an arbitrary manner
Do I understand correctly that this function permanently "steals" an array from the pool, and that if both decodeToSequenceByReader and decodeByReader are called in the application, this is effectively equivalent to creating a new array instance for each call to decodeToSequenceByReader?
Correct
Maybe in this case it is better not to use the pool for decodeToSequenceByReader, so that it does not affect decodeByReader?
In any case, each call to decodeToSequenceByReader will require the creation of a new array (possibly on another thread), so the total amount of work does not change.
+1. Looks like 'taking and never returning' in decodeToSequence may negatively affect decodeFromStream
Alternatively, you may try to add lexer.release() to the JsonIterator.hasNext() implementation. This may solve the problem at first sight but may require thorough testing.
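The suggested hasNext()-based release could look roughly like the sketch below. ReleasingIterator and the bare ArrayDeque standing in for the pool are illustrative assumptions, not the actual lexer API:

```kotlin
// Sketch: return the buffer to the pool from hasNext() once the underlying
// source is exhausted, so a fully-consumed sequence no longer "steals" it.
class ReleasingIterator<T>(
    private val source: Iterator<T>,
    private val pool: ArrayDeque<CharArray>,
) : Iterator<T> {
    private var buffer: CharArray? = CharArray(16 * 1024)

    override fun hasNext(): Boolean {
        val more = source.hasNext()
        if (!more) {
            // Release exactly once, when iteration completes normally.
            // A sequence abandoned mid-way still keeps the buffer, which is
            // the .take()/limit caveat raised below.
            buffer?.let { pool.addLast(it) }
            buffer = null
        }
        return more
    }

    override fun next(): T = source.next()
}
```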
It would still steal the buffer if the resulting sequence is used together with things like .take()/limit, but it's still better than the current solution?
private fun getRandomString() = (1..Random.nextInt(0, charset.length)).map { charset[it] }.joinToString(separator = "")

private fun doTest(iterations: Int, block: (JsonTestingMode) -> Unit) {
JsonTestingMode seems not to be used anywhere.
I'd also add a test for decodeFromStream.
It is used in the test bodies; streams are tested as well.
Ah, I hadn't noticed, because it has the default name it. Maybe give it a name for better readability?
Done
* Add benchmark
* Introduce pooling of CharArray for InputStream decoding: by default, a 16k buffer is allocated for char-related operations, and it may create non-trivial GC pressure when decoding small objects
* The estimated performance improvement is tens of percent, reaching 70% on our ASCII benchmarks

Fixes Kotlin#1893
No description provided.