Introduce CharArray caching for InputStream decoding #2100

Merged -- 4 commits into dev, Nov 24, 2022

Conversation

qwwdfsad
Collaborator

No description provided.

@qwwdfsad
Collaborator Author

JacksonComparisonBenchmark.kotlinFromStream: 260 ops/ms -> 460 ops/ms (NB: it's ASCII-only)

@qwwdfsad qwwdfsad requested a review from sandwwraith November 18, 2022 14:34
* By default, a 16K char buffer is allocated for char-related operations, which may create non-trivial GC pressure when decoding small objects
* The solution is simple -- pool these char arrays
* The estimated performance improvement is in the tens of percent, reaching 70% on our ASCII benchmarks

Fixes #1893
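The pooling idea from the description can be sketched as follows (a simplified illustration only, not the actual `CharArrayPool` from this PR; the object name and internals here are hypothetical, with the 16K buffer size and ~2 MB cap taken from the diff below):

```kotlin
// A minimal sketch of the pooling idea: 16K char buffers are reused instead
// of being reallocated per call, and total retained memory is capped.
object SimpleCharArrayPool {
    private const val BUFFER_SIZE = 16 * 1024          // matches the decoding buffer size
    private const val MAX_CHARS_IN_POOL = 1024 * 1024  // ~2 MB of chars retained at most

    private val arrays = ArrayDeque<CharArray>()
    private var charsTotal = 0

    @Synchronized
    fun take(): CharArray =
        arrays.removeLastOrNull()?.also { charsTotal -= it.size } ?: CharArray(BUFFER_SIZE)

    @Synchronized
    fun release(array: CharArray) {
        // Drop the buffer instead of caching it once the pool is at capacity
        if (charsTotal + array.size > MAX_CHARS_IN_POOL) return
        charsTotal += array.size
        arrays.addLast(array)
    }
}
```

Decoding small objects then reuses the same buffer across calls instead of triggering a fresh 16K allocation each time, which is where the GC-pressure win comes from.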
@@ -13,20 +17,37 @@ internal object CharArrayPool {
System.getProperty("kotlinx.serialization.json.pool.size").toIntOrNull()
}.getOrNull() ?: 1024 * 1024 // 2 MB seems to be a reasonable constraint, (1M of chars)

fun take(): CharArray {
protected fun take(size: Int): CharArray {
Contributor

I think it's better to take the size in the class constructor rather than in the function call.

In fact, every pool instance already uses a single fixed size.
Another reason is the potential for an error if the pool is called like this:

val a = take(100)
release(a)
val b = take(100)
release(b)
val c = take(200)
c[101] // !!! out of bounds: the pool may hand back the released 100-char array

Nothing in the API warns that you should not do this.

If you add the size to the constructor, you can also check the array's size inside releaseImpl.
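The suggested shape could look like this (a hypothetical sketch of the reviewer's proposal, not code from this PR; the class name is invented, `releaseImpl` mirrors the method mentioned above):

```kotlin
// Hypothetical sketch of the suggestion: the buffer size is fixed per pool
// instance, so releaseImpl can verify that a returned array actually
// belongs to this pool instead of silently caching a wrong-sized one.
open class SizedCharArrayPool(private val size: Int) {
    private val arrays = ArrayDeque<CharArray>()

    @Synchronized
    protected fun take(): CharArray = arrays.removeLastOrNull() ?: CharArray(size)

    @Synchronized
    protected fun releaseImpl(array: CharArray) {
        require(array.size == size) { "Expected an array of size $size, got ${array.size}" }
        arrays.addLast(array)
    }
}
```

With this shape, the `take(100)`/`take(200)` mix-up above fails loudly at release time instead of surfacing later as an out-of-bounds read.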

Collaborator Author

This is a good remark for any general-purpose API, but in this situation we are dealing with well-tailored internal pools for one specific purpose. It would also force us to introduce a bit more bytecode -- a class with a constructor, an outer class to store the instance, etc.

Contributor

In the bytecode, only a new int field will be added (the parameter moves from the take method to the constructor), and the child class only needs to pass the value up to the parent constructor.

I think this is a good tradeoff, so that whoever refactors this code later doesn't introduce a subtle bug.

Member

Given that this is a protected method, I don't see that it is necessary for now, but the idea overall is a valid concern.

Collaborator Author

The method is indeed protected and used as a base for the implementation.
None of the implementations is prone to the potential bug described above, and I prefer to avoid overcomplicating ~50 lines of fully internal, single-purpose code.

@@ -47,6 +51,7 @@ internal fun <T> Json.decodeToSequenceByReader(
deserializer: DeserializationStrategy<T>,
format: DecodeSequenceMode = DecodeSequenceMode.AUTO_DETECT
): Sequence<T> {
// Note: no explicit release, as the sequences are lazy and thrown away in an arbitrary manner
Contributor

Do I understand correctly that this function permanently "steals" an array from the pool? If both decodeToSequenceByReader and decodeByReader are called in the application, this is effectively equivalent to allocating a new array for every call to decodeToSequenceByReader.

Collaborator Author

Correct

Contributor

Maybe in this case it is better not to use the pool for decodeToSequenceByReader, so that it does not affect decodeByReader?
Each call to decodeToSequenceByReader would still require allocating a new array (possibly on another thread) either way, so the total amount of work does not change.

Member

+1. Looks like 'taking and never returning' in decodeToSequence may negatively affect decodeFromStream

Member

Alternatively, you may try to add lexer.release() to the JsonIterator.hasNext() implementation. This may solve the problem at first sight, but would require thorough testing.

Member

Although it would still steal the buffer if the resulting sequence is used with things like .take()/limit, it's still better than the current solution?
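The release-in-hasNext() idea can be illustrated generically (a hypothetical sketch; JsonIterator's real internals are not reproduced here, and the class and parameter names are invented):

```kotlin
// Sketch: an iterator that returns its pooled buffer to the pool once the
// underlying source is exhausted. Fully consumed sequences then stop
// "stealing" buffers; only sequences abandoned early (e.g. via .take(n))
// still keep theirs forever.
class ReleasingIterator<T>(
    private val source: Iterator<T>,
    private var buffer: CharArray?,
    private val release: (CharArray) -> Unit
) : Iterator<T> {
    override fun hasNext(): Boolean {
        val has = source.hasNext()
        if (!has) buffer?.let { release(it); buffer = null } // release exactly once
        return has
    }

    override fun next(): T = source.next()
}
```

The null-out of `buffer` makes repeated hasNext() calls after exhaustion safe, which is exactly the kind of edge case the "thorough testing" remark above is about.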


private fun getRandomString() = (1..Random.nextInt(0, charset.length)).map { charset[it] }.joinToString(separator = "")

private fun doTest(iterations: Int, block: (JsonTestingMode) -> Unit) {
Member

JsonTestingMode seems not to be used anywhere.

I'd also add a test for decodeFromStream

Collaborator Author

It is used in the test bodies, and streams are tested as well

Member

Ah, I hadn't noticed, because it uses the default name `it`. Maybe add an explicit name for better readability?

Collaborator Author

Done

@qwwdfsad qwwdfsad requested a review from sandwwraith November 22, 2022 09:46

@qwwdfsad qwwdfsad merged commit 57deef6 into dev Nov 24, 2022
@qwwdfsad qwwdfsad deleted the streams-opto branch November 24, 2022 14:07
fred01 pushed a commit to fred01/kotlinx.serialization that referenced this pull request Nov 24, 2022
* Add benchmark
* Introduce pooling of CharArray for InputStream decoding: by default, a 16K buffer is allocated for char-related operations, which may create non-trivial GC pressure when decoding small objects
* The estimated performance improvement is in the tens of percent, reaching 70% on our ASCII benchmarks

Fixes Kotlin#1893