Optimize the loop for writing varints in ProtoBuf #1294

qwwdfsad · 2021-01-18T09:17:30Z

This change has no performance impact on small (~1-2 byte) varints, but increases the throughput of large (7-9 byte) varints by up to 25%.

shanshin · 2021-01-18T17:23:25Z

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/Streams.kt

    }

-    private fun encodeVarint32SlowPath(value: Int) {
+    private inline fun encodeVarint(value: Long, length: Int) {


It is unlikely that anyone is still using a 32-bit CPU, but these calculations can be slow on such hardware.

Could you please elaborate on what you mean here?

current now has type Long, whereas in the old encodeVarint32SlowPath it was Int. So on a 32-bit CPU, the and, or, and ushr operations for 32-bit varints may be slower, and overall speed may degrade.

We generally do not care about 32-bit platforms.
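To make the concern in this thread concrete, here is a minimal sketch that contrasts a varint loop using Int arithmetic with one using Long arithmetic. Both function names are hypothetical and neither body is taken from the PR. The sketch only illustrates that the two loops perform the same and, or, and ushr steps on different operand widths.

```kotlin
// Hypothetical helper: varint loop whose accumulator is a 32-bit Int.
fun encodeWithIntOps(value: Int): ByteArray {
    val out = ArrayList<Byte>(5)
    var current = value
    // Emit 7 bits at a time while more than 7 significant bits remain.
    while ((current and 0x7F.inv()) != 0) {
        out.add(((current and 0x7F) or 0x80).toByte())
        current = current ushr 7
    }
    out.add(current.toByte())
    return out.toByteArray()
}

// Hypothetical helper: the same loop, but the accumulator is widened to Long,
// so every and/or/ushr is a 64-bit operation.
fun encodeWithLongOps(value: Int): ByteArray {
    val out = ArrayList<Byte>(10)
    var current = value.toLong() // sign-extends negative inputs
    while ((current and 0x7FL.inv()) != 0L) {
        out.add(((current and 0x7FL) or 0x80L).toByte())
        current = current ushr 7
    }
    out.add(current.toByte())
    return out.toByteArray()
}
```

For non-negative inputs the two functions produce identical bytes, so the only difference is the width of the arithmetic. For negative inputs the plain toLong() widening used in this sketch sign-extends the value, so the Long version emits 10 bytes where the Int version emits 5.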

shanshin · 2021-01-18T17:28:58Z

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/Streams.kt

        // Fast-path: unrolled loop for single byte
+        ensureCapacity(5)


Why do we ensure a dynamically computed length for Long, but always 5 here?

Because 32-bit varints are used regularly for tags, it's always beneficial to allocate all the space up front.
Also, because tags are small, the one-byte fast path is used, and we do not have to calculate the varint length at all for such values.
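The reasoning above can be sketched roughly as follows. SketchOutput, its buffer growth policy, and the plain while loop on the slow path are assumptions made for illustration. Only ensureCapacity(5) and the single-byte fast path come from the diff under review.

```kotlin
// Hypothetical stand-in for the output buffer; not the library's class.
class SketchOutput {
    private var array = ByteArray(8)
    private var position = 0

    // Grow the backing array if fewer than `elementsToAppend` bytes are free.
    private fun ensureCapacity(elementsToAppend: Int) {
        if (position + elementsToAppend <= array.size) return
        array = array.copyOf(maxOf(array.size * 2, position + elementsToAppend))
    }

    fun encodeVarint32(value: Int) {
        // A 32-bit varint never needs more than 5 bytes, so reserve the
        // worst case once instead of computing the exact length.
        ensureCapacity(5)
        // Fast path: values that fit in 7 bits (typical for tags) take one byte.
        if ((value and 0x7F.inv()) == 0) {
            array[position++] = value.toByte()
            return
        }
        // Slow path (assumed shape): emit 7 bits per byte, low bits first.
        var current = value
        while ((current and 0x7F.inv()) != 0) {
            array[position++] = ((current and 0x7F) or 0x80).toByte()
            current = current ushr 7
        }
        array[position++] = current.toByte()
    }

    fun toByteArray(): ByteArray = array.copyOf(position)
}
```

With a fixed reservation, the common single-byte case costs one capacity check, one mask test, and one store, with no length lookup.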

shanshin · 2021-01-18T17:33:55Z

formats/protobuf/commonMain/src/kotlinx/serialization/protobuf/internal/Streams.kt

+
+    fun encodeVarint64(value: Long) {
+        val length = varIntLength(value)
+        ensureCapacity(length + 1)


I suggest doing the +1 inside varIntLength or the VAR_INT_LENGTHS init block, so that the result matches the name.

That would slightly complicate the loop (it would need a decrement), so let's leave it as is.
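The trade-off discussed here can be sketched as below. The names VAR_INT_LENGTHS and varIntLength appear in the diff, but the table formula, the standalone encodeVarint signature, and the freshly allocated ByteArray are assumptions for illustration. In this sketch varIntLength returns the number of continuation bytes, which is the total length minus one, so the loop bound and the final index use it directly and only the capacity needs the +1.

```kotlin
// Assumed table: entry i is the number of continuation bytes needed by a
// 64-bit value that has i leading zero bits.
val VAR_INT_LENGTHS = IntArray(65) { leadingZeros -> (63 - leadingZeros) / 7 }

// Continuation-byte count for `value` (total encoded length minus one).
fun varIntLength(value: Long): Int = VAR_INT_LENGTHS[value.countLeadingZeroBits()]

// Hypothetical standalone encoder that returns a fresh array.
fun encodeVarint(value: Long): ByteArray {
    val length = varIntLength(value)
    val out = ByteArray(length + 1) // the +1 mirrors ensureCapacity(length + 1)
    var current = value
    // Fixed trip count: no per-iteration check of the remaining bits.
    for (i in 0 until length) {
        out[i] = ((current and 0x7F) or 0x80).toByte()
        current = current ushr 7
    }
    // The last byte has no continuation bit; `length` indexes it as is.
    out[length] = current.toByte()
    return out
}
```

If varIntLength returned the total length instead, the loop bound and the final index would each need a decrement, which is the complication mentioned in the reply.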

Commit 412f18a: Optimize the loop for writing varints in ProtoBuf

qwwdfsad assigned shanshin and unassigned shanshin Jan 18, 2021

qwwdfsad requested a review from shanshin January 18, 2021 09:17

shanshin approved these changes Jan 18, 2021

qwwdfsad merged commit cb6a56b into dev Jan 19, 2021

qwwdfsad deleted the proto-varint-opto branch January 19, 2021 17:32

dependabot bot mentioned this pull request Mar 15, 2021

[API]: Bump kotlinx-serialization-json from 1.0.1 to 1.1.0 in /noty-api PatilShreyas/NotyKT#145

Merged

richardstartin mentioned this pull request Mar 20, 2021

posts/dont-use-protobuf-for-telemetry richardstartin/richardstartin.github.io#22

Open
