-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the loop for writing varints in ProtoBuf #1294
Conversation
It has no performance impact on small (~1-2 bytes) varints, but increases throughput of large (7-9 bytes) varints by up to 25%
} | ||
|
||
private fun encodeVarint32SlowPath(value: Int) { | ||
private inline fun encodeVarint(value: Long, length: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unlikely that someone now is using a 32-bit CPU but these calculations can be slow on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate on what you mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current
now have long
type, in old encodeVarint32SlowPath
it was int
. So perhaps for 32-bit CPU and 32-bit varints operations and
, or
, ushr
will be slower and total speed will degrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally do not care about 32-bit platforms
// Fast-path: unrolled loop for single byte | ||
ensureCapacity(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why now for long
we ensure dynamic length but here always 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because 32-bit varints are used on a regular basis for tags, it's always beneficial to allocate all the space.
Also, because tags are small, one-byte fast-path is used and we do not have to calculate varint length at all for such values
|
||
fun encodeVarint64(value: Long) { | ||
val length = varIntLength(value) | ||
ensureCapacity(length + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest do +1 inside varIntLength
or VAR_INT_LENGTHS
init block to match the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it will slightly complicate the loop (have to do decrement), let's leave it as is
} | ||
|
||
private fun encodeVarint32SlowPath(value: Int) { | ||
private inline fun encodeVarint(value: Long, length: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current
now have long
type, in old encodeVarint32SlowPath
it was int
. So perhaps for 32-bit CPU and 32-bit varints operations and
, or
, ushr
will be slower and total speed will degrade.
It has no performance impact on small (~1-2 bytes) varints, but increases throughput of large (7-9 bytes) varints by up to 25%