Reduce marshaling allocations with MarshalAppend #503
Conversation
Thanks for the PR, @emcfarlane! The performance win is definitely nice :)
I don't love adding more optional methods to Codec, but the only alternative is to have connect's internal codecs cheat and use the client and handler's buffer pools. On balance, the approach you have is probably the better option.
Before merging this, I'd like:
- To wait for a released version of the protobuf runtime to have all the necessary features, and
- To have tests (probably using testing/quick) verifying that a MarshalAppend followed by an Unmarshal leaves the original protobuf message unchanged.
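A property test in that spirit might look like the following stdlib-only sketch. Note that `toyCodec` and its one-byte length-prefixed framing are stand-ins invented for illustration, not connect's actual codec or a protobuf message; a real test would generate proto messages and use the registered codec:

```go
package main

import (
	"bytes"
	"fmt"
	"testing/quick"
)

// toyCodec stands in for a codec with an appending marshal method.
// MarshalAppend appends a length-prefixed copy of msg to dst; Unmarshal
// decodes that framing back into msg.
type toyCodec struct{}

func (toyCodec) MarshalAppend(dst []byte, msg []byte) ([]byte, error) {
	dst = append(dst, byte(len(msg)))
	return append(dst, msg...), nil
}

func (toyCodec) Unmarshal(data []byte, msg *[]byte) error {
	if len(data) == 0 || int(data[0]) != len(data)-1 {
		return fmt.Errorf("corrupt frame")
	}
	*msg = append([]byte(nil), data[1:]...)
	return nil
}

// roundTripUnchanged is the property under test: MarshalAppend followed by
// Unmarshal must reproduce the message and must not mutate the original.
func roundTripUnchanged(msg []byte) bool {
	if len(msg) > 255 {
		msg = msg[:255] // keep the toy one-byte length prefix valid
	}
	original := append([]byte(nil), msg...)
	var c toyCodec
	raw, err := c.MarshalAppend(nil, msg)
	if err != nil {
		return false
	}
	var decoded []byte
	if err := c.Unmarshal(raw, &decoded); err != nil {
		return false
	}
	return bytes.Equal(decoded, msg) && bytes.Equal(original, msg)
}

func main() {
	// quick.Check generates random []byte inputs and reports the first
	// counterexample it finds, if any.
	if err := quick.Check(roundTripUnchanged, nil); err != nil {
		panic(err)
	}
	fmt.Println("round-trip property holds")
}
```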
envelope.go (Outdated)

```go
if c, ok := w.codec.(marshalAppend); ok {
	raw, err = c.MarshalAppend(buffer.Bytes(), message)
} else {
	raw, err = w.codec.Marshal(message)
}
```
I think performance along this path has gotten worse, since we no longer put raw into the pool when we're done with it. Adding a bit of code to the else block should fix it:

```go
if len(raw) > 0 {
	defer w.bufferPool.Put(bytes.NewBuffer(raw))
}
```
Even with this change, I think we're failing to effectively pool here. At base, the buffers in the pool are pretty small. MarshalAppend is likely to grow the slice, in which case returning buffer to the pool no longer helps much. Give me a moment to play with this a bit locally and find a way to fix this that isn't too prone to bugs down the road.
I think it will do the right thing: calling buffer.Write(raw) will grow the buffer as needed, so it will be returned to the pool with a larger size. I think we can skip adding an additional bytes.Buffer to the pool.
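As a stdlib-only illustration of that point (the pool and `capAfterWrite` helper here are a sketch, not connect's actual bufferPool): writing a payload larger than a pooled buffer's capacity grows the buffer's backing array, so the buffer goes back to the pool with the larger capacity.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// A minimal buffer pool, analogous in spirit to connect's internal one.
var pool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// capAfterWrite fetches a buffer, writes payload into it (growing the
// backing array as needed), records the grown capacity, and returns the
// buffer to the pool.
func capAfterWrite(payload []byte) int {
	buf := pool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.Write(payload) // grows the buffer if payload exceeds its capacity
	grown := buf.Cap()
	pool.Put(buf)
	return grown
}

func main() {
	// A 64 KiB write leaves the pooled buffer with at least 64 KiB capacity.
	fmt.Println(capAfterWrite(make([]byte, 64<<10)) >= 64<<10)
}
```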
Protobuf v1.31.0 just released! Updated the library and added a test case.
Changes look great - just a couple of small things. Looking forward to the perf improvements.
Use the MarshalAppend proto methods to reuse byte slices during serialization. Applied to the internal codecs for proto and protojson. Deprecates Annotate in favor of the AnnotateSymbol method.
Looks great to me - leaving for @akshayjshah to take another look.
If the original buffer was too small for the application workload, don't put it back in the pool.
Made a small change to allow the buffers in the pool to grow if they're not large enough; this change looks good to me now. @emcfarlane can you take a look and ensure that it's still ok with you? I'd like to get this into the next release if possible, since we're already planning to release some performance improvements.
envelope.go (Outdated)

```go
} else {
	// The original buffer was sufficiently large for the workload, so we can
	// return it to the pool.
	defer w.bufferPool.Put(buffer)
}
```
@akshayjshah I don't think this defer should be optional. We always want to defer the fetched buffer:

```go
buffer := w.bufferPool.Get()
defer w.bufferPool.Put(buffer)
```
Do we? For best performance, we want the buffers in the pool to be large enough to contain the marshaled bytes without resizing (at least most of the time). We know that the buffer we've pulled from the pool was too small for this message. I think we're better off letting the too-small buffer be GC'ed and putting the new buffer into the pool instead, under the assumption that we need the buffers in the pool to grow to accommodate the application workload.
None of this is perfect, but I think we're probably better off pooling over- rather than under-aggressively. The GC is still free to empty the pool on each cycle if it needs to reclaim the space. (Last I looked, I think it just categorically emptied pools...but I haven't checked in many years.)
🤦🏽 @pkwarren pointed out to me that we're calling buffer.Write just a few lines below, which allocates the same amount of memory all over again. That makes returning the original buffer to the pool fine, but also means we're eating allocs that we definitely don't need.
Calling bytes.NewBuffer(largerThanBuf) will cause an allocation like buffer.Write(largerThanBuf) but has the chance for reuse. Maybe it's better to split these two methods to keep the old behaviour:

```go
var buffer *bytes.Buffer
// Defer through a closure so buffer is evaluated at return time; deferring
// w.bufferPool.Put(buffer) directly would capture the nil pointer, and the
// nil guard keeps an early error return from pooling nil.
defer func() {
	if buffer != nil {
		w.bufferPool.Put(buffer)
	}
}()
switch codec := w.codec.(type) {
case marshalAppender:
	buffer = w.bufferPool.Get()
	raw, err := codec.MarshalAppend(buffer.Bytes(), message)
	if err != nil {
		return errorf(CodeInternal, "marshal append message: %w", err)
	}
	buffer.Write(raw)
default:
	raw, err := w.codec.Marshal(message)
	if err != nil {
		return errorf(CodeInternal, "marshal message: %w", err)
	}
	// We can't avoid allocating the byte slice, so we may as well reuse it
	// once we're done with it.
	buffer = bytes.NewBuffer(raw)
}
envelope := &envelope{Data: buffer}
return w.Write(envelope)
```
Did something similar-ish in the next commit to more clearly separate the two paths. I think I've gotten the best of both worlds: we avoid allocating if MarshalAppend didn't grow the slice capacity, and we eat a fixed-size allocation instead of an O(N) allocation if it did. Performance improved slightly:

```
[main]        BenchmarkConnect/unary-12   8054  1318347 ns/op  14414394 B/op  251 allocs/op
[original PR] BenchmarkConnect/unary-12  10192  1176274 ns/op   6122383 B/op  236 allocs/op
[latest]      BenchmarkConnect/unary-12  10448  1153636 ns/op   6115680 B/op  236 allocs/op
```
Looks good to me!
[](https://renovatebot.com) This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/bufbuild/connect-go](https://togithub.com/bufbuild/connect-go) | require | minor | `v1.8.0` -> `v1.9.0` |

### Release Notes: bufbuild/connect-go [v1.9.0](https://togithub.com/bufbuild/connect-go/releases/tag/v1.9.0) ([compare](https://togithub.com/bufbuild/connect-go/compare/v1.8.0...v1.9.0))

Along with a few new features and bugfixes, this release includes a variety of internal performance improvements.

```
[v1.8.0] BenchmarkConnect/unary-12   8415  1305116 ns/op  14449031 B/op  254 allocs/op
[v1.9.0] BenchmarkConnect/unary-12  10443  1151366 ns/op   6024079 B/op  236 allocs/op
```

#### Enhancements

- Allow clients to set Host in headers by [@emcfarlane](https://togithub.com/emcfarlane) in [#522](https://togithub.com/bufbuild/connect-go/pull/522)
- Allow Connect clients to configure max HTTP GET size by [@jhump](https://togithub.com/jhump) in [#529](https://togithub.com/bufbuild/connect-go/pull/529)
- Reduce allocations in HTTP routing by [@mattrobenolt](https://togithub.com/mattrobenolt) in [#519](https://togithub.com/bufbuild/connect-go/pull/519)
- Cache mapping of methods to protocol handlers by [@emcfarlane](https://togithub.com/emcfarlane) in [#525](https://togithub.com/bufbuild/connect-go/pull/525)
- Improve performance of percent encoding by [@emcfarlane](https://togithub.com/emcfarlane) in [#531](https://togithub.com/bufbuild/connect-go/pull/531)
- Reduce marshaling allocations with MarshalAppend by [@emcfarlane](https://togithub.com/emcfarlane) in [#503](https://togithub.com/bufbuild/connect-go/pull/503)

#### Bugfixes

- Discard unknown JSON fields by default by [@akshayjshah](https://togithub.com/akshayjshah) in [#518](https://togithub.com/bufbuild/connect-go/pull/518)
- Canonicalize Connect streaming trailer names by [@jchadwick-buf](https://togithub.com/jchadwick-buf) in [#528](https://togithub.com/bufbuild/connect-go/pull/528)

#### Other changes

- Tighten internal lint checks by [@zchee](https://togithub.com/zchee) in [#520](https://togithub.com/bufbuild/connect-go/pull/520)
- Simplify code when pooled buffers aren't returned by [@pkwarren](https://togithub.com/pkwarren) in [#532](https://togithub.com/bufbuild/connect-go/pull/532)

#### New Contributors

- [@zchee](https://togithub.com/zchee) made their first contribution in [#520](https://togithub.com/bufbuild/connect-go/pull/520)
- [@emcfarlane](https://togithub.com/emcfarlane) made their first contribution in [#522](https://togithub.com/bufbuild/connect-go/pull/522)

**Full Changelog**: bufbuild/connect-go@v1.8.0...v1.9.0

### Configuration

📅 **Schedule**: Branch creation - at any time (no schedule defined); automerge - at any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View the repository job log [here](https://developer.mend.io/github/open-feature/flagd).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Use the `MarshalAppend` methods introduced in v1.31 of the protobuf runtime to more aggressively reuse buffers from Connect's internal pool (for both binary and JSON). This reduces allocations and substantially improves throughput:

```
BenchmarkConnect/unary-12   8054  1318347 ns/op  14414394 B/op  251 allocs/op
BenchmarkConnect/unary-12  10448  1153636 ns/op   6115680 B/op  236 allocs/op
```

Co-authored-by: Akshay Shah <[email protected]>