
Reduce marshaling allocations with MarshalAppend #503

Merged · 7 commits · Jun 27, 2023

Conversation

@emcfarlane (Contributor) commented May 9, 2023

Use the MarshalAppend methods introduced in v1.31 of the protobuf runtime to more aggressively reuse buffers from Connect's internal pool (for both binary and JSON). This reduces allocations and substantially improves throughput:

```
BenchmarkConnect/unary-12     8054    1318347 ns/op    14414394 B/op    251 allocs/op
BenchmarkConnect/unary-12    10448    1153636 ns/op     6115680 B/op    236 allocs/op
```
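The pattern under discussion — an optional MarshalAppend extension on top of the required Codec interface, backed by a buffer pool — can be sketched with the standard library alone. Everything here (`jsonCodec`, `marshalWithPool`, the interface names) is an illustrative assumption mirroring this PR's discussion, not connect's actual code; `encoding/json` stands in for the protobuf runtime:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// Codec is the required interface; marshalAppender is the optional
// extension that lets a codec append into a caller-supplied slice.
type Codec interface {
	Marshal(msg any) ([]byte, error)
}

type marshalAppender interface {
	MarshalAppend(dst []byte, msg any) ([]byte, error)
}

// jsonCodec implements both interfaces, using encoding/json as a
// stand-in for the protobuf runtime's MarshalAppend.
type jsonCodec struct{}

func (jsonCodec) Marshal(msg any) ([]byte, error) { return json.Marshal(msg) }

func (jsonCodec) MarshalAppend(dst []byte, msg any) ([]byte, error) {
	raw, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	return append(dst, raw...), nil
}

var bufferPool = sync.Pool{New: func() any { return &bytes.Buffer{} }}

// marshalWithPool appends into a pooled buffer when the codec supports
// MarshalAppend, and falls back to plain Marshal otherwise.
func marshalWithPool(codec Codec, msg any) ([]byte, error) {
	if appender, ok := codec.(marshalAppender); ok {
		buffer := bufferPool.Get().(*bytes.Buffer)
		defer func() {
			buffer.Reset()
			bufferPool.Put(buffer)
		}()
		raw, err := appender.MarshalAppend(buffer.Bytes(), msg)
		if err != nil {
			return nil, err
		}
		// Copy out so the caller keeps the data after the buffer is pooled.
		return append([]byte(nil), raw...), nil
	}
	return codec.Marshal(msg)
}

func main() {
	out, err := marshalWithPool(jsonCodec{}, map[string]int{"a": 1})
	fmt.Println(string(out), err)
}
```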

@CLAassistant commented May 9, 2023

CLA assistant check
All committers have signed the CLA.

@akshayjshah akshayjshah self-requested a review May 10, 2023 05:13
@akshayjshah (Member) left a comment


Thanks for the PR, @emcfarlane! The performance win is definitely nice :)

I don't love adding more optional methods to Codec, but the only alternative is to have connect's internal codecs cheat and use the client and handler's buffer pools. On balance, the approach you have is probably the better option.

Before merging this, I'd like:

  • To wait for a released version of the protobuf runtime to have all the necessary features, and
  • To have tests (probably using testing/quick) verifying that a MarshalAppend followed by an Unmarshal leaves the original protobuf message unchanged.

envelope.go Outdated
if c, ok := w.codec.(marshalAppend); ok {
raw, err = c.MarshalAppend(buffer.Bytes(), message)
} else {
raw, err = w.codec.Marshal(message)
Member

I think performance along this path has gotten worse, since we no longer put raw into the pool when we're done with it. Adding a bit of code to the else block should fix it:

if len(raw) > 0 {
  defer w.bufferPool.Put(bytes.NewBuffer(raw))
}

Member

Even with this change, I think we're failing to effectively pool here. At base, the buffers in the pool are pretty small. MarshalAppend is likely to grow the slice, in which case returning buffer to the pool no longer helps much. Give me a moment to play with this a bit locally and find a way to fix this that isn't too prone to bugs down the road.

Contributor

I think it will do the right thing - doing buffer.Write(raw) will grow the buffer as needed, so it will be returned to the pool with a larger size. I think we can remove adding an additional bytes.Buffer to the pool.

@emcfarlane emcfarlane marked this pull request as ready for review June 26, 2023 13:19
@emcfarlane (Contributor Author)
Protobuf v1.31.0 was just released! I've updated the library and added a test case.

@pkwarren (Contributor) left a comment


Changes look great - just a couple of small things. Looking forward to the perf improvements.

Use MarshalAppend proto methods to reuse the byte slices for
serialization. Applied to internal codecs for proto and protojson.

Deprecate Annotate in favor of the AnnotateSymbol method.
@pkwarren (Contributor) left a comment

Looks great to me - leaving for @akshayjshah to take another look.


If the original buffer was too small for the application workload, don't
put it back in the pool.
@akshayjshah (Member)
Made a small change to allow the buffers in the pool to grow if they're not large enough; this change looks good to me now. @emcfarlane can you take a look and ensure that it's still ok with you? I'd like to get this into the next release if possible, since we're already planning to release some performance improvements.

envelope.go Outdated
} else {
// The original buffer was sufficiently large for the workload, so we can
// return it to the pool.
defer w.bufferPool.Put(buffer)
Contributor Author

@akshayjshah I don't think this defer should be optional. We always want to defer the fetched buffer.

buffer := w.bufferPool.Get()
defer w.bufferPool.Put(buffer)

@akshayjshah (Member) · Jun 26, 2023

Do we? For best performance, we want the buffers in the pool to be large enough to contain the marshaled bytes without resizing (at least most of the time). We know that the buffer we've pulled from the pool was too small for this message. I think we're better off letting the too-small buffer be GC'ed and putting the new buffer into the pool instead, under the assumption that we need the buffers in the pool to grow to accommodate the application workload.

None of this is perfect, but I think we're probably better off pooling over- rather than under-aggressively. The GC is still free to empty the pool on each cycle if it needs to reclaim the space. (Last I looked, I think it just categorically emptied pools...but I haven't checked in many years.)

Member

🤦🏽 @pkwarren pointed out to me that we're calling buffer.Write just a few lines below, which allocates the same amount of memory all over again. That makes returning the original buffer to the pool fine, but also means we're eating allocs that we definitely don't need.

Contributor Author

Calling bytes.NewBuffer(largerThanBuf) will cause an allocation, like buffer.Write(largerThanBuf), but gives the slice a chance at reuse. Maybe it's better to split these two paths to keep the old behaviour:

	var buffer *bytes.Buffer
	// Defer a closure so that whichever buffer is assigned below gets
	// returned to the pool (a plain defer would capture the nil value here).
	defer func() {
		w.bufferPool.Put(buffer)
	}()

	switch codec := w.codec.(type) {
	case marshalAppender:
		buffer = w.bufferPool.Get()
		raw, err := codec.MarshalAppend(buffer.Bytes(), message)
		if err != nil {
			return errorf(CodeInternal, "marshal append message: %w", err)
		}
		buffer.Write(raw)
	default:
		raw, err := w.codec.Marshal(message)
		if err != nil {
			return errorf(CodeInternal, "marshal message: %w", err)
		}
		// We can't avoid allocating the byte slice, so we may as well reuse it
		// once we're done with it.
		buffer = bytes.NewBuffer(raw)
	}
	envelope := &envelope{Data: buffer}
	return w.Write(envelope)

@akshayjshah (Member) · Jun 27, 2023

Did something similar-ish in the next commit to more clearly separate the two paths. I think I've gotten the best of both worlds: we avoid allocating if MarshalAppend didn't grow the slice capacity, and we eat a fixed-size allocation instead of an O(N) allocation if it did. Performance improved slightly:

```
[main]        BenchmarkConnect/unary-12     8054    1318347 ns/op    14414394 B/op    251 allocs/op
[original PR] BenchmarkConnect/unary-12    10192    1176274 ns/op     6122383 B/op    236 allocs/op
[latest]      BenchmarkConnect/unary-12    10448    1153636 ns/op     6115680 B/op    236 allocs/op
```

Contributor Author

Looks good to me!

@akshayjshah akshayjshah changed the title Support MarshalAppend to reuse bytes slices Reduce marshaling allocations with MarshalAppend Jun 27, 2023
@emcfarlane emcfarlane merged commit 46ff1ed into connectrpc:main Jun 27, 2023
@emcfarlane emcfarlane deleted the append branch June 27, 2023 08:22
renovate bot referenced this pull request in open-feature/flagd Jun 27, 2023

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/bufbuild/connect-go](https://togithub.com/bufbuild/connect-go) | require | minor | `v1.8.0` -> `v1.9.0` |

---

### Release Notes

<details>
<summary>bufbuild/connect-go (github.com/bufbuild/connect-go)</summary>

###
[`v1.9.0`](https://togithub.com/bufbuild/connect-go/releases/tag/v1.9.0)

[Compare
Source](https://togithub.com/bufbuild/connect-go/compare/v1.8.0...v1.9.0)

#### What's Changed

Along with a few new features and bugfixes, this release includes a
variety of internal performance improvements.

```
[v1.8.0] BenchmarkConnect/unary-12     8415    1305116 ns/op    14449031 B/op    254 allocs/op
[v1.9.0] BenchmarkConnect/unary-12    10443    1151366 ns/op     6024079 B/op    236 allocs/op
```

##### Enhancements

- Allow clients to set Host in headers by [@emcfarlane](https://togithub.com/emcfarlane) in https://github.com/bufbuild/connect-go/pull/522
- Allow Connect clients to configure max HTTP GET size by [@jhump](https://togithub.com/jhump) in https://github.com/bufbuild/connect-go/pull/529
- Reduce allocations in HTTP routing by [@mattrobenolt](https://togithub.com/mattrobenolt) in https://github.com/bufbuild/connect-go/pull/519
- Cache mapping of methods to protocol handlers by [@emcfarlane](https://togithub.com/emcfarlane) in https://github.com/bufbuild/connect-go/pull/525
- Improve performance of percent encoding by [@emcfarlane](https://togithub.com/emcfarlane) in https://github.com/bufbuild/connect-go/pull/531
- Reduce marshaling allocations with MarshalAppend by [@emcfarlane](https://togithub.com/emcfarlane) in https://github.com/bufbuild/connect-go/pull/503

##### Bugfixes

- Discard unknown JSON fields by default by [@akshayjshah](https://togithub.com/akshayjshah) in https://github.com/bufbuild/connect-go/pull/518
- Canonicalize Connect streaming trailer names by [@jchadwick-buf](https://togithub.com/jchadwick-buf) in https://github.com/bufbuild/connect-go/pull/528

##### Other changes

- Tighten internal lint checks by [@zchee](https://togithub.com/zchee) in https://github.com/bufbuild/connect-go/pull/520
- Simplify code when pooled buffers aren't returned by [@pkwarren](https://togithub.com/pkwarren) in https://github.com/bufbuild/connect-go/pull/532

#### New Contributors

- [@zchee](https://togithub.com/zchee) made their first contribution in https://github.com/bufbuild/connect-go/pull/520
- [@emcfarlane](https://togithub.com/emcfarlane) made their first contribution in https://github.com/bufbuild/connect-go/pull/522

**Full Changelog**: bufbuild/connect-go@v1.8.0...v1.9.0

</details>

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
akshayjshah added a commit that referenced this pull request Jul 26, 2023
Use the `MarshalAppend` methods introduced in v1.31 of the protobuf
runtime to more aggressively reuse buffers from Connect's internal pool
(for both binary and JSON). This reduces allocations and substantially
improves throughput:

```
BenchmarkConnect/unary-12                   8054           1318347 ns/op        14414394 B/op        251 allocs/op
BenchmarkConnect/unary-12                 10448           1153636 ns/op         6115680 B/op        236 allocs/op
```

---------

Co-authored-by: Akshay Shah <[email protected]>