[IMPROVED] Reuse buffers in Conn.processOpError #341
Conversation
Oops, this commit (nats-io/go-nats@5406b8c: nats: simplify error check in Conn.connect) briefly leaked into this PR on account of it being on my master branch. It has been removed, so only nats-io/go-nats@e23ea03 is part of this PR.
This needs a bit more investigation, and regardless, if we do make a change here, we need to take the `ReconnectBufSize` value into consideration.
```go
nc.bw = bufio.NewWriterSize(nc.pending, nc.Opts.ReconnectBufSize)
// Reset pending buffers before reconnecting.
if nc.pending == nil {
	nc.pending = new(bytes.Buffer)
}
```
This does not account for `nc.Opts.ReconnectBufSize`. It is worrisome that the tests still pass, which means that this is not properly tested.
More importantly, in which situation would you get this called so many times and so fast that the GC would not collect those buffers? Maybe we need to understand better why this is a problem in the first place.
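For illustration, here is one shape such a change could take: keep the pending buffer across reconnects while still honoring `ReconnectBufSize`. This is only a sketch, not what the PR or the library actually does; the helper name `resetPending` and its signature are invented, and it assumes only the `bufio` and `bytes` packages.

```go
// resetPending is a hypothetical helper showing how the pending buffer and
// its bufio.Writer could be reused across reconnects while still honoring
// the configured reconnect buffer size.
func resetPending(pending *bytes.Buffer, bw *bufio.Writer, reconnectBufSize int) (*bytes.Buffer, *bufio.Writer) {
	// Reuse the bytes.Buffer instead of allocating a new one each time.
	if pending == nil {
		pending = new(bytes.Buffer)
	}
	pending.Reset()

	// bufio.Writer.Reset reuses the writer's existing internal buffer, so a
	// new writer is allocated only when the configured size has changed.
	if bw != nil && bw.Size() == reconnectBufSize {
		bw.Reset(pending)
	} else {
		bw = bufio.NewWriterSize(pending, reconnectBufSize)
	}
	return pending, bw
}
```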
@kozlovic thanks for the feedback!
The tests likely passed because `bufio.NewWriterSize(W, N)` does nothing to limit the amount of data written to the `io.Writer` `W`; `N` is only used to size the internal buffer. Properly limiting the amount of data written to `W` would need to be handled by `W` itself. Implementing a `LimitedWriter` (along the lines of `io.LimitedReader`) and wrapping the buffer assigned to `nc.pending` would work, though this would likely require updating the error handling around reconnects.
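For concreteness, a `LimitedWriter` along those lines could look roughly like the sketch below. The type is hypothetical (it exists neither in the standard library nor in go-nats), and the choice of `io.ErrShortWrite` as the error is an assumption.

```go
// LimitedWriter is a hypothetical counterpart to io.LimitedReader: it writes
// to W but stops after N bytes, returning an error for anything beyond that.
type LimitedWriter struct {
	W io.Writer // underlying writer
	N int64     // max bytes remaining
}

func (l *LimitedWriter) Write(p []byte) (n int, err error) {
	if l.N <= 0 {
		return 0, io.ErrShortWrite
	}
	truncated := false
	if int64(len(p)) > l.N {
		p, truncated = p[:l.N], true
	}
	n, err = l.W.Write(p)
	l.N -= int64(n)
	if err == nil && truncated {
		// Signal that not all of the caller's data was accepted.
		err = io.ErrShortWrite
	}
	return n, err
}
```

Wrapping the buffer assigned to `nc.pending` in such a writer would cap pending data at `ReconnectBufSize`, but as noted above, the reconnect error handling would then need to cope with short writes.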
The reason this was spinning so hard and generating loads of garbage is that we use ginkgo for our tests, and it encourages some nasty, interesting patterns, like spinning up a NATS listener for each test regardless of whether anything connects to it. The listener is spun down at the end of each test, but who knows what the connection state is at that point. Because of the churn ginkgo introduces, it does tend to highlight situations where applications do not perform well when erroring.
That said, this is an edge case that occurs only when erroring in a tight loop in test code, but an 8 MB memory allocation per reconnect is still a pretty heavy hit (regardless of the efficacy of Go's GC).
Thanks again for reviewing and please let me know if there is anything I may do to help.
Yes, the underlying bytes buffer never limits the amount of data that can be written. We do, however, check the length in the `publish()` call to enforce that limit (this is to avoid unbounded growth).
Let me investigate a bit more to see why the original code would need so much memory, but otherwise I think your changes are good. Thanks!
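For reference, the kind of length check described above might look roughly like this while the connection is reconnecting. This is a sketch only; the helper `isReconnecting` and the `ErrReconnectBufExceeded` error are assumptions about the surrounding code, not a quote of the library.

```go
// Sketch of a publish-side guard: while reconnecting, flush the bufio.Writer
// into the pending buffer and reject the write once the buffered data
// reaches the configured reconnect buffer size.
if nc.isReconnecting() {
	nc.bw.Flush()
	if nc.pending.Len() >= nc.Opts.ReconnectBufSize {
		return ErrReconnectBufExceeded
	}
}
```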
Sorry for the delay. @derekcollison would you want to have a look? I think the change is ok.
@charlievieth Would you want to have a go at what @derekcollison commented on? If you don't have the time, I can try to update the PR (or we can merge and then I will submit a new PR). Let me know. Thanks!
Let's merge and have a quick follow-on PR. Thanks @charlievieth, much appreciated.
Thanks for taking the PR!
@charlievieth You're welcome. But I had another look and something is not quite right. I don't see how … By the way, the original code would create a …
@kozlovic I don't have a ton of time to look at this; while it's certainly possible that …
Regarding your second point, I don't see how the size of …
Or to put it more tersely, changing the size of …
If it panics, that's the trace I would want to see: how we get to `processOpErr` with a non-nil `nc.pending`.
True. But this is so that the rest of the library behaves the same regardless of whether we are connected or not. I agree that the size of …
Changing where? Are you saying in the original code, as opposed to what you did in the PR?
Disregard the first part about …
I was referring to this PR.
Again, thanks for taking the time to look at this stuff.
This significantly reduces memory usage when the nats Conn enters an error loop.
In a Cloud Foundry test (route-emitter/cmd/route-emitter) the `Conn.processOpError` was responsible for ~95% of memory allocations. This was due to `Conn.processOpError` allocating a new `bufio.Writer` each time it was called. This commit changes `Conn.processOpError` to reuse and reset its existing `bufio.Writer`.

Below is the output from `go tool pprof` implicating `Conn.processOpError`:

Output of `go tool pprof` after applying this patch (note the absence of `nats-io` and the overall reduction in memory):

I would have provided the output of running the benchmarks in nats-io/go-nats/test, but they do not appear to hit this code path. That said, I'd be happy to send them if you think they'd be useful.
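To make the reuse idea concrete, here is a small standalone sketch (not the library's code) of resetting one `bytes.Buffer`/`bufio.Writer` pair across simulated reconnects instead of allocating a fresh writer each time; the 8 MB size mirrors the figure mentioned earlier in the thread.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

func main() {
	var pending bytes.Buffer

	// Allocate the (large) buffered writer once...
	bw := bufio.NewWriterSize(&pending, 8*1024*1024)

	for i := 0; i < 3; i++ {
		// ...and on each simulated reconnect, reset both pieces instead of
		// calling bufio.NewWriterSize again, avoiding a fresh 8 MB allocation.
		pending.Reset()
		bw.Reset(&pending)

		fmt.Fprintf(bw, "reconnect %d\n", i)
		bw.Flush()
		fmt.Print(pending.String())
	}
}
```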