-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvcoord: fix DistSender circuit breaker tripped
metrics leak
#121142
kvcoord: fix DistSender circuit breaker tripped
metrics leak
#121142
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though it would be nice to have a test that exercises this logic.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @erikgrinaker)
pkg/kv/kvclient/kvcoord/dist_sender_circuit_breaker.go
line 361 at r1 (raw file):
// closedC is closed when the circuit breaker has been GCed. This will shut // down a running probe, and prevent new probes from launching. closedC chan struct{}
Should we be allocating this channel in newReplicaCircuitBreaker
?
pkg/kv/kvclient/kvcoord/dist_sender_circuit_breaker.go
line 938 at r1 (raw file):
since it would also use an untripped breaker if it arrived after GC
👍
pkg/kv/kvclient/kvcoord/dist_sender_circuit_breaker.go
line 941 at r1 (raw file):
if r.isClosed() { if r.isTripped() { r.breaker.Reset()
It's ok for an EventHandler function to call back into Reset, right? This won't cause any deadlocks?
c91274a
to
60db867
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure, all of this needs testing. I'm going to add some tomorrow, but will likely need a bit more time to add comprehensive testing.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andrewbaptist and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender_circuit_breaker.go
line 361 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we be allocating this channel in
newReplicaCircuitBreaker
?
Yes, forgot to add that.
pkg/kv/kvclient/kvcoord/dist_sender_circuit_breaker.go
line 941 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's ok for an EventHandler function to call back into Reset, right? This won't cause any deadlocks?
Yes, afaict. OnProbeDone
is called without holding the circuit breaker mutex, before the probe is recorded as shut down. This shouldn't be any different from the probe itself calling report(nil)
.
60db867
to
9ef08ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist)
9ef08ce
to
90515f0
Compare
I've seen a couple of CI failures in |
CI is passing now, going for it and catching up in the nightlies. bors r+ |
If a tripped circuit breaker is GCed, the `tripped` metric will consider it tripped forever. This patch untrips the breaker during GC, taking care to properly shut down and synchronize with any concurrent probes to avoid metrics leaks. Epic: none Release note: None
90515f0
to
91e7278
Compare
Canceled. |
bors r+ |
If a tripped circuit breaker is GCed, the
tripped
metric will consider it tripped forever. This patch untrips the breaker during GC, taking care to properly shut down and synchronize with any concurrent probes to avoid metrics leaks.Resolves #121030.
Epic: none
Release note: None