-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deployment watcher: fix goroutine leak when job is purged #20348
Conversation
The deployment watcher on the leader makes blocking queries to detect when the set of active deployments changes. It takes the resulting list of deployments and adds or removes watchers based on whether the deployment is active. But when a job is purged, the deployment will be deleted. This unblocks the query but the query result only shows the remaining deployments. When the query unblocks, ensure that all active watchers have a corresponding deployment in state. If not, remove the watcher so that the goroutine stops. Fixes: #19988
0c69e47
to
004fb71
Compare
@@ -191,6 +190,21 @@ func (w *Watcher) watchDeployments(ctx context.Context) { | |||
w.remove(d) | |||
} | |||
} | |||
|
|||
// Ensure we're not tracking deployments that have been deleted because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you check for the number of watchers and number of deployments, you could avoid always checking even when there are no changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. That also made me realize that we're not locking the read on w.watchers
here, so I need to fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can't optimize this with a length check after all. Imagine the following scenario:
- watchers for deployment A and B exist, len(deployments) == 2, len(watchers) == 2
- A gets purged, B completes (is no longer active), C gets added
- after we call
add
/remove
, we have len(deployments) == 2, len(watchers) == 2 - so now checking the lengths would be equal and we'd skip removing A
I've pulled this block of code out to a function to make the locking nicer, and I've added a note about why we can't optimize it with the length check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
The deployment watcher on the leader makes blocking queries to detect when the set of active deployments changes. It takes the resulting list of deployments and adds or removes watchers based on whether the deployment is active. But when a job is purged, the deployment will be deleted. This unblocks the query but the query result only shows the remaining deployments. When the query unblocks, ensure that all active watchers have a corresponding deployment in state. If not, remove the watcher so that the goroutine stops. Fixes: #19988
…#20348) (#20359) The deployment watcher on the leader makes blocking queries to detect when the set of active deployments changes. It takes the resulting list of deployments and adds or removes watchers based on whether the deployment is active. But when a job is purged, the deployment will be deleted. This unblocks the query but the query result only shows the remaining deployments. When the query unblocks, ensure that all active watchers have a corresponding deployment in state. If not, remove the watcher so that the goroutine stops. Fixes: #19988 Co-authored-by: Tim Gross <[email protected]>
The test added in #20348 for the deployment watcher was backported to 1.5.x but is hitting a compilation error because the signature of a state store method was changed.
The test added in #20348 for the deployment watcher was backported to 1.5.x but is hitting a compilation error because the signature of a state store method was changed.
The deployment watcher on the leader makes blocking queries to detect when the set of active deployments changes. It takes the resulting list of deployments and adds or removes watchers based on whether the deployment is active. But when a job is purged, the deployment will be deleted. This unblocks the query but the query result only shows the remaining deployments. When the query unblocks, ensure that all active watchers have a corresponding deployment in state. If not, remove the watcher so that the goroutine stops. Fixes: #19988
The deployment watcher on the leader makes blocking queries to detect when the set of active deployments changes. It takes the resulting list of deployments and adds or removes watchers based on whether the deployment is active. But when a job is purged, the deployment will be deleted. This unblocks the query but the query result only shows the remaining deployments.
When the query unblocks, ensure that all active watchers have a corresponding deployment in state. If not, remove the watcher so that the goroutine stops.
Fixes: #19988