deployment watcher: Reuse allocsCh if allocIndex remains the same #10756
Conversation
```diff
@@ -531,6 +532,8 @@ FAIL:
 		if res.createEval || len(res.allowReplacements) != 0 {
 			w.createBatchedUpdate(res.allowReplacements, allocIndex)
 		}
+
+		allocsCh = w.getAllocsCh(allocIndex)
```
Ideally this line is immediately after the `allocIndex`, but I wanted to avoid creating a query in error or failure conditions. Though, given that those conditions are relatively rare, I can be persuaded to move it if it improves readability.
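For readers following along, here is a minimal, self-contained sketch of the loop shape being discussed. It is not the actual Nomad code; the types and the `watch` body below are hypothetical stand-ins that only illustrate where the channel refresh sits relative to the error and failure paths.

```go
package main

// Hypothetical, stripped-down stand-ins for illustration only; these are
// not Nomad's real types.
type allocUpdates struct {
	index uint64
	err   error
}

type watcher struct{}

// getAllocsCh stands in for deploymentWatcher.getAllocsCh: it spawns a
// goroutine that runs a (here: fake) blocking query and returns a channel
// carrying the result.
func (w *watcher) getAllocsCh(index uint64) <-chan *allocUpdates {
	ch := make(chan *allocUpdates, 1)
	go func() { ch <- &allocUpdates{index: index + 1} }()
	return ch
}

// watch mirrors the shape of the loop after this PR: the channel is created
// once before the loop and refreshed only at the end of an iteration, after
// allocIndex may have advanced, so paths that break out to the failure
// handling never trigger an extra blocking query.
func (w *watcher) watch(done <-chan struct{}) {
	var allocIndex uint64 = 1
	allocsCh := w.getAllocsCh(allocIndex)

FAIL:
	for {
		select {
		case <-done:
			return
		case updates := <-allocsCh:
			if updates.err != nil {
				break FAIL
			}
			allocIndex = updates.index
			// ... evaluate progress, create evals / batched updates ...
		}

		// Refresh the blocking-query channel once per iteration.
		allocsCh = w.getAllocsCh(allocIndex)
	}
	// ... failure handling would go here ...
}

func main() {
	done := make(chan struct{})
	close(done) // exit promptly; the sketch only demonstrates the loop shape
	(&watcher{}).watch(done)
}
```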
```diff
@@ -487,7 +488,7 @@ FAIL:
 				break FAIL
 			}
 
-		case updates = <-w.getAllocsCh(allocIndex):
+		case updates = <-allocsCh:
```
Not sure how to effectively test the side-effects of this change :(.
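One rough idea for exercising the side effect is sketched below. This is not an existing test in the repo: the package name matches Nomad's, but the setup in the elided comment is hypothetical, and goroutine counting is inherently noisy, so treat it as a sketch of the approach rather than a ready-made test.

```go
package deploymentwatcher

import (
	"runtime"
	"testing"
	"time"
)

// TestWatcher_NoGoroutineGrowthOnNoopUpdates sketches the idea: wake the
// watcher many times without changing any allocations and assert that the
// number of goroutines stays flat, since the allocs channel is now reused
// instead of being recreated on every wake-up.
func TestWatcher_NoGoroutineGrowthOnNoopUpdates(t *testing.T) {
	before := runtime.NumGoroutine()

	// ... start a deploymentWatcher and fire a burst of deployment updates
	// that leave the allocations untouched (hypothetical setup, elided) ...

	// Give any per-wake-up goroutines (if they were created) time to start.
	time.Sleep(100 * time.Millisecond)

	after := runtime.NumGoroutine()
	if after > before+5 { // small slack for unrelated runtime goroutines
		t.Fatalf("goroutine count grew from %d to %d", before, after)
	}
}
```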
LGTM.
Looking back through the code, this looks like it's been around since early in the history of deployments, once the progress deadline was introduced. You've left a lot of great context here in the PR description; I'd recommend moving a lot of it into the commit message so that it's right there with the code.
LGTM
LGTM
Co-authored-by: Tim Gross <[email protected]>
deployment watcher: Reuse allocsCh if allocIndex remains the same (#10756)

Fix deployment watchers to avoid creating unnecessary deployment watcher goroutines and blocking queries. `deploymentWatcher.getAllocsCh` creates a new goroutine that makes a blocking query to fetch updates of deployment allocs.

## Background

When operators submit a new or updated service job, Nomad creates a new deployment by default. The deployment object controls how fast to place the allocations through [`max_parallel`](https://www.nomadproject.io/docs/job-specification/update#max_parallel) and health check configuration.

The `scheduler` and `deploymentwatcher` packages collaborate to implement the deployment logic: the scheduler only places the canaries and `max_parallel` allocations for a new deployment; `deploymentwatcher` monitors alloc progress and then enqueues a new evaluation whenever the scheduler should reprocess the job and place the next `max_parallel` round of allocations.

The `deploymentwatcher` package makes blocking queries against the state store to fetch all deployments and the relevant allocs for each running deployment. If `deploymentwatcher` fails or is prevented from fetching the state, deployments fail to make progress. `deploymentwatcher` logic only runs on the leader.

## Why unnecessary deployment watchers can halt cluster progress

Previously, `getAllocsCh` was called on every iteration of the for loop in the `deploymentWatcher.watch()` function, but the loop may iterate many times before the allocs actually change. In fact, whenever a deployment is created, updated, or deleted, *all* `deploymentWatcher`s get notified through `w.deploymentUpdateCh`. The `getAllocsCh` goroutines and blocking queries therefore spike significantly and grow quadratically with the number of running deployments. The growth leads to two adverse outcomes:

1. it spikes CPU/memory usage, potentially leading to OOM or very slow processing;
2. it activates the [query rate limiter](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L896-L898), so the watcher later fails to get updates and consequently fails to make progress towards placing new allocations for the deployment!

As a result, the cluster fails to catch up and fails to make progress on almost all deployments. The cluster recovers after a leader transition: the deposed leader stops all watchers and frees up the goroutines and blocking queries, and the new leader recreates the watchers without the quadratic growth, staying under the rate limiter. Well, at least until a spike of new deployments triggers the condition again.

### Relevant Code References

Path for deployment monitoring:

* [`Watcher.watchDeployments`](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L164-L192) loops waiting for deployment updates.
* On every deployment update, [`w.getDeploys`](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L194-L229) returns all deployments in the system.
* `watchDeployments` calls `w.add(d)` on every active deployment, which in turn [updates the existing watcher if one is found](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L251-L255).
* The deployment watcher [updates its local deployment field and triggers the `deploymentUpdateCh` channel](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L136-L147).
* The [deployment watcher's `deploymentUpdateCh` selector is activated](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L455-L489). Most of the time the selector clause is a no-op, because the flow was triggered by another deployment's update.
* The `watch` for loop iterates again and, in the previous code, creates yet another goroutine and blocking call that risks being rate limited.

Co-authored-by: Tim Gross <[email protected]>
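To make the scale of the problem concrete, here is a small back-of-the-envelope simulation. It is an illustration under simplifying assumptions, not a measurement of Nomad: it assumes a burst of D deployments created back to back, with every creation waking every existing watcher via `deploymentUpdateCh`, and compares how many blocking queries get started before and after this change.

```go
package main

import "fmt"

func main() {
	// before: each wake-up of each existing watcher starts a new
	//         getAllocsCh goroutine/blocking query (old behavior).
	// after:  each watcher keeps reusing one blocking query; only the
	//         newly created watcher starts a fresh one (new behavior).
	for _, d := range []int{10, 100, 1000} {
		before, after := 0, 0
		for created := 1; created <= d; created++ {
			before += created // all watchers existing so far are woken up
			after++           // only the new watcher starts a query
		}
		fmt.Printf("D=%4d  queries before: %7d  after: %4d\n", d, before, after)
	}
}
```

The "before" column grows as D(D+1)/2, which is the quadratic growth that pushes the watcher pool into the query rate limiter during deployment bursts; the "after" column grows linearly.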
Nice
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
## Next Steps: Examine Rate Limiter

We ought to examine the rate limiter. The original deployments PR introduced `queryLimiter`, but its benefits are unclear at this point and I'm not sure token-based rate limiting makes sense here.

The original limit is 1000 new queries per second. #10706 made the flag configurable. I'd recommend that operators keep the value in the same order of magnitude as the maximum number of running deployments they expect to have.
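For reference, the kind of token-bucket limiter being discussed is what Go's `golang.org/x/time/rate` package provides. The sketch below shows that behavior in isolation with illustrative values; only the 1000/s figure comes from the text above, and no claim is made about Nomad's exact `queryLimiter` wiring.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Token bucket: refill at ~1000 tokens/second, burst capacity 1000.
	limiter := rate.NewLimiter(rate.Limit(1000), 1000)

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	for i := 0; i < 5; i++ {
		// Wait blocks until a token is available or the context expires.
		// A watcher stuck in a call like this stops issuing its blocking
		// queries, which is the stall mode described above.
		if err := limiter.Wait(ctx); err != nil {
			fmt.Println("rate limited:", err)
			return
		}
		fmt.Println("query", i, "allowed")
	}
}
```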