Commit
Fixes a bug where an allocation is considered healthy while some of its tasks are being restarted and, as a result, their checks aren't tracked by the consul agent client.

The underlying problem is that allocation registration in the consul agent/client code is mutable: tasks get removed as services from consul prior to stopping/restarting, to allow for graceful removal from LBs. The downside is that the health tracker may consider the allocation healthy even if one of the tasks is down.

This uses the simplest approach to patch the problem: comparing the number of expected checks against the number of registered checks.

I don't anticipate a discrepancy between the counters. `sreg.Checks` should only contain checks that the nomad agent explicitly registered, with unexpected or unrelated checks filtered out: https://github.com/hashicorp/nomad/blob/0ecda992317d3300e1c1da05170f8bba18410357/command/agent/consul/client.go#L1138-L1147 .

A better approach would have been to strictly compare the found check IDs against an immutable list of expected IDs. Sadly, this requires significant code changes to both the task runner service hooks and the consul hooks, which I'm not comfortable making so close to cutting a new release.
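To illustrate the count-based approach described above, here is a minimal sketch in Go. It is not the code from this commit: the `Check` and `ServiceRegistration` types and the `allChecksHealthy` function are hypothetical, simplified stand-ins, and treating any count mismatch as unhealthy is an assumption made for the example.

```go
package main

import "fmt"

// Check is a hypothetical stand-in for a check the agent registered in consul.
type Check struct {
	ID     string
	Status string // "passing", "warning", or "critical"
}

// ServiceRegistration is a hypothetical, simplified view of one registered
// service and the checks currently registered for it.
type ServiceRegistration struct {
	Checks []Check
}

// allChecksHealthy reports whether every registered check is passing AND the
// number of registered checks matches the number the allocation expects.
// A task that is restarting has had its services (and checks) removed, so the
// registered count drops below the expected count and the result is false.
// Treating any mismatch as unhealthy is an assumption of this sketch.
func allChecksHealthy(expected int, regs []ServiceRegistration) bool {
	registered := 0
	for _, sreg := range regs {
		for _, c := range sreg.Checks {
			if c.Status != "passing" {
				return false
			}
			registered++
		}
	}
	return registered == expected
}

func main() {
	web := ServiceRegistration{Checks: []Check{{ID: "web-http", Status: "passing"}}}
	api := ServiceRegistration{Checks: []Check{{ID: "api-http", Status: "passing"}}}

	// Both tasks are running: two checks expected, two registered.
	fmt.Println(allChecksHealthy(2, []ServiceRegistration{web, api})) // true

	// The api task is restarting, so its service was deregistered.
	// Without the count comparison this would look healthy.
	fmt.Println(allChecksHealthy(2, []ServiceRegistration{web})) // false
}
```

The stricter alternative mentioned above would replace the integer count with a fixed set of expected check IDs and verify that each one is present and passing.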