consul: avoid triggering unnecessary sync when removing workload #10857
Some callers of RemoveWorkload in the group/task cleanup hooks call it with the "Canary" version of the workload when the alloc is marked as a Canary. This logic triggers an extra sync with Consul and does not achieve its intended behavior, for which no special casing is necessary anyway. When a workload is marked for removal, all associated services and checks are removed regardless of Canary status, because the service and check IDs do not incorporate canary-ness in the first place.

The only place where canary-ness matters is when updating a workload, where we need to compute the hash of the services and checks to determine whether they have been modified; the Canary flag is part of that hash.

Fixes #10842
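The distinction between IDs and hashes can be illustrated with a minimal sketch. This is not Nomad's actual code: the `service` type, `serviceID`, `serviceHash`, and the ID format are hypothetical names and shapes chosen only to show why a removal keyed on IDs is unaffected by the Canary flag, while a change detection keyed on a hash is.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// service is a hypothetical, simplified stand-in for a workload's service
// definition. It is an assumption for illustration, not Nomad's real type.
type service struct {
	Name   string
	Port   string
	Canary bool
}

// serviceID builds an identifier from the alloc, task, and service fields.
// The Canary flag is deliberately not an input, mirroring the claim that
// service and check IDs do not incorporate canary-ness.
func serviceID(allocID, taskName string, s service) string {
	return fmt.Sprintf("_sketch-%s-%s-%s-%s", allocID, taskName, s.Name, s.Port)
}

// serviceHash fingerprints the service contents, including the Canary flag,
// so that flipping the flag is detected as a modification on update.
func serviceHash(s service) string {
	h := sha1.New()
	fmt.Fprintf(h, "%s|%s|%t", s.Name, s.Port, s.Canary)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	plain := service{Name: "web", Port: "http"}
	canary := plain
	canary.Canary = true

	// Same ID either way, so removal by ID needs no canary special case.
	fmt.Println(serviceID("alloc1", "task1", plain) == serviceID("alloc1", "task1", canary)) // true

	// Different hash, so an update notices the canary change.
	fmt.Println(serviceHash(plain) == serviceHash(canary)) // false
}
```

Under these assumptions, removing the canary and non-canary variants would target the same IDs, which is why a separate removal call for the "Canary" version adds a sync without removing anything extra.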
I'm fairly sure about the reasoning here, but all the tests are oriented around the number of operations rather than the content of the operations, so it's hard to understand what the original intent was.
LGTM
I'm fairly sure about the reasoning here, but all the tests are oriented around the number of operations rather than the content of the operations, so it's hard to understand what the original intent was.
A good chunk of these tests do check the Op field, which is most of what we're interested in here. It might be worth extending what we've done in groupservice_hook_test and task_runner_test to the tests that don't. What we're doing in alloc_runner_unix_test seems especially silly given that we're not even saving lines of code, because we have what could be an assertion as a comment. 😀
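As a rough illustration of the difference between counting operations and checking their content, here is a small sketch. The `recordedOp` type, its `Target` field, the `opSequence` helper, and the "add"/"remove" values are hypothetical; only the idea of asserting on an Op field comes from the discussion above, and the real test helpers may look quite different.

```go
package main

import (
	"fmt"
	"reflect"
)

// recordedOp is a hypothetical record of one call made against a mock
// Consul client. It is an assumption for illustration only.
type recordedOp struct {
	Op     string
	Target string
}

// opSequence returns the Op values in the order they were recorded, so a
// test can compare the whole sequence instead of only its length.
func opSequence(ops []recordedOp) []string {
	seq := make([]string, 0, len(ops))
	for _, o := range ops {
		seq = append(seq, o.Op)
	}
	return seq
}

func main() {
	ops := []recordedOp{
		{Op: "add", Target: "web"},
		{Op: "remove", Target: "web"},
	}

	// A count-only check passes for any two operations, whatever they are.
	fmt.Println(len(ops) == 2) // true

	// A content check documents the intent and fails if an unexpected
	// operation takes the place of an expected one.
	fmt.Println(reflect.DeepEqual(opSequence(ops), []string{"add", "remove"})) // true
}
```

A content-based assertion of this kind would also make an unnecessary extra operation easier to spot, since the expected sequence is written out in the test.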
I checked git-blame and found that a lot of these tests got fleshed out when I fixed check restart in 760bb97. It wasn't all that clear to me why the counts were what they were, which is a shame because if I'd spent a bit more time digging into that I might have discovered this bug earlier.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.