consul: avoid triggering unnecessary sync when removing workload #10857

Merged (1 commit), Jul 7, 2021

@shoenig shoenig commented Jul 6, 2021

Callers of RemoveWorkload in the group/task cleanup hooks contain
logic that calls RemoveWorkload with the "Canary" version of the
workload when the alloc is marked as a canary. This logic triggers an
extra sync with Consul and does not achieve its intended effect, for
which no special casing is necessary anyway. When a workload is marked
for removal, all associated services and checks are removed regardless
of canary status, because the service and check IDs do not incorporate
canary-ness in the first place.

The only place where canary-ness matters is when updating a workload,
where we compute a hash of the services and checks to determine whether
they have been modified. The Canary flag is one of the inputs to that
hash.
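
To illustrate the distinction above, here is a minimal sketch and not Nomad's actual code. The `Service` and `Workload` types, the ID format, and the `serviceID`, `serviceHash`, and `removeWorkload` helpers are all hypothetical. It only assumes what the description states: IDs ignore the Canary flag, while the modification hash includes it.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Service and Workload are hypothetical, simplified stand-ins used only
// for this sketch; they are not Nomad's actual types.
type Service struct {
	Name      string
	PortLabel string
}

type Workload struct {
	AllocID  string
	Name     string
	Canary   bool
	Services []Service
}

// serviceID derives an ID from the alloc, workload, and service only.
// The Canary flag is not an input (the ID format here is illustrative).
func serviceID(w Workload, s Service) string {
	return fmt.Sprintf("%s-%s-%s-%s", w.AllocID, w.Name, s.Name, s.PortLabel)
}

// serviceHash detects modifications on update; the Canary flag is an input.
func serviceHash(w Workload, s Service) string {
	raw := fmt.Sprintf("%s|%s|%t", s.Name, s.PortLabel, w.Canary)
	return fmt.Sprintf("%x", sha256.Sum256([]byte(raw)))
}

// removeWorkload deletes every service of w from the registry by ID.
func removeWorkload(registry map[string]string, w Workload) {
	for _, s := range w.Services {
		delete(registry, serviceID(w, s))
	}
}

func main() {
	svc := Service{Name: "web", PortLabel: "http"}
	canary := Workload{AllocID: "alloc1", Name: "group-web", Canary: true, Services: []Service{svc}}
	plain := canary
	plain.Canary = false

	// Register as a canary, then remove using the non-canary view.
	registry := map[string]string{serviceID(canary, svc): serviceHash(canary, svc)}
	removeWorkload(registry, plain)

	fmt.Println(len(registry))                                       // 0
	fmt.Println(serviceHash(canary, svc) != serviceHash(plain, svc)) // true
}
```

Under these assumptions, a single removal call clears the registration whichever Canary value the caller passes, so a second "canary" removal would only add a redundant sync.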

Fixes #10842

shoenig commented Jul 6, 2021

I'm fairly sure about the reasoning here, but all the tests are oriented around the number of operations rather than the content of the operations, so it's hard to understand what the original intent was.

@tgross tgross left a comment

LGTM

> I'm fairly sure about the reasoning here, but all the tests are oriented around the number of operations rather than the content of the operations, so it's hard to understand what the original intent was.

A good chunk of these tests do check the Op field, which is most of what we're interested in here. It might be worth extending what we've done in groupservice_hook_test and task_runner_test to the tests that don't... what we're doing in alloc_runner_unix_test seems especially silly given that we're not even saving lines of code because we have what could be an assertion as a comment. 😀

I checked git-blame and found that a lot of these tests got fleshed out when I fixed check restart in 760bb97. It wasn't all that clear to me why the counts were what they were, which is a shame because if I'd spent a bit more time digging into that I might have discovered this bug earlier.

@shoenig shoenig merged commit 5e0ffb9 into main Jul 7, 2021
@shoenig shoenig deleted the b-rm-canarys branch July 7, 2021 14:47
@github-actions
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022

Successfully merging this pull request may close these issues.

Ineffective handling of canary edge case on service de-registration