-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Service registration idempotence regarding HealthChecks #4903
Comments
Seems reasonable, I've labelled it so we can discuss with team at next available opportunity. |
I am not the smartest so I am an advocate for simplicity, when I read |
@gertcuykens agree, but I do know that some people rely on this behavior (by inserting some checks outside service definition and applying service definition on each chef run). So not doing that way will break some users. We could however recommend using this query parameter in the documentation God built Earth in 7 days between there were no legacy ;) |
Can we settle on a version bump and the reverse |
Sadly version bumps on our API are way to drastic a thing to get around minor breaking changes like this. A version bump that changes the URL basically breaks every Consul client in the world and causes massive upgrade pain. We appreciate how ugly it is trying to maintain "legacy" behaviour - there is tons of it in Consul as something that's been around a while, but any change that breaks existing clients effectively prevents many of our largest users from being able to upgrade at all and leaves them stranded on older versions so we have to be extremely pragmatic with our approach. If you mean version bump Consul that's even worse - even if we make breaking changes we still have to give loads of warning of the change and ensure that the upgrade is smooth so old clients can still use new servers etc without any semantics being confused etc. How would one roll out an upgrade of all clients if they have a breaking API change? You can't upgrade the clients first since then they will do the wrong thing against old API and you can't upgrade the consul clients first for same reason etc. That said, we all agreed that this was unpleasant and that just "fixing" it might be an option - expecting Check append behaviour is pretty odd and not something we ever documented. BUT there is a major use-case that is relied upon all the time by existing tooling that would be broken if we change the default behaviour here. In many cases people register services with all their checks etc either via Config or API, and then use this endpoint from some totally separate automation to do things like update the service tags or metadata. When they do that, they rely on the fact that when they PUT a change to the service with it's new tags, they don't need to perfectly redefine all it's Checks too. They just leave those blank and it doesn't update them. If we change the default then all such integrations would remove all their checks on next tag update! We could special case empty checks but then it still isn't really idempotent for Chef as there would be no way to remove all checks. I do agree @gertcuykens that |
Then I suggest we leave current api v1 behavior, document that part better and use github I strongly believe we should follow docker and kubernetes version / config filosophie here as in As far as i can tell you will not find any https://docs.docker.com/registry/spec/api/ Also who said a tool can't mix v1 and v2 api? Kubernetes does it all the time. Maybe we can introduce a api level in our consul config files also where the default is v1 or do we have that already? |
@gertcuykens thanks for the input, I agree those are potential options. In general HashiCorp tools don't always make the same choices as others like Docker and Kubernetes when it comes to versioning. Partly because there is more than one way to organise, partly because some of them pre-dated other popular projects. It's a hard problem in general because it is much more than a technical decision -- it's super important that we evolve our software in ways that our current users expect and that cause a minimum of disruption and extra work for them to do. Having HashiCorp tools be somewhat consistent with themselves historically in how they handle backwards incompatibility and consistent with each other is probably more important that doing the same thing as Kubernetes for example. We could add a /v2/ but every time we "bump" version like that it requires a massive amount of work to rewrite all clients that are using it. For a minor "bug" like this that isn't affecting anyone much for the last 4 years it seems, it's hard to justify asking people to change stuff. Sure both can coexist a while and would have to but that increases maintenance burden. Imagine ending up in a situation where we have v1,2,3,4 because each one only changed one minor bug with backwards compatibility ramifications in one endpoint. That would be a nightmare to maintain and document as well as horrible for clients to keep up with. In general URL versioning is a pretty heated topic (Google API versioning if you're interested in some viewpoints!) but most people agree it's pretty horrible to change the URL of every resource in your API to introduce a single breaking change somewhere. That said, adding an query param might not be the best way either - there is a spectrum of possible solutions between that and making a whole new /v2/ and changing every single API endpoint and client that exists! We'll consider the simplest way that will work least badly for users 😄. |
We just had a chance to discuss this and aren't decided but I wanted to take some notes:
Thanks! |
@pearkes thx for discussing this :) Tags and meta are replaced when re-registering a service, only checks are left behind. As for the param, another option would be to introduce a new config file param like |
@Aestek I am not a fan of the agent config, or in that case, it should be possible to put: replace_checks=0 if default behavior is set to 1. We currently have some agents (I mean, now, only a single case known to me) were one our beloved users did mix behaviors (service created by chef, additional healthcheck added later by a script) |
I have a question. If so, why not "fail" only if the re-registered health check actually fails ? ( retain the state of the old content until the result of the new content is available ) |
@glakshmeepathi no it wont fail momentarily, the state (passing, warning...) will stay the same until the check is re-ran with the new parameters. |
Fixes point `#2` of: #4903 When registering a service each healthcheck status is saved and restored (https://github.com/hashicorp/consul/blob/master/agent/agent.go#L1914) to avoid unnecessary flaps in health state. This change extends this feature to single check registration by moving this protection in `AddCheck()` so that both `PUT /v1/agent/service/register` and `PUT /v1/agent/check/register` behave in the same idempotent way. #### Steps to reproduce 1. Register a check : ``` curl -X PUT \ http://127.0.0.1:8500/v1/agent/check/register \ -H 'Content-Type: application/json' \ -d '{ "Name": "my_check", "ServiceID": "srv", "Interval": "10s", "Args": ["true"] }' ``` 2. The check will initialize and change to `passing` 3. Run the same request again 4. The check status will quickly go from `critical` to `passing` (the delay for this transission is determined by https://github.com/hashicorp/consul/blob/master/agent/checks/check.go#L95)
Having read the discussion it seems to me that we want to support idempotent service registrations. I think a parameter is the best way to do it for now, since we don't want to start api I think the parameter name Other than that I think #4905 is good to go. |
Hey there, Feel free to check out the community forum as well! |
Hey there, This issue has been automatically closed because there hasn't been any activity for at least 90 days. If you are still experiencing problems, or still have questions, feel free to open a new one 👍 |
Stale bot closed this but AFAICT it was actually fixed by #4905 and release in Consul 1.6.1. |
Hey there, This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days. If you are still experiencing problems, or still have questions, feel free to open a new one 👍. |
Issue with Service registration and checks
When registering several times a service on a local Agent using
/agent/service/register
, if checks are removed, those checks are not removed from the service.So, if at t0, we register service redis with
checks:[ check0, check1]
and then, at t1 we registerchecks:[check2]
, the service will have the following services:redis: [check0, check1, check2]
Ideally, in a RESTfull way, it should be
redis: [check2]
It means the PUT registration is not idempotent. This is a bit problematic sometimes.
However, it probably means that some people are relying on this behavior, so changing the behaviour might break clients.
We thus propose to add an optional parameter for
/agent/service/register?idempotent=true
to enable this idempotent behavior.@Aestek will soon provide a patch.
Issue with checks registration
Another bug can be found in
/agent/check/register
: when registering several times (in a idempotent way) a check, the state will override the state of check. Consider this:Register check1 on Service2
At first run, the check will be initialized => Ok
At second registration time, the check (which was passing) will be reset to Critical (default value from registration) and will flap.
@Aestek will also propose another patch to address this: if a check already exists and is not modified, do not change the current status of the check.
Fixing those 2 issues will ease deployment of services using idempotent systems such as chef.
@pearkes Would you consider this for inclusion in upstream?
Kind Regards
The text was updated successfully, but these errors were encountered: