BUGFIX #7374 When a service does not exists in an alias, consider it failing #7384
@pierresouchay I don't develop in so I don't understand it deeply. But I see it checks for service existence, so I think it would fix my problem.
@ShimmerGlass review?
…sider it failing

In the current implementation of Consul, an alias check cannot determine whether a service exists. Because a service without any check is semantically considered passing, the alias check was considered passing whenever no health checks were found for an agent.

This makes little sense, as the current implementation does not distinguish between:

* a non-existing service (passing)
* a service without any check (passing as well)

To make it work, we have to ensure that when a check does not find any health check, the service does indeed exist. If it does not, let's consider the check as failing.
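The decision described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: the `Check` type, the `aliasStatus` function, and the `serviceExists` callback are hypothetical names, and node-level checks are ignored for simplicity.

```go
package main

import "fmt"

// Check is a hypothetical, trimmed-down health check record.
type Check struct {
	ServiceID string
	Status    string // "passing", "warning" or "critical"
}

// aliasStatus computes the status of an alias check for serviceID.
// serviceExists is only consulted when no check matches the service,
// which mirrors the "extra request only when in doubt" idea of the PR.
func aliasStatus(serviceID string, checks []Check, serviceExists func(string) bool) string {
	status := "passing"
	matched := 0
	for _, c := range checks {
		if c.ServiceID != serviceID {
			continue // check belongs to another service
		}
		matched++
		switch c.Status {
		case "critical":
			return "critical" // worst possible state, stop early
		case "warning":
			status = "warning"
		}
	}
	// No check found: distinguish "service without checks" (passing)
	// from "service does not exist" (failing).
	if matched == 0 && !serviceExists(serviceID) {
		return "critical"
	}
	return status
}

func main() {
	registered := map[string]bool{"web": true}
	exists := func(id string) bool { return registered[id] }

	fmt.Println(aliasStatus("web", nil, exists))   // registered, no checks
	fmt.Println(aliasStatus("ghost", nil, exists)) // not registered
}
```

With this shape, a service that has at least one check never triggers the existence lookup, so the additional cost is limited to services registered without checks.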
I thought about the problem and I think you are right. This feature was introduced with #4320:
With that in mind, a service should also not be healthy if there is no sidecar proxy! Now that I have made up my mind, I will take time to review your PR soon.
@i0rek Thank you. This feature is actually great because it would allow us to rename services easily (the old service name would be an alias to the new one). But allowing a dangling link to a service is a showstopper for us.
The implementation makes extra requests to determine if a service exists. This shouldn't be necessary because there is already a request to the servers to BUT Do you have other ideas? I will also discuss this internally.
@i0rek Yes, I know the implementation is suboptimal, but the new request is done ONLY when no check is found on the service. Some time ago, we implemented this: #3551. It allows checking a service by its name (returns a collection) or by its ID, so you can:
This is almost OK, but has a big drawback: the agent itself is responding to the request (we used this for our load balancers, for instance, but only because we know that our agents are running everywhere and no ACLs on Consul's port were present), so in some environments where the agent HTTP is not enabled (or network ACLs are present), it would not work. We could, however, implement something similar in health.

Regarding the return code, it really depends on what you want: if you query the service by its name, IMHO, to follow Consul's usual patterns, we might return an empty collection, but if we add an endpoint for querying by ID, 404 might be suitable.

Regarding Health.ServiceChecks => if it would return all HealthChecks for a given service, we already have this info in /v1/health/service/ => the big downside is that the output can be huge, especially on large services, but it includes both node and service checks.

There is also an alternative way by using

I already thought about those alternatives while writing the PR, but I wanted to keep it as small as possible.
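To make the return-code discussion concrete, here is a small sketch of how a caller might interpret the HTTP status of the agent-local health endpoint added in #3551. It assumes the documented mapping for that endpoint (200 passing, 429 warning, 503 critical, 404 unknown service); the function name `interpretAgentHealth` is hypothetical, and no request is actually sent.

```go
package main

import (
	"fmt"
	"net/http"
)

// interpretAgentHealth maps the HTTP status code of an agent health
// query by service ID to a health status and an existence flag.
// The mapping is an assumption based on the agent API documentation.
func interpretAgentHealth(code int) (status string, exists bool, err error) {
	switch code {
	case http.StatusOK:
		return "passing", true, nil
	case http.StatusTooManyRequests:
		return "warning", true, nil
	case http.StatusServiceUnavailable:
		return "critical", true, nil
	case http.StatusNotFound:
		// Unknown service ID: treat the alias as failing.
		return "critical", false, nil
	default:
		return "", false, fmt.Errorf("unexpected status code %d", code)
	}
}

func main() {
	for _, code := range []int{200, 404, 429, 503} {
		status, exists, err := interpretAgentHealth(code)
		fmt.Println(code, status, exists, err)
	}
}
```

A 404 on a by-ID query gives the caller an unambiguous "does not exist" signal, which is the distinction an empty collection from a by-name query cannot carry on its own.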
The filtering could help actually:
=> if [] is returned => the service does not exist. The big advantage is that the call returns only the node we target, not the 1500 others (we are in this case). Only problem: I tried this at first when I did the PR, but IIRC, we don't know the ServiceName we are looking for!
Yeah, we don't have the service name. :(
The simplest would probably be to expose /v1/health/services-by-node/ (maybe with a filter on serviceID) and return the same kind of output as /v1/health/service/. When I worked on the PR, that was the best solution I could think of, but it is far more work than just checking for the existence of the service when there is doubt, as I did here (and to be fair, it is easier for me, because I enforce all services to have at least 1 health check, so I would not pay the price that much in my infrastructure).
@i0rek Do you think this additional call is a blocker for now?
LGTM. Ideally the extra call would not be needed, given a better RPC endpoint for this type of problem. But since I don't have time to implement it now, I think it is better to err on the side of correctness. That is why I think this is a good change with further potential for improvement.
Thanks @pierresouchay!
Previously we were setting the alias_service field of alias checks to the service name instead of the service id. This check would then pass because Consul would consider an alias for a non-existent service id to be okay. With hashicorp/consul#7384, now the check will fail if the service id doesn't exist and so those connect services will be considered unhealthy and be unroutable.
This fixes a unit test failure over in enterprise due to #7384
In the current implementation of Consul, an alias check cannot determine whether a service exists. Because a service without any check is semantically considered passing, the alias check was considered passing whenever no health checks were found for an agent.
But this makes little sense, as the current implementation does not distinguish between a non-existing service (passing) and a service without any check (passing as well).
To make it work, we have to ensure that when a check does not find any health check, the service does indeed exist. If it does not, let's consider the check as failing.
It should fix #7374