Quarkus + Stork/consul round-robin service discovery cache expiration over nodes that are down (Baremetal) #24343
Comments
So you'd like Stork to remove a service instance from the ones the client tries if the instance is not available?
I think it makes sense to check whether a service is ready before adding it to the pool of services, and to remove the instance from the pool if it is not ready. That way we would avoid the current behavior:
In the end, all requests will succeed, but the price is too high (in terms of requests and load on the available node). Also, maybe the service is up, but on another k8s node, because it was moved there (AWS spot instances or ephemeral cloud nodes). In those cases, the new service on the other node gets registered again, but the old one also remains in the pool. If it were possible, by configuration, to let Quarkus/Stork know where the readiness URL is located, then that configuration could be used by Quarkus/Stork when a service is added or …
It is out of the scope of Stork to perform heartbeats for service instances. Also, it is possible for a load balancer to add an instance to some kind of block list after a failure. Of the existing load balancers, the least-response-time one is the closest to doing that, but it just treats failures as a very long response time, so that fewer requests are directed to such an instance. Does this answer your doubts?
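For reference, the Stork setup being discussed looks roughly like the application.properties sketch below. The keys follow the stork.pong-replica.* style quoted later in this issue; exact key names and duration formats vary between Stork versions, so treat these values as placeholders rather than the definitive configuration.

```properties
# Consul-backed service discovery for the "pong-replica" Stork service (sketch only)
stork.pong-replica.service-discovery=consul
stork.pong-replica.service-discovery.consul-host=localhost
stork.pong-replica.service-discovery.consul-port=8500
# How long discovered instances are cached before Consul is queried again
# (value format depends on the Stork version in use)
stork.pong-replica.service-discovery.refresh-period=5
# Round-robin is the default; the least-response-time strategy mentioned above
# penalizes failing instances instead of removing them from the pool
stork.pong-replica.load-balancer=least-response-time
```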
Describe the bug
Quarkus version: 2.7.4.Final
Reproducer: quarkus-qe/quarkus-test-suite#572
cmd:
mvn clean verify -Dall-modules -pl service-discovery/stork-consul -Dit.test=StorkServiceDiscoveryIT#storkLoadBalancerServiceNodeDown
Even if Quarkus/Stork is not fault-tolerant and doesn't "detect" that a service node is down, there is a cache expiration property,
stork.pong-replica.service-discovery.refresh-period
which, in combination with a "retry" policy, could do the "job". However, when a node is down and the cache has already expired, the Stork load balancer is still dispatching requests to that node.
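As a sketch of that "refresh-period + retry" combination (the client interface, path, and retry count are hypothetical, and it assumes the quarkus-smallrye-fault-tolerance and REST client extensions are present):

```java
package org.acme;

import javax.enterprise.context.ApplicationScoped;
import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;

import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;
import org.eclipse.microprofile.rest.client.inject.RestClient;

// Hypothetical REST client resolved through the "pong-replica" Stork service.
@RegisterRestClient(baseUri = "stork://pong-replica")
@Path("/ping")
interface PongClient {
    @GET
    String ping();
}

@ApplicationScoped
public class PongService {

    @Inject
    @RestClient
    PongClient client;

    // If round-robin selects an instance that is down, the call fails and is retried,
    // normally landing on the next instance in the rotation. All requests eventually
    // succeed, but every retry first burns a call against the dead node.
    @Retry(maxRetries = 3)
    public String ping() {
        return client.ping();
    }
}
```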
Expected behavior
If a service node is down and the cache expiration time has been exceeded, I would expect Quarkus/Stork to add an instance to the cache only if the service node is up and ready (maybe by calling /q/health/ready).
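For illustration, one way to approximate this outside Stork is to attach an HTTP readiness check when the instance registers itself in Consul, so Consul marks the instance as critical (and can eventually deregister it) when /q/health/ready stops answering. A minimal sketch using the Vert.x Consul client, with hypothetical service name, address, and intervals:

```java
package org.acme;

import io.vertx.core.Vertx;
import io.vertx.ext.consul.CheckOptions;
import io.vertx.ext.consul.ConsulClient;
import io.vertx.ext.consul.ConsulClientOptions;
import io.vertx.ext.consul.ServiceOptions;

public class ConsulRegistration {

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        ConsulClient consul = ConsulClient.create(vertx,
                new ConsulClientOptions().setHost("localhost").setPort(8500));

        ServiceOptions service = new ServiceOptions()
                .setName("pong-replica")
                .setId("pong-replica-1")
                .setAddress("10.0.0.5")
                .setPort(8080)
                // Consul polls the Quarkus readiness endpoint and flags the instance
                // as critical when it stops responding; after the deregister timeout
                // the dead instance disappears from Consul, so a later cache refresh
                // no longer sees it.
                .setCheckOptions(new CheckOptions()
                        .setName("readiness")
                        .setHttp("http://10.0.0.5:8080/q/health/ready")
                        .setInterval("10s")
                        .setDeregisterAfter("1m"));

        consul.registerService(service)
                .onSuccess(v -> System.out.println("pong-replica registered with readiness check"))
                .onFailure(Throwable::printStackTrace);
    }
}
```

Whether the service-discovery layer also filters out instances whose check is failing (instead of waiting for deregistration) depends on the Stork version and its Consul integration.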
Actual behavior
A service node that is down is still listed by Stork as an available instance even after the Stork cache has expired.
How to Reproduce?
Reproducer: quarkus-qe/quarkus-test-suite#572
cmd:
mvn clean verify -Dall-modules -pl service-discovery/stork-consul -Dit.test=StorkServiceDiscoveryIT#storkLoadBalancerServiceNodeDown
Output of uname -a or ver
No response
Output of java -version
No response
GraalVM version (if different from Java)
No response
Quarkus version or git rev
No response
Build tool (i.e. output of mvnw --version or gradlew --version)
No response
Additional information
No response