Unable to drain endpoints before removal #7218
Comments
I think I understand what you are after, but I'm not sure how it can be implemented cleanly. How would Envoy know that a session is OK to keep sending to an endpoint? Do you expect this to be somehow encoded in the cookie?
@mattklein123 thanks for the prompt reply ;) I think the cookie does not need to have any information on the upcoming drain; I'm expecting a way to signal Envoy about this drain somewhere in the configuration. I think the best way I can explain this is by quoting how Nginx drains sticky sessions (more at https://www.nginx.com/blog/nginx-plus-backend-upgrades-individual-servers#server-api-persistence)
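(The quoted excerpt from the Nginx post isn't reproduced above. Purely as an illustrative sketch of the behavior being described, NGINX Plus lets you put an upstream server into a "draining" state so that existing sticky sessions keep routing to it while no new sessions are assigned; the upstream name and addresses below are made up.)

```nginx
upstream backend {
    # Cookie-based session persistence (NGINX Plus "sticky cookie").
    sticky cookie srv_id expires=1h path=/;

    server 192.0.2.10:8080;
    # Draining: clients that already hold an srv_id cookie for this server
    # keep being routed to it, but it receives no new sessions.
    server 192.0.2.11:8080 drain;
}
```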
So, from what I understand, Envoy could do this in 2 ways -
More reading
I just tried separating the draining endpoints and healthy endpoints into 2 different clusters, and then pointed the route to both of them via
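(The route configuration that followed isn't shown above. A hypothetical sketch of splitting traffic across a "healthy" and a "draining" cluster, with made-up cluster names, might look roughly like this:)

```yaml
route_config:
  virtual_hosts:
  - name: app
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        weighted_clusters:
          clusters:
          - name: app_healthy     # endpoints that may still receive new sessions
            weight: 90
          - name: app_draining    # endpoints being drained out
            weight: 10
```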
I'm sorry I don't fully understand what you are after. Session stickiness is supported effectively via hashing of an incoming request cookie. How can Envoy understand whether to continue to route existing sessions without some state in the cookie itself? Do you mean that you want users with existing cookies to keep routing, but no new cookies to be issued that target that upstream?
@mattklein123 Yep, exactly 👍 I want to drain an endpoint/upstream and I want to give it 1 hour to drain. In that one hour, I want already existing sticky sessions to keep routing to that very upstream, but I don't want any newer sticky sessions to be issued to that upstream.
OK I think I understand what you are after. Right now the cookie hashing policies just effectively store a hash value in the cookie and are meant for use with a consistent hashing load balancer. When a host is failed, the rest of the hosts will rehash. There is no actual state stored in the cookie itself. I think we could support something like this but it would take a bunch of thinking on how to do it. @alyssawilk any thoughts on this one given the existing cookie support?
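(For readers not familiar with the existing support: this is roughly how cookie-hash stickiness is configured today, a cookie hash_policy on the route paired with a consistent-hashing lb_policy such as RING_HASH or MAGLEV on the cluster. The cookie name and TTL below are illustrative.)

```yaml
# Route-level fragment: hash on a cookie. Envoy generates the cookie with this
# TTL if the request doesn't already carry one; the cookie only stores a hash,
# not the identity of the chosen upstream host.
route:
  cluster: my_service
  hash_policy:
  - cookie:
      name: session_hash
      ttl: 3600s
```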
I'm not terribly familiar with Envoy upstream selection. I also kind of see how this would work for H2, where you can check connection lifetime vs drain time if we move a bunch of APIs around, but I'm less convinced it works well for HTTP/1, given that you may have a preexisting pipeline connection and then prefetch more, and have inconsistent state? I think rather than basing this on connection lifetime it might make sense to do some logic around cookie max-age, where if the client connection were in continued use Envoy could continue extending the max-age, and if the client went idle the cookie would expire. Then it's just a question of having a state where your backend selection always uses the hash when the cookie is present, and only selects from non-draining backends when the cookie is not present?
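(Not Envoy code, just a tiny self-contained sketch of the selection rule proposed above: if the affinity cookie is present, always honor the hash, even toward a draining host; if it is absent, only consider non-draining hosts. All names are invented.)

```python
import random
from typing import Optional, Sequence


def pick_host(hosts: Sequence[str], draining: set[str],
              cookie_hash: Optional[int]) -> str:
    """Toy version of "use the hash when the cookie is present, otherwise
    only select from non-draining backends"."""
    if cookie_hash is not None:
        # Existing session: keep honoring the hash, even if the host it maps
        # to is draining, so in-flight sticky sessions aren't broken.
        return hosts[cookie_hash % len(hosts)]
    # New session: never hand out affinity to a draining host.
    candidates = [h for h in hosts if h not in draining]
    return random.choice(candidates)


hosts = ["ep-0", "ep-1", "ep-2", "ep-3", "ep-4"]
print(pick_host(hosts, draining={"ep-4"}, cookie_hash=9))     # existing session -> ep-4
print(pick_host(hosts, draining={"ep-4"}, cookie_hash=None))  # new session -> never ep-4
```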
After kicking this around at Datawire for a bit, we think that it would work to add a way to mark a given upstream as "blacklisted for new hashes only":
Does that sound like a reasonable approach?
Conceptually it would work, but I think the implementation will be complex, as then we need a per-host draining concept (which I guess we have today but I would need to refresh the details). Worse, and this is where I would really have to think it through, I'm not convinced this works well at all if the upstream hosts aren't stable and things rehash. Are we assuming upstream stability? Even if we assume upstream stability, once a host drains and then goes away, approximately 1/N of sessions are going to rehash again anyway, so don't you still need to support migration?
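(A toy illustration of the ~1/N point, not Envoy's actual ring implementation: simulate a 5-host consistent-hash ring, remove one host, and count how many session keys land on a different host afterwards.)

```python
import hashlib
from bisect import bisect


def point(value: str) -> int:
    # Map an arbitrary string to a position on the hash ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(hosts, replicas=100):
    # Each host owns `replicas` points on the ring.
    return sorted((point(f"{host}#{i}"), host)
                  for host in hosts for i in range(replicas))


def pick(ring, key: str) -> str:
    # A key is served by the first ring point at or after its hash (wrapping).
    idx = bisect([p for p, _ in ring], point(key)) % len(ring)
    return ring[idx][1]


hosts = [f"endpoint-{n}" for n in range(5)]
before = build_ring(hosts)
after = build_ring(hosts[:-1])            # endpoint-4 drained and then removed

keys = [f"session-{n}" for n in range(10_000)]
moved = sum(pick(before, k) != pick(after, k) for k in keys)
print(f"{moved / len(keys):.1%} of sessions remapped")   # roughly 1/5, ~20%
```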
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
@mattklein123 @alyssawilk can we mark this as 'help wanted'?
@mattklein123 this isn't a concern now with sessionFilter |
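(For readers landing here later: this presumably refers to Envoy's stateful session support, i.e. the envoy.filters.http.stateful_session HTTP filter with a cookie-based session state, where the cookie encodes the selected upstream host rather than just a hash. A rough sketch follows; field names are from memory, so double-check against the current docs.)

```yaml
http_filters:
- name: envoy.filters.http.stateful_session
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.stateful_session.v3.StatefulSession
    session_state:
      name: envoy.http.stateful_session.cookie
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.http.stateful_session.cookie.v3.CookieBasedSessionState
        cookie:
          name: session-id
          ttl: 3600s
          path: /
```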
Title: Unable to drain endpoints before removal
Description:
I'm trying to implement endpoint connection draining (and not draining Envoy itself) with sticky sessions configured.
At t=0, there are 5 endpoints configured in my Envoy cluster, with cookie-based sticky sessions.
At t=10, I wish to drain 1 out of the 5 endpoints, i.e. I do not want any new connections to be routed to that endpoint, but I want the current sessions to persist until I remove the endpoint from the list.
In order to drain an endpoint, I've tried 2 approaches:

1. Set 'health_status': 'DRAINING' in the endpoint (see https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/health_check.proto#envoy-api-enum-core-healthstatus). This takes care of not sending any new connections to the endpoint, but does not allow older sticky sessions to persist; they're broken off. I got leads for this from #2920 (comment) ("Feature request: a means to refuse subsequent TCP connections while allowing current connections enough time to drain"), but it doesn't seem to work 🤷♂️
2. Set load_balancing_weight: 1 in the endpoint, in the hope that a lower weight would not route any new requests to the endpoint while letting the sticky sessions persist, but that didn't happen.

So, what am I missing here? Any pointers are appreciated! Thanks :)
Config:
This is how the endpoint configuration looks when I'm trying to drain it -
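(The original YAML isn't included above. Purely as a hypothetical reconstruction of the shape being described, an endpoint being drained via the two approaches above might look something like this; the cluster name and addresses are made up.)

```yaml
load_assignment:
  cluster_name: my_service
  endpoints:
  - lb_endpoints:
    - endpoint:
        address:
          socket_address: { address: 10.0.0.1, port_value: 8080 }
    - endpoint:
        address:
          socket_address: { address: 10.0.0.5, port_value: 8080 }
      # Approach 1: mark the endpoint as draining.
      health_status: DRAINING
      # Approach 2: starve it of new traffic with a minimal weight.
      load_balancing_weight: 1
```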