Feature request: a means to refuse subsequent TCP connections while allowing current connections enough time to drain #2920
Comments
cc @rshriram
Yeah I think we can probably add listener drain configuration to better determine what happens when a listener enters drain mode. I.e., one option would be to just start refusing connections immediately when draining starts.
In Approach 2. above, typically a different signal is used for a graceful stop, such as SIGHUP. SIGTERM's well-known behavior should not be altered.
Note also, in the observations for 2. above, that keeping the listener alive after SIGTERM is clearly erroneous, a bug in its own right; that is unrelated to the desire for TCP drain functionality.
That may be a formatting oversight in the issue. I'm pretty sure Envoy just quits on SIGTERM. Let me fix that in the issue description.
Now, does Envoy support graceful shutdown?
To sum up my research to date: any of the three pathways above ought to work, but none will, due to the underlying libevent listener implementation in 2.1.8, released February 2017. It appears deliberate that disabling a listener simply removes the socket from the poll queue, collecting connections against the backlog until the listener is re-enabled. Work on the listener implementation has been merged to the 2.2.0 dev master, so I am working on a minimal repro case to discuss as a libevent ticket, solicit workarounds effective for 2.0/2.1/2.2, and determine whether this represents a defect or an enhancement.
Can you elaborate on what issues you're having with libevent? I'm working on implementing something similar right now. I'm trying to solve a slightly different problem, but it has a lot of overlap. But I have Envoy closing its listener (as reported by netstat).
See #3307. Would that resolve this issue for you?
@ggreenway Yes, thank you. In the future, we may want a means for certain listeners to opt out of this behavior. For example, in a sidecar proxy scenario, during the shutdown & drain time, we'd want to close the ingress listeners but leave any egress listeners open until the very end. But we can wait on that enhancement. For now, #3307 would be great.
@ggreenway @rosenhouse note that we already support selective per-listener drain options via https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/lds.proto#L121 (and use it for the same reasons at Lyft). @ggreenway please also consider ^ when you are looking into the final solution.
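For reference, that per-listener option looks roughly like this in a v2 Listener definition; the listener name, port, and cluster below are placeholders rather than values from any particular deployment:

```yaml
# Sketch of a listener that only drains when it is modified or removed via LDS,
# not on health-check failure or hot restart (drain_type per the linked lds.proto).
name: egress_listener
address:
  socket_address: { address: 127.0.0.1, port_value: 15001 }
drain_type: MODIFY_ONLY
filter_chains:
- filters:
  - name: envoy.tcp_proxy
    config:
      stat_prefix: egress_tcp
      cluster: some_egress_cluster
```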
@rosenhouse For clarity, in your ideal scenario, can you write a timeline (or ordered list) of when ingress and egress listeners would be drained and closed?
@emalm does that look right?
Thanks
Thanks, @rosenhouse, that sequence looks right to me!
Not sure if this is important to you or not, but 1 caveat to keep in mind: if you have no way to tell your clients to stop connecting before you close listeners (to cause TCP clients to see connection refused), the listener may have some connections in the listen() backlog when the listener is closed. There is no race-free way to avoid this with the socket API. Those connections will get RST, but they may have already sent some request data.
@ggreenway interesting, I wasn't aware of that. Do you have any references you could point to that describe this?
The documentation is scarce, and results may be OS-dependent. I did testing on Linux, using a simple test client and server application. The server calls listen() but never calls accept(). The client connects (using blocking socket calls), then sends some bytes. tcpdump shows the client SYN, server SYN/ACK, client ACK, the client sending data, and the server ACK'ing the data. The reason is that, at least on Linux, accept() will always return a fully-established connection socket, not a half-open socket. So to meet that requirement, the server OS must tell the client that it is connected. Once it is connected, the client may send at least one TCP window's worth of data.
@ggreenway I see we've been mulling the same issue. I'm also convinced libevent's documentation is wrong here; the disable functions do no such thing, of course: the listener remains open, accepting connections, while the event loop stops servicing them during the disabled period.

@rosenhouse Greg's answer above is good. James Briggs calls out how this can be accomplished with iptables: http://www.jebriggs.com/blog/2016/10/linux-graceful-service-shutdown-techniques/. Stevens' Unix Network Programming vol. 1, section 4.5 goes into quite a bit of detail across the variety of network stack implementations, and section 30.7 goes into the two queue lengths. Linux does not implement changing the backlog queue length to 0 (queueing incoming SYNs to later be dropped). I'm beginning to look at whether packet filtering might help us here to achieve our choice of dropping the SYN on the floor or ACK+RST'ing the SYN request.
@wrowe technically I understand why all of this is being discussed, but practically, it's really hard for me to understand why this makes operational sense for any deployment. This is exactly why Envoy supports health check draining. In what situation are you not able to drain traffic via failing health checks of some type?
I understand your point about libevent. I imagine that was written from the point of view of a library that provides events for things (and also happens to manage sockets sometimes). I think it could be rephrased more correctly as "These functions temporarily disable or reenable listening for events for new connections." But I think that's beside the point, because the socket API doesn't provide for what you want to do (at least on linux). As you pointed out, you'll probably need to do some kind of packet filtering first.

If you're going down the road of packet filtering, you may not need any change to envoy. You can add a rule to drop all SYNs, wait until connections close or you've waited as long as you want to, then kill envoy.
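A sketch of that packet-filtering route, done entirely outside Envoy; the port is a placeholder for whatever the listener is bound to, and the 30-second wait is an arbitrary drain window:

```sh
# Stop new connections to the listener (port 10000 is a placeholder)
iptables -I INPUT -p tcp --dport 10000 --syn -j DROP
# Alternative: actively refuse instead of silently dropping
# iptables -I INPUT -p tcp --dport 10000 --syn -j REJECT --reject-with tcp-reset

# Let established connections drain for as long as you are willing to wait
sleep 30

# Then stop Envoy and remove the rule
pkill -TERM envoy
iptables -D INPUT -p tcp --dport 10000 --syn -j DROP
```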
@mattklein123 front-end health check failing doesn't really work with TcpProxy, does it?
But in the absence of
In every deployment I've ever done it does. Basically, HTTP health check can be used to fail an Envoy out of some rotation, whether that be a load balancer, DNS, BGP, etc. Then after some period of time the process can be terminated/restarted/etc. I've just never seen a need for this type of control so it's kind of surprising to me.
Yeah, I see what you mean. I meant specifically
I'm having trouble following this thread starting from this comment. Up to that comment, the conversation has been around making the current Linux socket API drain in-flight connections (i.e. connections that have finished the 3-way handshake but aren't accepted yet by the application). @mattklein123 can you explain what you mean by this comment and this one?
@jvshahid My understanding from the conversation is that this feature request (OP) cannot be achieved in a non-racy (i.e. correct) way using the sockets API alone. However, as you said on our call earlier, the race condition may be unlikely enough that it meets our SLOs, especially when combined with our service discovery deregistration mechanism (NATS deregistration message to Cloud Foundry ingress router). End-users would see failures only when the NATS deregistration message is lost and when this TCP race condition is hit. For that reason, I think we'd still find this feature valuable.

The alternative would be for us to do active healthchecking from the ingress router to the backend Envoy. That would be a lot of work in gorouter (and elsewhere in Cloud Foundry), and I'd have concerns about how it scales when a single ingress router is proxying to >10^5 backends (as we have in large Cloud Foundry deployments).
But I would also like to better understand the health-checking approach that @mattklein123 and @ggreenway seem to be discussing, in light of the scale targets we have in mind for CF.
Our setup is N ingress proxies, each forwarding to M sidecar'd application instances. We constrain the drain duration to T seconds. With active health-checking from each ingress to each sidecar, that means each ingress router needs to average M/T health-checks per second. A large Cloud Foundry deployment has:

So each ingress router is averaging 20k health-checks per second. Is this realistic? Are we thinking about this wrong?
@rosenhouse without knowing all the details of your architecture it's hard to comment, but the way I would likely approach this would be to utilize a workflow in which you: fail the Envoy's health check so it falls out of rotation (load balancer, DNS, BGP, etc.), wait for traffic to drain away, and then terminate or restart the process.
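A minimal sketch of that kind of workflow; the admin address, port, and 30-second window are assumptions, not values from this thread:

```sh
# 1. Start failing the HTTP health check so the load balancer / DNS / BGP layer
#    takes this Envoy out of rotation (newer Envoys require POST for mutating
#    admin endpoints; older versions accepted GET, as used elsewhere in this thread)
curl -s -X POST http://127.0.0.1:9901/healthcheck/fail

# 2. Wait long enough for new traffic to stop arriving and in-flight work to finish
sleep 30

# 3. Only then terminate (or restart) the Envoy process
pkill -TERM envoy
```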
EDIT: We don't quite do this in Cloud Foundry
Coming back to this ticket (sorry for the long pause), I think we want to take 3. GET request to /healthcheck/fail out of consideration for this behavior change. While cases 1. and 2. strongly suggest closing most listening sockets immediately (which I'll get into in a follow-up post), case 3. doesn't fit this pattern.

If we consider the Note: behind this documented feature, the listening endpoint would no longer respond with the healthcheck failure status header that it promises to deliver; [edit to add] the endpoint would further be unable to respond to the ok filter ("The /healthcheck/ok admin endpoint reverses this behavior.").

Given these contradictions, the entire healthcheck facility should be considered "advisory" and have no effect on polling the listener. My follow-ups will focus on the first two cases: removing a configured listener or taking the process out of service.
I came here from #7841. I built an internal tool at my company for handling graceful Envoy shutdown along with hot restarts. It was written in Rust and does the following:
We provide the following knobs:
We're using

Do note that we are not doing anything with TCP; rather, we are relying on load-balancer health checks to ensure draining occurs. We set the eviction timeout to the maximum allowed request duration, but this could be set more aggressively.

It would be possible for us to add a hook into this system in order to use

I'm not sure if I will be able to open source this effort, but wanted to provide some breadcrumbs for others if they need a solution to this.
We're implementing lame duck mode for our project, so I may be taking this issue on. Not sure yet though, still scoping.
Given that Envoy sets REUSE_PORT by default now, it seems there is another reason to stop accepting connections in draining listeners: this would help shift traffic to another instance of Envoy bound to the same port. We're thinking specifically of outbound connections that sidecar Envoy intercepts (hence, client "lame duck mode" does not help). We're aware of the inherent race in the Linux TCP accept queue implementation, as @ggreenway mentioned.
I stumbled across this feature request while working on shutdown behavior of Envoy used as a sidecar proxy for ingress traffic, and was surprised to not see any mention of the

We have gRPC service endpoints fronted by Envoy sidecars running on Kubernetes. During application shutdown, we somewhat control the sequence of POSIX signals sent to both Envoy and gRPC but are bound to terminate within a 30-second window. The current implementation uses active health checks and the
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Is this still applicable given native sidecar support in Kubernetes?
update
We've edited this issue to be less prescriptive about the solution; it now presents the three possible approaches that we can see.
summary
Given I've configured Envoy with LDS serving a TCP proxy listener on some port
and there are connections in flight
I would like a way to refuse subsequent TCP connections to that port while allowing current established connections to drain
We tried the following approaches, but none of them achieve our goals:
- removing the listener
- `SIGTERM`
- a `GET` request to `/healthcheck/fail`
steps to reproduce
- write a `bootstrap.yaml` like the sketch after these steps
- write an `lds-current.yaml` file like the sketch after these steps
- launch envoy (I'm using v1.6.0)
- confirm that the TCP proxy is working
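A minimal sketch of what those two files might look like against the v2 API that Envoy v1.6.0 used; the ports, file paths, and cluster name are assumptions rather than values from the original report:

```yaml
# bootstrap.yaml -- points LDS at a file on disk and defines one static backend cluster
admin:
  access_log_path: /dev/null
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
dynamic_resources:
  lds_config:
    path: /etc/envoy/lds-current.yaml
static_resources:
  clusters:
  - name: local_backend
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address: { address: 127.0.0.1, port_value: 8080 }
```

```yaml
# lds-current.yaml -- an LDS response containing a single TCP proxy listener on port 10000
version_info: "0"
resources:
- "@type": type.googleapis.com/envoy.api.v2.Listener
  name: tcp_listener
  address:
    socket_address: { address: 0.0.0.0, port_value: 10000 }
  filter_chains:
  - filters:
    - name: envoy.tcp_proxy
      config:
        stat_prefix: tcp
        cluster: local_backend
```

Launching with something like `envoy -c bootstrap.yaml --drain-time-s 30` and pointing `nc 127.0.0.1 10000` at it (with a backend listening on 8080) is enough to confirm the proxying works.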
Possible approach 1: remove the listener
update the LDS to return an empty set of listeners. This is a two-step process: first, write an empty LDS response file `lds-empty.yaml`; second, move that file on top of the file being watched (see the sketch below).
In the Envoy stdout logs you'll see a line
attempt to connect to the port where the listener used to be:
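A sketch of those steps, reusing the assumed paths and port from the sketches above:

```sh
# write an empty LDS response (same DiscoveryResponse file format as above)
cat > /etc/envoy/lds-empty.yaml <<'EOF'
version_info: "1"
resources: []
EOF

# atomically replace the file Envoy is watching
mv /etc/envoy/lds-empty.yaml /etc/envoy/lds-current.yaml

# then try to connect to the port the listener used to be on
nc -v 127.0.0.1 10000
```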
expected behavior
Would like to see all new TCP connections be refused immediately, as if a listener had never been added in the first place. Existing TCP connections should continue to be serviced.
actual behavior
- the port is still open, even after the LDS update occurs
- clients can connect to the port, but the TCP proxying seems to hang (can't tell where)
- this state remains until `--drain-time-s` has elapsed (30 seconds in this example). At that point the port is finally closed, so you see:
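One way to watch this from the outside, assuming the placeholder port from the sketches above:

```sh
# the listening socket is still present after the LDS update...
ss -ltn | grep ':10000'
# ...and only disappears once --drain-time-s (30s in this example) has elapsed
```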
Possible approach 2: pkill -SIGTERM envoy
If instead of removing the listeners we signal Envoy with `SIGTERM` (e.g. `pkill -SIGTERM envoy`, as in the heading above),
Envoy exits immediately without allowing current connections to drain.
EDITED to remove incorrect bit about listeners staying open after SIGTERM.
Possible approach 3: admin healthcheck fail
We could instead `GET /healthcheck/fail` to trigger this behavior. As above, we would expect that new TCP connections should be refused while existing TCP connections are serviced.
background
In Cloud Foundry, we have the following setup currently:
Each application instance has a sidecar Envoy which terminates TLS connections from the shared ingress router. Applications may not speak HTTP, so we use basic TCP connectivity checks from the shared Router to the Envoy in order to infer application health and determine if a client connection should be load-balanced to that Envoy. When the upstream Envoy accepts the TCP connection, the Router considers that upstream healthy. When the upstream refuses the TCP connection, the Router considers that upstream unhealthy.
During a graceful shutdown, the scheduler ought to be able to drain the Envoy before terminating the application. This would mean that the Envoy ought to service any in-flight TCP connections without accepting any new ones.
acknowledgements
h/t @jvshahid and @emalm for investigation and edits