Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect early close not available on non-Linux systems #4294

Closed
zuercher opened this issue Aug 29, 2018 · 1 comment
Closed

detect early close not available on non-Linux systems #4294

zuercher opened this issue Aug 29, 2018 · 1 comment

Comments

@zuercher
Copy link
Member

zuercher commented Aug 29, 2018

ConnectionImpl has a "detect early close" feature that's enabled by default. Under Linux (specifically libevent using epoll), we can detect via EPOLLRDHUP that a connection has been closed by the remote host without actually attempting to read from the socket and even if there is readable data pending in the kernel. In this case, libevent generates an EV_CLOSED event.

Envoy makes use of this feature in the HTTP1 stack by disabling reads from a downstream connection (e.g. exerting flow control) while the upstream request/response is being handled. Detection of an early close allows the upstream request/response to be short-circuited if the downstream client disconnects before receiving the full upstream response. In the absence of early close detection, the upstream request/response cannot be short circuited and the associated resources stay allocated until the response is completed or some other error occurs (e.g. upstream timeout).

Partial workaround:
Without libevent's EV_FEATURE_EARLY_CLOSE, we detect remote close by receiving a read event and completing a successful read that returns no data. (An unclosed socket returns data or an EAGAIN error.) However, read disabled is meant to exert flow control so we never wish to read data when reads are disabled (except an actual close).

By leaving read events enabled in the ConnectionImpl when detect_early_close_ is enabled and read_enabled_ is false, we can detect the read event triggered when the remote end of a connection is closed. In ConnectionImpl::onFileEvent we can detect a read event with detect_early_close_ && !read_enabled_ and make some attempt to detect a remote close without reading. Available options in OS X are getsockopt with SO_NREAD or recv with the MSG_PEEK flag. If these indicate no data, we assume the socket is being closed and simulate a close event. (N.B. That assumption might not be valid.) If data is present, we record that we saw a read event with data so that we can simulate a read event if reads are re-enabled. This workaround fixes the integration tests that depend on early close detection, except in the case of SSL. When an SSL connection is terminated, a close-notify message is sent and so we see actual data available. What we really want is some system call that indicates a socket was closed by the remote even if data is present. This doesn't seem to exist in OS X.

An alternative means to detect the closed socket is to actually attempt a small read from the underlying TransportSocket object and allow the data to be be buffered if the read succeeds. If the remote is actually closed with no application data, the SslSocket will read the close-notify event and then detect connection closed and RawBufferSocket will see a closed socket.

The difficulty with both approaches is that a client might actually send (unexpected) data and then close, which we still wouldn't be able to detect without reading all the data. Reading the arbitrarily large data defeats flow control. And if we leave it in place, we get the exact behavior we're trying to fix.

@zuercher zuercher added bug no stalebot Disables stalebot from closing an issue labels Aug 29, 2018
zuercher added a commit to turbinelabs/envoy that referenced this issue Aug 30, 2018
Implements early close detection on OS X by inspecting the TCP
state using getsockopt.

*Risk Level*: low (conditional compilation)
*Testing*: integration tests pass (tested against envoyproxy#4232)
*Docs Changes*: n/a
*Release Notes*: n/a
*Fixes*: envoyproxy#4294

Signed-off-by: Stephan Zuercher <[email protected]>
@zuercher
Copy link
Member Author

zuercher commented Sep 4, 2018

I was eventually able to develop a hack to fix this bug on OS X. See #4299 -- it involves using getsockopt(_, IPPROTO_TCP, TCP_CONNECTION_INFO, _, _) to obtain information about the TCP connection, which includes the TCP finite state machine's current state, to determine if the TCP stack is waiting for Envoy to close the connection (which implies that the remote has already done so). It doesn't seem worth the complexity, but may be useful for other ports in the future.

@zuercher zuercher added wontfix and removed no stalebot Disables stalebot from closing an issue labels Sep 4, 2018
@zuercher zuercher closed this as completed Sep 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant