detect early close not available on non-Linux systems #4294

zuercher · 2018-08-29T17:15:08Z

ConnectionImpl has a "detect early close" feature that's enabled by default. Under Linux (specifically libevent using epoll), we can detect via EPOLLRDHUP that a connection has been closed by the remote host without actually attempting to read from the socket and even if there is readable data pending in the kernel. In this case, libevent generates an EV_CLOSED event.

Envoy makes use of this feature in the HTTP1 stack by disabling reads from a downstream connection (e.g. exerting flow control) while the upstream request/response is being handled. Detection of an early close allows the upstream request/response to be short-circuited if the downstream client disconnects before receiving the full upstream response. In the absence of early close detection, the upstream request/response cannot be short circuited and the associated resources stay allocated until the response is completed or some other error occurs (e.g. upstream timeout).

Partial workaround:
Without libevent's EV_FEATURE_EARLY_CLOSE, we detect remote close by receiving a read event and completing a successful read that returns no data. (An unclosed socket returns data or an EAGAIN error.) However, read disabled is meant to exert flow control so we never wish to read data when reads are disabled (except an actual close).

By leaving read events enabled in the ConnectionImpl when detect_early_close_ is enabled and read_enabled_ is false, we can detect the read event triggered when the remote end of a connection is closed. In ConnectionImpl::onFileEvent we can detect a read event with detect_early_close_ && !read_enabled_ and make some attempt to detect a remote close without reading. Available options in OS X are getsockopt with SO_NREAD or recv with the MSG_PEEK flag. If these indicate no data, we assume the socket is being closed and simulate a close event. (N.B. That assumption might not be valid.) If data is present, we record that we saw a read event with data so that we can simulate a read event if reads are re-enabled. This workaround fixes the integration tests that depend on early close detection, except in the case of SSL. When an SSL connection is terminated, a close-notify message is sent and so we see actual data available. What we really want is some system call that indicates a socket was closed by the remote even if data is present. This doesn't seem to exist in OS X.

An alternative means to detect the closed socket is to actually attempt a small read from the underlying TransportSocket object and allow the data to be be buffered if the read succeeds. If the remote is actually closed with no application data, the SslSocket will read the close-notify event and then detect connection closed and RawBufferSocket will see a closed socket.

The difficulty with both approaches is that a client might actually send (unexpected) data and then close, which we still wouldn't be able to detect without reading all the data. Reading the arbitrarily large data defeats flow control. And if we leave it in place, we get the exact behavior we're trying to fix.

The text was updated successfully, but these errors were encountered:

Implements early close detection on OS X by inspecting the TCP state using getsockopt. *Risk Level*: low (conditional compilation) *Testing*: integration tests pass (tested against envoyproxy#4232) *Docs Changes*: n/a *Release Notes*: n/a *Fixes*: envoyproxy#4294 Signed-off-by: Stephan Zuercher <[email protected]>

zuercher · 2018-09-04T22:35:17Z

I was eventually able to develop a hack to fix this bug on OS X. See #4299 -- it involves using getsockopt(_, IPPROTO_TCP, TCP_CONNECTION_INFO, _, _) to obtain information about the TCP connection, which includes the TCP finite state machine's current state, to determine if the TCP stack is waiting for Envoy to close the connection (which implies that the remote has already done so). It doesn't seem worth the complexity, but may be useful for other ports in the future.

zuercher added bug no stalebot Disables stalebot from closing an issue labels Aug 29, 2018

zuercher mentioned this issue Aug 29, 2018

test: Stop fake_upstream methods from accidentally succeeding #4232

Merged

zuercher mentioned this issue Aug 30, 2018

network: implement early close detection for OS X #4299

Closed

zuercher added wontfix and removed no stalebot Disables stalebot from closing an issue labels Sep 4, 2018

zuercher closed this as completed Sep 4, 2018

mattklein123 mentioned this issue Jan 9, 2019

tap: introduce HTTP tap filter #5515

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

detect early close not available on non-Linux systems #4294

detect early close not available on non-Linux systems #4294

zuercher commented Aug 29, 2018 •

edited

Loading

zuercher commented Sep 4, 2018 •

edited

Loading

detect early close not available on non-Linux systems #4294

detect early close not available on non-Linux systems #4294

Comments

zuercher commented Aug 29, 2018 • edited Loading

zuercher commented Sep 4, 2018 • edited Loading

zuercher commented Aug 29, 2018 •

edited

Loading

zuercher commented Sep 4, 2018 •

edited

Loading