connection issues with connect native tasks in bridge networking #10933
Comments
Hi @shoenig -- I wonder if this is traefik's fault (or can you reproduce this with arbitrary connect native jobs?). If the connect native job tears down the connection to the socket mid-response, that could also cause issues like this. It should probably not get logged at WARN level. If traefik really is at fault, I wonder why/how, because it simply uses the consul SDK :D
I'm fairly sure this is a result of re-using some connection handling bits intended for long-lived gRPC connections between the nomad agent and consul agent, and experiencing disconnects when being used with the Consul HTTP listener instead. In a world where everything is easy we'd just proxy the HTTP requests, but that doesn't work when either Consul or the app is expecting to use mTLS. We could just get rid of the log statement since the TCP proxy gets re-created on the next HTTP request anyway, but I want to poke around a bit more and see if I can't manage the proxy lifecycle per request.
When creating a TCP proxy bridge for Connect tasks, we are at the mercy of either end for managing the connection state. For long-lived gRPC connections the proxy could reasonably expect to stay open until the context was cancelled. For the HTTP connections used by connect native tasks, we experience connection disconnects. The proxy gets recreated as needed on follow-up requests, however we also emit a WARN log when the connection is broken. This PR lowers the WARN to a TRACE, because these disconnects are to be expected. Ideally we would be able to proxy at the HTTP layer, however Consul or the connect native task could be configured to expect mTLS, preventing Nomad from MITM'ing the requests. We also can't manage the proxy lifecycle more intelligently, because we have no control over the HTTP client or server and how they wish to manage connection state. What we have now works, it's just noisy. Fixes #10933
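The proxy behavior described above can be sketched as a minimal bidirectional TCP copy that treats a peer hanging up as an expected event (logged at trace rather than warn). This is an illustrative sketch under assumptions, not Nomad's actual implementation; the `proxy` and `roundTrip` names are hypothetical, and it uses in-memory `net.Pipe` connections in place of real unix sockets.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// proxy copies bytes in both directions between two connections.
// Either peer may hang up at any time; when that happens io.Copy
// returns and we log at trace severity (stdout here), since abrupt
// disconnects are expected for short-lived HTTP connections.
// Hypothetical sketch, not Nomad's actual code.
func proxy(client, backend net.Conn) {
	done := make(chan struct{}, 2)
	cp := func(dst, src net.Conn) {
		if _, err := io.Copy(dst, src); err != nil {
			// Expected when the peer closes mid-stream: trace, not warn.
			fmt.Printf("trace: proxy copy ended: %v\n", err)
		}
		dst.Close() // propagate the close to the other side
		done <- struct{}{}
	}
	go cp(backend, client)
	go cp(client, backend)
	<-done
	<-done
}

// roundTrip sends one payload through the proxy to an echo backend
// and returns what came back, then disconnects abruptly.
func roundTrip() string {
	clientEnd, proxyClientEnd := net.Pipe()
	proxyBackendEnd, backendEnd := net.Pipe()

	go io.Copy(backendEnd, backendEnd) // backend: echo everything back
	go proxy(proxyClientEnd, proxyBackendEnd)

	clientEnd.Write([]byte("ping"))
	buf := make([]byte, 4)
	io.ReadFull(clientEnd, buf)
	clientEnd.Close() // abrupt client disconnect: expected, not an error
	return string(buf)
}

func main() {
	fmt.Println("echoed:", roundTrip())
}
```

The key design point mirrors the PR: the proxy recreates nothing and retries nothing itself; it simply stops copying when either side closes, and the caller is free to spin up a fresh proxy for the next request.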
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
As pointed out by @apollo13 in #10804 (comment), I believe those errors are coming from the consul_grpc_sock hook, which I don't think even needs to be present for connect tasks that don't use a sidecar? Although it's also suspicious there are connection errors... JK, it's the consul_http_sock hook using shared code from the other one.