Fabio routing to wrong back end #421
Comments
The quickest way for me to reproduce this would be to have the full routing table and the request that gets misrouted. Can you provide that? DM me on Twitter if you need a secure channel first.
—
Frank Schröder
… On 18. Jan 2018, at 03:03, Craig Day ***@***.***> wrote:
We are experiencing multiple instances of Fabio routing requests to the wrong backend service. Once it starts happening it persists, typically until some change is made to the routing table by a service restart or similar. It's quite catastrophic because once it starts happening it's as if a whole bunch of URL/endpoints just disappear and start returning 404 not found because the requests are landing on a backend that doesn't serve them.
We have traced using tcpdump the requests coming in and out of Fabio and have proved beyond doubt that it is making the wrong routing decisions.
The attached dump shows a request coming into fabio for d2mx-prod-admin.dionglobal.com and then leaving fabio destined for a service on port 30398 but the routing table/consul indicates that the service on 30398 is analyser.sequoiadirect.com.au.
We were initially running fabio-1.5.0-go1.8.3-linux_amd64 but moved to fabio-1.5.4-go1.9.2-linux_amd64 to see if that fixed the problem, but it hasn't.
Thanks, Frank. What's the best way or format to capture the routing table? The table doesn't actually change very often and, AFAIK, the table as it is now will be the same as when the failure starts to occur. |
@craigday try |
@craigday Is this still an issue? |
Yep. Have sent you the routing table just now. |
This hit us again this morning. Is there any further info we can provide? Can you enumerate any possible theories or code paths that might be suspect, so we can help with the analysis? |
Looking. |
@craigday I'm awake now and DM'ing you on Twitter. |
Got hit by this as well, running fabio 1.5.7. Requests to all our services started getting routed intermittently to our hashi-ui service, causing lots of weirdness. Stopping/purging the hashi-ui service and starting it again made the problem disappear for now. |
@atillamas FYI, we believe we know what is causing this issue. Websocket requests that fail to upgrade are left open and connected to their original back end. Nginx out front is pooling these connections and sending requests straight through to that back end, completely bypassing the Fabio routing. Our workaround, for now, was to move the websocket traffic onto an isolated fabio cluster. I believe Fabio should detect these failed upgrades and close the connections. |
The websocket proxy is implemented as a raw tcp proxy which relies on the client and server to close the connection. When a websocket upgrade fails, the upstream server may keep the connection open. If a proxy like nginx is used in front of fabio, it will keep its connection to fabio open, effectively establishing a direct channel between nginx and the upstream server which will be used for any request forwarded by nginx to fabio. Adding a 'Connection: close' header to the upstream request should tell the server to close the connection. If that works, we can keep the raw tcp proxy for websockets. Otherwise, fabio needs to handle the handshake and close the connection itself. Fixes #421
The way fabio currently handles websockets (via a raw tcp connection) makes detection a bit difficult. We can go back to a protocol proxy which relays WS messages instead. However, a first attempt is to inject a |
@craigday Can you try that patch and see if it works? I'll add an integration test to simulate that behavior later. |
ws close on failed handshake (#421)
We are experiencing multiple instances of Fabio routing requests to the wrong backend service. Once it starts happening it persists, typically until some change is made to the routing table by a service restart or similar. It's quite catastrophic because once it starts happening it's as if a whole bunch of URL/endpoints just disappear and start returning 404 not found because the requests are landing on a backend that doesn't serve them.
We have traced using tcpdump the requests coming in and out of Fabio and have proved beyond doubt that it is making the wrong routing decisions.
The attached dump shows a request coming into fabio for d2mx-prod-admin.dionglobal.com and then leaving fabio destined for a service on port 30398, but the routing table/consul indicates that the service on 30398 is analyser.sequoiadirect.com.au.
tcpro-tcpro-tcpro passing 172.25.135.14 30398 [urlprefix-analyser.sequoiadirect.com.au:9999/, urlprefix-analyser.boursedata.com.au:9999/, urlprefix-analyser.d2mx.com.au:9999/
We were initially running fabio-1.5.0-go1.8.3-linux_amd64 but moved to fabio-1.5.4-go1.9.2-linux_amd64 to see if that fixed the problem, but it hasn't.