add nginx keepalive_timeout env to prevent race condition on gorouter #637

Merged
merged 1 commit into from
May 31, 2023


@loadpi loadpi commented May 25, 2023

The current default keepalive_timeout of 30s causes the same intermittent 502s as described in https://community.pivotal.io/s/article/5004y00001buMQz1626802995951?language=en_US

We need to set it to more than 90s so that the gorouter is in charge of closing the connections:
https://docs.cloudfoundry.org/adminguide/routing-keepalive.html

tl;dr

Reason for this change:

With the gorouter's 90s keep-alive enabled and the buildpack's 30s nginx keepalive_timeout, the gorouter logged intermittent EOF errors because it tried to reuse a connection that nginx had already closed:

{"log_level":3,"timestamp":"2023-05-24T09:26:55.408725231Z","message":"backend-endpoint-failed","source":"vcap.gorouter","data":{"route-endpoint":{"ApplicationId":"zzz","Addr":"zzz","Tags":{"app_id":"zzz","app_name":"xastest","component":"route-emitter","instance_id":"0","organization_id":"zzz","organization_name":"zzz","process_id":"zzz","process_instance_id":"zzz","process_type":"web","source_id":"zzz","space_id":"zzz","space_name":"zzz"},"RouteServiceUrl":""},"error":"EOF","attempt":1,"vcap_request_id":"zzz","retriable":false,"num-endpoints":1,"got-connection":true,"wrote-headers":true,"conn-reused":true,"dns-lookup-time":0,"dial-time":0,"tls-handshake-time":0}}

and the client received this header:

'x-cf-routererror': 'endpoint_failure (EOF)'
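
The timing mismatch behind this error can be sketched as a small model. This is a simplified illustration, not gorouter or nginx code: the function name, its parameters and the three outcome labels are hypothetical, and only the 90s and 30s values come from this PR. It assumes each side closes a connection once it has been idle for its own timeout.

```python
def reuse_outcome(idle_s, router_idle_s=90.0, backend_keepalive_s=30.0):
    """Classify a router's attempt to reuse a connection idle for idle_s seconds.

    Hypothetical model: each side closes the connection once it has been
    idle for at least its own timeout.
    """
    if idle_s >= router_idle_s:
        # The router already dropped the connection and dials a new one.
        return "redial"
    if idle_s >= backend_keepalive_s:
        # The backend closed first; the router may still try to reuse it.
        return "at_risk"
    # Both sides still consider the connection open.
    return "reused"


# With the 30s default, a connection idle for 45s is at risk.
print(reuse_outcome(45))
# With a backend timeout above 90s, the router always closes first.
print(reuse_outcome(45, backend_keepalive_s=100))
```

In this model the "at_risk" window is empty whenever the backend timeout exceeds the router's, which is why the value needs to be above 90s.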

The two links at the top of this description serve as references.

Testing this change:

Without the environment variable set (default):

CF-nonp-4> cf push --no-start -b https://github.com/loadpi/cf-mendix-buildpack.git\#Add-nginx-keepalive-timeout -m 2G -k 4G -p SimpleAppRetrieves.mda xastest -d dev.mendixcloud.com
CF-nonp-4> cf start xastest
CF-nonp-4> cf ssh xastest
$ cat app/nginx/conf/nginx.conf | grep _timeout
    keepalive_timeout 30;

Then with the environment variable set:

CF-nonp-4> cf set-env xastest NGINX_KEEPALIVE_TIMEOUT "100"
CF-nonp-4> cf restage xastest
CF-nonp-4> cf ssh xastest
$ cat app/nginx/conf/nginx.conf | grep _timeout
    keepalive_timeout 100;
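
For readers who want to see the shape of the mapping from the environment variable to the nginx directive, here is a minimal sketch. It is not the buildpack's actual code: the function name, the constant and the validation are assumptions made for illustration. Only the variable name NGINX_KEEPALIVE_TIMEOUT, the default of 30 and the resulting directive come from this PR.

```python
import os

# Default observed in the generated nginx.conf when the variable is unset.
DEFAULT_KEEPALIVE_TIMEOUT = 30


def render_keepalive_directive(env=None):
    """Return the nginx keepalive_timeout line for the given environment.

    Hypothetical helper: env defaults to os.environ; a dict can be passed
    in for testing.
    """
    env = os.environ if env is None else env
    raw = env.get("NGINX_KEEPALIVE_TIMEOUT", str(DEFAULT_KEEPALIVE_TIMEOUT))
    try:
        seconds = int(raw)
    except ValueError:
        # Assumed behaviour: reject non-integer values instead of guessing.
        raise ValueError(
            f"NGINX_KEEPALIVE_TIMEOUT must be an integer, got {raw!r}"
        ) from None
    if seconds < 0:
        raise ValueError("NGINX_KEEPALIVE_TIMEOUT must not be negative")
    return f"keepalive_timeout {seconds};"
```

Called with an empty mapping it yields the default line; called with {"NGINX_KEEPALIVE_TIMEOUT": "100"} it yields the line shown in the transcript above.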

This app starts posting to /xas twice every 10s, and in long-running tests with puppeteer the errors do not occur when the nginx keepalive timeout is 100s.

@loadpi loadpi force-pushed the Add-nginx-keepalive-timeout branch from 74f31c4 to 4b2c1e5 Compare May 25, 2023 19:43
@loadpi loadpi changed the base branch from master to develop May 25, 2023 19:47
@loadpi loadpi force-pushed the Add-nginx-keepalive-timeout branch from 4b2c1e5 to 54bef2e Compare May 25, 2023 20:16
@loadpi loadpi force-pushed the Add-nginx-keepalive-timeout branch from 54bef2e to 72f29ce Compare May 26, 2023 08:27
@sailhenz sailhenz merged commit 146e1f5 into mendix:develop May 31, 2023