diff --git a/README.md b/README.md
index 75f1543c..81078477 100644
--- a/README.md
+++ b/README.md
@@ -268,7 +268,8 @@ are [documented here](Longhelp.md#templates).
 > < HTTP/1.1 200 OK
 ```
 
- * Some of the features of marathon-lb assume that it is the only instance of itself running in a PID namespace. i.e. marathon-lb assumes that it is running in a container. Certain features like the `/_mlb_signal` endpoints and the `/_haproxy_getpids` endpoint (and by extension, zero-downtime deployments) may behave unexpectedly if more than one instance of marathon-lb is running in the same PID namespace or if there are other HAProxy processes in the same PID namespace.
+ * Some of the features of marathon-lb assume that it is the only instance of itself running in a PID namespace. i.e. marathon-lb assumes that it is running in a container. Certain features like the `/_mlb_signal` endpoints and the `/_haproxy_getpids` endpoint (and by extension, zero-downtime deployments) may behave unexpectedly if more than one instance of marathon-lb is running in the same PID namespace or if there are other HAProxy processes in the same PID namespace.
+ * You may want to set the `HAPROXY_RELOAD_SIGTERM_DELAY` environment variable to a value such as `5m`. This value is passed directly to the `sleep` command, which is executed after every HAProxy reload before sending a SIGTERM to the old HAProxy PIDs (see [service/haproxy/run](service/haproxy/run)). For cases where you expect long-lived TCP connections, you may _not_ want to terminate HAProxy before all connections finish. See [this discussion](http://www.serverphorums.com/read.php?10,862139) for more on HAProxy reloads, and issues [#5](https://github.com/mesosphere/marathon-lb/issues/5), [#71](https://github.com/mesosphere/marathon-lb/issues/71), [#267](https://github.com/mesosphere/marathon-lb/issues/267), [#276](https://github.com/mesosphere/marathon-lb/issues/276), and [#318](https://github.com/mesosphere/marathon-lb/issues/318) for more. If you are reloading so frequently that PIDs are being reused within the delay you specify, this may result in SIGTERMs being sent to the wrong PIDs.
 
 ## Zero-downtime Deployments
 
diff --git a/run b/run
index d2734849..f02a0a86 100755
--- a/run
+++ b/run
@@ -16,6 +16,11 @@ else
   exit 1
 fi
 
+if [ -n "${HAPROXY_RELOAD_SIGTERM_DELAY-}" ]; then
+  echo $HAPROXY_RELOAD_SIGTERM_DELAY > $HAPROXY_SERVICE/env/HAPROXY_RELOAD_SIGTERM_DELAY
+fi
+
+
 # Find the --ssl-certs arg if one was provided,
 # get the certs and remove them and the arg from the list
 # of positional parameters so we don't duplicate them
diff --git a/service/haproxy/run b/service/haproxy/run
index 52ee4d18..cd1d47e6 100755
--- a/service/haproxy/run
+++ b/service/haproxy/run
@@ -38,7 +38,11 @@ reload() {
   socat /var/run/haproxy/socket - <<< "show servers state" > /var/state/haproxy/global
 
   # Trigger reload
-  haproxy -p $PIDFILE -f /marathon-lb/haproxy.cfg -D -sf $(pidof haproxy)
+  HAPROXY_PIDS=$(pidof haproxy)
+  haproxy -p $PIDFILE -f /marathon-lb/haproxy.cfg -D -sf $HAPROXY_PIDS
+  if [ -n "${HAPROXY_RELOAD_SIGTERM_DELAY-}" ]; then
+    sleep $HAPROXY_RELOAD_SIGTERM_DELAY && kill $HAPROXY_PIDS 2> /dev/null &
+  fi
 
   # Remove the firewall rules
   removeFirewallRules
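
Note on the `reload()` change: the delayed-SIGTERM step can be tried in isolation with a rough sketch like the one below. It is not part of the patch. The helper name `delayed_term` is hypothetical, a plain `sleep 30` stands in for an old HAProxy process, and the delay is 1 second instead of something like `5m`. It shows the two properties the patch relies on: the PIDs are captured before the delay starts, and the caller is not blocked while the delay runs.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): signal the given PIDs after a delay,
# in the background, mirroring `sleep $DELAY && kill $PIDS 2> /dev/null &`.
delayed_term() {
  delay="$1"; shift
  # "$@" holds the PIDs captured by the caller before the delay begins.
  ( sleep "$delay" && kill "$@" 2>/dev/null ) &
}

# Stand-in for an old HAProxy process (assumption for the demo only).
sleep 30 &
OLD_PID=$!

delayed_term 1 "$OLD_PID"
TERM_JOB=$!   # PID of the background subshell started inside delayed_term

# The call returns immediately, so the old process is still alive here.
if kill -0 "$OLD_PID" 2>/dev/null; then STILL_ALIVE=yes; else STILL_ALIVE=no; fi

wait "$TERM_JOB"   # block until the delayed kill has fired
OLD_STATUS=0
wait "$OLD_PID" 2>/dev/null || OLD_STATUS=$?   # 143 = 128 + SIGTERM (15)

echo "alive right after call: $STILL_ALIVE; old process exit status: $OLD_STATUS"
```

As the README text warns, the PIDs are only numbers by the time `kill` runs. If reloads happen often enough for a PID to be reused within the delay, the signal can reach an unrelated process, and this sketch has the same limitation.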