This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

Race condition causing not released lock and too many HAProxy processes #267

Closed

mlowicki opened this issue Jul 18, 2016 · 14 comments

@mlowicki

mlowicki commented Jul 18, 2016

Hi,

I initially asked on the Marathon mailing list, but I've since gathered more data, and since I discovered that https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run is provided not by HAProxy but by Marathon-lb, the issue is probably caused by the latter.

It looks like the lock isn't released and we have some hanging calls to flock:

root@lb2i:/marathon-lb/service/haproxy# ps aux | grep [f]lock
root       385  0.0  0.0   8464   868 ?        S    Jul13   0:00 flock 200
root       393  0.0  0.0   8464   856 ?        S    Jul13   0:00 flock 200
root       790  0.0  0.0   8464   856 ?        S    Jul13   0:00 flock 200
root      4951  0.0  0.0   8464   936 ?        S    Jul13   0:00 flock 200
root      6551  0.0  0.0   8464   856 ?        S    Jul13   0:00 flock 200
root     11841  0.0  0.0   8464   868 ?        S    Jul13   0:00 flock 200
root     11857  0.0  0.0   8464   864 ?        S    Jul13   0:00 flock 200
root     17515  0.0  0.0   8464   832 ?        S    Jul13   0:00 flock 200
root     17543  0.0  0.0   8464   796 ?        S    Jul13   0:00 flock 200
root     23167  0.0  0.0   8464   876 ?        S    Jul13   0:00 flock 200
root     24277  0.0  0.0   8464   856 ?        S    Jul13   0:00 flock 200
root     27799  0.0  0.0   8464   872 ?        S    Jul13   0:00 flock 200
root     27993  0.0  0.0   8464   880 ?        S    Jul13   0:00 flock 200
root     30333  0.0  0.0   8464   936 ?        S    Jul13   0:00 flock 200
root     34368  0.0  0.0   8464   884 ?        S    Jul13   0:00 flock 200
root     47233  0.0  0.0   8464   880 ?        S    Jul13   0:00 flock 200
root     48034  0.0  0.0   8464   936 ?        S    Jul13   0:00 flock 200
root     49056  0.0  0.0   8464   828 ?        S    Jul13   0:00 flock 200
root@lb2i:/marathon-lb/service/haproxy# flock --wait 5 ./run "echo d"
root@lb2i:/marathon-lb/service/haproxy# echo $?
1
> lsof ./run
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
flock     385 root  200r   REG  253,4     1228 8426486 ./run
flock     393 root  200r   REG  253,4     1228 8426486 ./run
flock     790 root  200r   REG  253,4     1228 8426486 ./run
flock    4951 root  200r   REG  253,4     1228 8426486 ./run
flock    6551 root  200r   REG  253,4     1228 8426486 ./run
flock   11841 root  200r   REG  253,4     1228 8426486 ./run
flock   11857 root  200r   REG  253,4     1228 8426486 ./run
flock   17515 root  200r   REG  253,4     1228 8426486 ./run
flock   17543 root  200r   REG  253,4     1228 8426486 ./run
flock   23167 root  200r   REG  253,4     1228 8426486 ./run
flock   24277 root  200r   REG  253,4     1228 8426486 ./run
flock   27799 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27800 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27801 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27802 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27803 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27804 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27805 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27806 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27807 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27808 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27809 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27810 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27811 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27812 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27813 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27814 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27815 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27816 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27817 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27818 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27819 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27820 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27821 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27822 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27823 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27824 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27825 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27826 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27827 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27828 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27829 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27830 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27831 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27832 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27833 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27834 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27835 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27836 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27837 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27838 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27839 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27840 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27841 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27842 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27843 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27844 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27845 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27846 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27847 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27914 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27915 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27916 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27917 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27918 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27919 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27920 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27921 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27922 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27923 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27924 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27925 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27926 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27927 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27928 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27929 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27930 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27931 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27932 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27933 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27934 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27935 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27936 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27937 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27938 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27939 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27940 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27941 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27942 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27943 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27944 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27945 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27946 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27947 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27948 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27949 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27950 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27951 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27952 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27953 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27954 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27955 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27956 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27957 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27958 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27959 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27960 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27961 root  200r   REG  253,4     1228 8426486 ./run
flock   27993 root  200r   REG  253,4     1228 8426486 ./run
flock   30333 root  200r   REG  253,4     1228 8426486 ./run
run     34362 root  200r   REG  253,4     1228 8426486 ./run
run     34362 root  255r   REG  253,4     1228 8426486 ./run
flock   34368 root  200r   REG  253,4     1228 8426486 ./run
flock   47233 root  200r   REG  253,4     1228 8426486 ./run
flock   48034 root  200r   REG  253,4     1228 8426486 ./run
flock   49056 root  200r   REG  253,4     1228 8426486 ./run
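
(For context, the flock 200 processes and the 200r descriptors on ./run come from the usual flock-on-a-numbered-fd shell idiom. A minimal sketch of that idiom, assumed rather than copied from the actual run script:)

# Sketch only -- not the verbatim contents of service/haproxy/run.
(
  # Block until we hold an exclusive lock on fd 200; each waiting
  # invocation shows up in ps as "flock 200".
  flock 200

  # ... reload HAProxy while holding the lock ...

) 200< "$0"   # fd 200 is opened read-only on the script itself, hence the 200r entries in lsof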

I'm running:

cat /proc/version
Linux version 4.5.0-0.bpo.1-amd64 ([email protected]) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.5.1-1~bpo8+1 (2016-04-20) 

Initially we suspected it was related to the nbproc setting we use with HAProxy, but even without it we've reached a state with more HAProxy processes than expected (right after changing the configuration, killing the old containers, and Marathon starting new ones). It doesn't always reproduce, though, so there is probably some kind of race condition.

List of HAProxy processes (there are 96 where there should be 48; we've also had a box with almost 500 HAProxy processes): https://gist.github.com/mlowicki/1b3672db46c2dcd26f1db7bba004ad99.

@brndnmtthws
Contributor

I'm definitely not seeing this in my own tests. Are you using the provided Docker image? Do you have long-lived connections?

The recommendation from the HAProxy docs is to not use a value of nbproc > 1, unless you have a strong reason to do so. To quote the docs:

In order to increase the CPU capacity, it is possible to make HAProxy run as
several processes, using the "nbproc" directive in the global section. There
are some limitations though :

  • health checks are run per process, so the target servers will get as many
    checks as there are running processes ;
  • maxconn values and queues are per-process so the correct value must be set
    to avoid overloading the servers ;
  • outgoing connections should avoid using port ranges to avoid conflicts
  • stick-tables are per process and are not shared between processes ;
  • each peers section may only run on a single process at a time ;
  • the CLI operations will only act on a single process at a time.

I've never tried to use the multiprocess feature of HAProxy, so I can't comment on how well it works.

@rasputnik

rasputnik commented Jul 18, 2016

I wonder if this is the same issue I've seen on the Bamboo project:

QubitProducts/bamboo#200

@brndnmtthws
Contributor

Similar issues have been reported in the past: #5, #28, #71

@mlowicki
Author

The fix there was to upgrade Go, which doesn't apply in our case.

@brndnmtthws As I wrote, the problem also reproduces without using nbproc. We're using mesosphere/marathon-lb:v1.3.1.

The scenario I have in mind is as follows:

HAProxy handling the 1st reload spawns new processes while still leaving the old ones around (call them X). While this is still in progress, a 2nd reload kicks in, and in https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run#L41 $(cat $PIDFILE) evaluates to the list of new processes spawned by the 1st reload; X still hasn't been terminated. When L41 runs, the shutdown of X is somehow skipped, and we end up with a new set of processes (from the 2nd reload) plus X.
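
Roughly, if the reload step looks like the paraphrased sketch below (variable names are illustrative, not taken from the actual script), a 2nd reload that starts before the 1st generation has exited sees a pidfile that no longer lists X, so X never receives another soft-stop signal:

# Paraphrased sketch of the graceful-reload step -- not the verbatim run script.
haproxy -p "$PIDFILE" -f "$HAPROXY_CONFIG" -D -sf $(cat "$PIDFILE")
# 1st reload: the pidfile lists generation 0 (X); a new generation is started,
#             X is asked to finish, and the pidfile is rewritten with the new PIDs.
# 2nd reload: $(cat $PIDFILE) now yields only the newest generation, so if X is
#             still around it is never told to stop again.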

@brndnmtthws
Contributor

This is pretty strange, because MLB shouldn't reload HAProxy in parallel.

There's a check in MLB which blocks to make sure the reload actually occurs: https://github.com/mesosphere/marathon-lb/blob/master/marathon_lb.py#L599-L603

And the thread which executes the reloads cannot run in parallel. The code above, however, doesn't check which PIDs have changed, only whether there's any change at all. That could be made a little more robust (i.e., check that a new process was spawned).
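
A sketch of what such a check might look like in the run script (illustrative only; this is not the actual code, and it assumes the script runs under bash):

# Record the running haproxy PIDs before the reload.
OLD_PIDS=$(pidof haproxy | tr ' ' '\n' | sort)

# ... perform the reload here ...

NEW_PIDS=$(pidof haproxy | tr ' ' '\n' | sort)

# comm -13 prints PIDs present only in NEW_PIDS, i.e. processes spawned by this reload.
if [ -z "$(comm -13 <(echo "$OLD_PIDS") <(echo "$NEW_PIDS"))" ]; then
  echo "reload did not spawn a new haproxy process" >&2
  exit 1
fi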

@mlowicki
Author

Yeah, I noticed that, and as you mentioned it seems it could be improved a bit (theoretically it can still be vulnerable to my scenario; there are also 2 calls to sleep in https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run#L41, but since it happens randomly they might somehow be bypassed).

I'm not sure whether we should check only that new processes have been added, or also that the old ones are gone, to avoid my scenario. It might be that $(cat $PIDFILE) doesn't report processes that are waiting to be drained but are still present in the OS process table?

@brndnmtthws
Contributor

Checking that old processes are gone doesn't work because there's an assumption that old processes may stick around forever after a reload in the event of long-lived connections.

@mlowicki
Author

We might test on our side whether this is it by using -st instead of -sf in https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run#L41 (-st terminates the old processes immediately instead of letting them finish their current connections).

I guess it would be good to know first whether HAProxy also reports processes that are waiting for connections to close in the PIDFILE (one way to check this is sketched after the list):

  • If yes, then further reloads could probably fix it, although in our case we've seen old HAProxy processes stick around for at least a couple of days despite many changes in Marathon during that time.
  • If not, then I would be more convinced by the scenario described above.
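
One way to check this empirically (a hedged sketch; the pidfile path is whatever the run script actually uses, and it assumes bash):

# Compare what the pidfile claims with what is actually running.
comm -3 <(sort "$PIDFILE") <(pidof haproxy | tr ' ' '\n' | sort)
# Running processes missing from the pidfile (e.g. old ones still draining)
# appear in the second (indented) column of the output.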

brndnmtthws added a commit that referenced this issue Jul 18, 2016
When doing a graceful reload, use `pidof haproxy` rather than catting
the pidfile to make sure that we send SIGUSR1 to _all_ existing HAProxy
instances.

This is to address #267.
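
In effect the reload line changes roughly as follows (a paraphrase with illustrative variable names, not the literal diff):

# Signal every running haproxy process, not just the PIDs recorded in the pidfile.
haproxy -p "$PIDFILE" -f "$HAPROXY_CONFIG" -D -sf $(pidof haproxy)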
brndnmtthws added a commit that referenced this issue Jul 18, 2016
When reloading, make sure there's actually a new PID by doing some basic
set math.

To address #267.
@brndnmtthws
Contributor

Go ahead and test out the mesosphere/marathon-lb:use-pidof-instead-of-pidfile and mesosphere/marathon-lb:check-for-new-pid Docker images. I'm not sure whether those will fix it, but they may help.

brndnmtthws added a commit that referenced this issue Jul 20, 2016
When doing a graceful reload, use `pidof haproxy` rather than catting
the pidfile to make sure that we send SIGUSR1 to _all_ existing HAProxy
instances.

This is to address #267.
brndnmtthws added a commit that referenced this issue Jul 20, 2016
When reloading, make sure there's actually a new PID by doing some basic
set math.

To address #267.
@discordianfish
Contributor

I don't believe this will help; the issue appears to be a problem with HAProxy itself. Talk to @lloesche for context.

@h0tbird

h0tbird commented Aug 25, 2016

This is also affecting zdd:

zdd: Existing app running 0 instances, new app running 3 instances
zdd: Waiting for 2 pids on http://10.136.114.49:9090
zdd: Waiting for 2 pids on http://10.136.104.102:9090
zdd: Waiting for 2 pids on http://10.136.84.63:9090
zdd: Existing app running 0 instances, new app running 3 instances
zdd: Waiting for 2 pids on http://10.136.114.49:9090
zdd: Waiting for 2 pids on http://10.136.104.102:9090
zdd: Waiting for 2 pids on http://10.136.84.63:9090
zdd: Existing app running 0 instances, new app running 3 instances
zdd: Waiting for 2 pids on http://10.136.114.49:9090
zdd: Waiting for 2 pids on http://10.136.104.102:9090
zdd: Waiting for 2 pids on http://10.136.84.63:9090
zdd: Timed out waiting for tasks to fully drain, find any draining tasks and continue with deployment...

@h0tbird

h0tbird commented Sep 20, 2016

Is this fixed in HAProxy 1.6.9? I have updated to af45a72 and I can't reproduce the issue anymore. Will you cut a new release soon?

@h0tbird

h0tbird commented Sep 20, 2016

Sorry, the issue seems to reproduce after all :(

brndnmtthws added a commit that referenced this issue Sep 26, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 26, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 26, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 29, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 29, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 29, 2016
This is to address issues #5, #71, #267, #276, and #318.
@JinsYin

JinsYin commented Mar 4, 2017

You can try adding the following volume mapping:

"volumes": [
  {
    "containerPath": "/dev/log",
    "hostPath": "/dev/log",
    "mode": "RW"
  }
]
