This repository has been archived by the owner on Dec 4, 2024. It is now read-only.

Race condition causing not released lock and too many HAProxy processes #267

Closed

mlowicki opened this issue Jul 18, 2016 · 14 comments

@mlowicki

mlowicki commented Jul 18, 2016

Hi,

I initially asked on the Marathon mailing list, but I've since gathered more data, and since I discovered that https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run is provided not by HAProxy but by Marathon-lb, the issue is probably caused by the latter.

It looks like the lock isn't released and we have some hanging calls to flock:

root@lb2i:/marathon-lb/service/haproxy# ps aux | grep [f]lock
root       385  0.0  0.0   8464   868 ?        S    Jul13   0:00 flock 200
root       393  0.0  0.0   8464   856 ?        S    Jul13   0:00 flock 200
root       790  0.0  0.0   8464   856 ?        S    Jul13   0:00 flock 200
root      4951  0.0  0.0   8464   936 ?        S    Jul13   0:00 flock 200
root      6551  0.0  0.0   8464   856 ?        S    Jul13   0:00 flock 200
root     11841  0.0  0.0   8464   868 ?        S    Jul13   0:00 flock 200
root     11857  0.0  0.0   8464   864 ?        S    Jul13   0:00 flock 200
root     17515  0.0  0.0   8464   832 ?        S    Jul13   0:00 flock 200
root     17543  0.0  0.0   8464   796 ?        S    Jul13   0:00 flock 200
root     23167  0.0  0.0   8464   876 ?        S    Jul13   0:00 flock 200
root     24277  0.0  0.0   8464   856 ?        S    Jul13   0:00 flock 200
root     27799  0.0  0.0   8464   872 ?        S    Jul13   0:00 flock 200
root     27993  0.0  0.0   8464   880 ?        S    Jul13   0:00 flock 200
root     30333  0.0  0.0   8464   936 ?        S    Jul13   0:00 flock 200
root     34368  0.0  0.0   8464   884 ?        S    Jul13   0:00 flock 200
root     47233  0.0  0.0   8464   880 ?        S    Jul13   0:00 flock 200
root     48034  0.0  0.0   8464   936 ?        S    Jul13   0:00 flock 200
root     49056  0.0  0.0   8464   828 ?        S    Jul13   0:00 flock 200
root@lb2i:/marathon-lb/service/haproxy# flock --wait 5 ./run "echo d"
root@lb2i:/marathon-lb/service/haproxy# echo $?
1
> lsof ./run
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
flock     385 root  200r   REG  253,4     1228 8426486 ./run
flock     393 root  200r   REG  253,4     1228 8426486 ./run
flock     790 root  200r   REG  253,4     1228 8426486 ./run
flock    4951 root  200r   REG  253,4     1228 8426486 ./run
flock    6551 root  200r   REG  253,4     1228 8426486 ./run
flock   11841 root  200r   REG  253,4     1228 8426486 ./run
flock   11857 root  200r   REG  253,4     1228 8426486 ./run
flock   17515 root  200r   REG  253,4     1228 8426486 ./run
flock   17543 root  200r   REG  253,4     1228 8426486 ./run
flock   23167 root  200r   REG  253,4     1228 8426486 ./run
flock   24277 root  200r   REG  253,4     1228 8426486 ./run
flock   27799 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27800 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27801 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27802 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27803 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27804 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27805 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27806 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27807 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27808 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27809 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27810 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27811 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27812 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27813 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27814 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27815 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27816 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27817 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27818 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27819 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27820 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27821 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27822 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27823 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27824 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27825 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27826 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27827 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27828 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27829 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27830 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27831 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27832 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27833 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27834 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27835 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27836 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27837 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27838 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27839 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27840 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27841 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27842 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27843 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27844 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27845 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27846 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27847 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27914 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27915 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27916 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27917 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27918 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27919 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27920 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27921 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27922 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27923 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27924 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27925 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27926 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27927 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27928 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27929 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27930 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27931 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27932 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27933 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27934 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27935 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27936 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27937 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27938 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27939 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27940 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27941 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27942 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27943 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27944 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27945 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27946 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27947 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27948 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27949 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27950 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27951 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27952 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27953 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27954 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27955 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27956 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27957 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27958 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27959 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27960 root  200r   REG  253,4     1228 8426486 ./run
haproxy 27961 root  200r   REG  253,4     1228 8426486 ./run
flock   27993 root  200r   REG  253,4     1228 8426486 ./run
flock   30333 root  200r   REG  253,4     1228 8426486 ./run
run     34362 root  200r   REG  253,4     1228 8426486 ./run
run     34362 root  255r   REG  253,4     1228 8426486 ./run
flock   34368 root  200r   REG  253,4     1228 8426486 ./run
flock   47233 root  200r   REG  253,4     1228 8426486 ./run
flock   48034 root  200r   REG  253,4     1228 8426486 ./run
flock   49056 root  200r   REG  253,4     1228 8426486 ./run
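
(For context, the flock 200 processes and the 200r descriptors on ./run come from the usual flock-on-a-numbered-fd shell idiom. A minimal sketch of that idiom, assumed rather than copied from the actual run script:)

# Sketch only -- not the verbatim contents of service/haproxy/run.
(
  # Block until we hold an exclusive lock on fd 200; each waiting
  # invocation shows up in ps as "flock 200".
  flock 200

  # ... reload HAProxy while holding the lock ...

) 200< "$0"   # fd 200 is opened read-only on the script itself, hence the 200r entries in lsof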

I'm running:

cat /proc/version
Linux version 4.5.0-0.bpo.1-amd64 ([email protected]) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.5.1-1~bpo8+1 (2016-04-20) 

Initially we suspected it was related to the nbproc setting we use with HAProxy, but even without it we've reached a state with more HAProxy processes than expected (right after changing the configuration, killing the old containers, and Marathon starting new ones). It doesn't always reproduce, though, so there is probably some kind of race condition.

List of HAProxy processes (there are 96 where there should be 48; we've also had a box with almost 500 HAProxy processes): https://gist.github.com/mlowicki/1b3672db46c2dcd26f1db7bba004ad99.

@brndnmtthws
Contributor

I'm definitely not seeing this in my own tests. Are you using the provided Docker image? Do you have long-lived connections?

The recommendation from the HAProxy docs is to not use a value of nbproc > 1, unless you have a strong reason to do so. To quote the docs:

In order to increase the CPU capacity, it is possible to make HAProxy run as
several processes, using the "nbproc" directive in the global section. There
are some limitations though :

  • health checks are run per process, so the target servers will get as many
    checks as there are running processes ;
  • maxconn values and queues are per-process so the correct value must be set
    to avoid overloading the servers ;
  • outgoing connections should avoid using port ranges to avoid conflicts
  • stick-tables are per process and are not shared between processes ;
  • each peers section may only run on a single process at a time ;
  • the CLI operations will only act on a single process at a time.

I've never tried to use the multiprocess feature of HAProxy, so I can't comment on how well it works.

@rasputnik

rasputnik commented Jul 18, 2016

I wonder if this is the same issue I've seen on the Bamboo project:

QubitProducts/bamboo#200

@brndnmtthws
Contributor

Similar issues have been reported in the past: #5, #28, #71

@mlowicki
Author

The fix there was to upgrade Go, which doesn't apply in our case.

@brndnmtthws As I wrote, the problem also reproduces without using nbproc. We're using mesosphere/marathon-lb:v1.3.1.

The scenario I have in mind is as follows:

HAProxy handling the 1st reload spawns new processes while still leaving the old ones around (call them X). While this is still in progress, a 2nd reload kicks in, and in https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run#L41 $(cat $PIDFILE) evaluates to the list of new processes spawned by the 1st reload; X still hasn't been terminated. When L41 runs, the shutdown of X is somehow skipped, and we end up with a new set of processes (from the 2nd reload) plus X.
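
Roughly, if the reload step looks like the paraphrased sketch below (variable names are illustrative, not taken from the actual script), a 2nd reload that starts before the 1st generation has exited sees a pidfile that no longer lists X, so X never receives another soft-stop signal:

# Paraphrased sketch of the graceful-reload step -- not the verbatim run script.
haproxy -p "$PIDFILE" -f "$HAPROXY_CONFIG" -D -sf $(cat "$PIDFILE")
# 1st reload: the pidfile lists generation 0 (X); a new generation is started,
#             X is asked to finish, and the pidfile is rewritten with the new PIDs.
# 2nd reload: $(cat $PIDFILE) now yields only the newest generation, so if X is
#             still around it is never told to stop again.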

@brndnmtthws
Contributor

This is pretty strange, because MLB shouldn't reload HAProxy in parallel.

There's a check in MLB which blocks to make sure the reload actually occurs: https://github.com/mesosphere/marathon-lb/blob/master/marathon_lb.py#L599-L603

And the thread which executes the reloads cannot run in parallel. The code above, however, doesn't check which PIDs have changed, only whether there's any change at all. That could be made a little more robust (i.e., check that a new process was spawned).
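
A sketch of what such a check might look like in the run script (illustrative only; this is not the actual code, and it assumes the script runs under bash):

# Record the running haproxy PIDs before the reload.
OLD_PIDS=$(pidof haproxy | tr ' ' '\n' | sort)

# ... perform the reload here ...

NEW_PIDS=$(pidof haproxy | tr ' ' '\n' | sort)

# comm -13 prints PIDs present only in NEW_PIDS, i.e. processes spawned by this reload.
if [ -z "$(comm -13 <(echo "$OLD_PIDS") <(echo "$NEW_PIDS"))" ]; then
  echo "reload did not spawn a new haproxy process" >&2
  exit 1
fi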

@mlowicki
Author

Yeah, I noticed that, and as you mentioned it seems it could be improved a bit (theoretically it can still be vulnerable to my scenario; there are also 2 calls to sleep in https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run#L41, but since it happens randomly they might somehow be bypassed).

I'm not sure whether we should check only that new processes have been added, or also that the old ones are gone, to avoid my scenario. It might be that $(cat $PIDFILE) doesn't report processes that are waiting to be drained but are still present in the OS process table?

@brndnmtthws
Contributor

Checking that old processes are gone doesn't work because there's an assumption that old processes may stick around forever after a reload in the event of long-lived connections.

@mlowicki
Author

We might test on our side whether this is it by using -st instead of -sf in https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run#L41 (-st terminates the old processes immediately instead of letting them finish their current connections).

I guess it would be good to know first whether HAProxy also reports processes that are waiting for connections to close in the PIDFILE (one way to check this is sketched after the list):

  • If yes, then further reloads could probably fix it, although in our case we've seen old HAProxy processes stick around for at least a couple of days despite many changes in Marathon during that time.
  • If not, then I would be more convinced by the scenario described above.
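
One way to check this empirically (a hedged sketch; the pidfile path is whatever the run script actually uses, and it assumes bash):

# Compare what the pidfile claims with what is actually running.
comm -3 <(sort "$PIDFILE") <(pidof haproxy | tr ' ' '\n' | sort)
# Running processes missing from the pidfile (e.g. old ones still draining)
# appear in the second (indented) column of the output.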

brndnmtthws added a commit that referenced this issue Jul 18, 2016
When doing a graceful reload, use `pidof haproxy` rather than catting
the pidfile to make sure that we send SIGUSR1 to _all_ existing HAProxy
instances.

This is to address #267.
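
In effect the reload line changes roughly as follows (a paraphrase with illustrative variable names, not the literal diff):

# Signal every running haproxy process, not just the PIDs recorded in the pidfile.
haproxy -p "$PIDFILE" -f "$HAPROXY_CONFIG" -D -sf $(pidof haproxy)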
brndnmtthws added a commit that referenced this issue Jul 18, 2016
When reloading, make sure there's actually a new PID by doing some basic
set math.

To address #267.
@brndnmtthws
Contributor

Go ahead and test out the mesosphere/marathon-lb:use-pidof-instead-of-pidfile and mesosphere/marathon-lb:check-for-new-pid Docker images. I'm not sure whether those will fix it, but they may help.

brndnmtthws added a commit that referenced this issue Jul 20, 2016
When doing a graceful reload, use `pidof haproxy` rather than catting
the pidfile to make sure that we send SIGUSR1 to _all_ existing HAProxy
instances.

This is to address #267.
brndnmtthws added a commit that referenced this issue Jul 20, 2016
When reloading, make sure there's actually a new PID by doing some basic
set math.

To address #267.
@discordianfish
Contributor

I don't believe this will help; the issue appears to be a problem with HAProxy itself. Talk to @lloesche for context.

@h0tbird

h0tbird commented Aug 25, 2016

This is also affecting zdd:

zdd: Existing app running 0 instances, new app running 3 instances
zdd: Waiting for 2 pids on http://10.136.114.49:9090
zdd: Waiting for 2 pids on http://10.136.104.102:9090
zdd: Waiting for 2 pids on http://10.136.84.63:9090
zdd: Existing app running 0 instances, new app running 3 instances
zdd: Waiting for 2 pids on http://10.136.114.49:9090
zdd: Waiting for 2 pids on http://10.136.104.102:9090
zdd: Waiting for 2 pids on http://10.136.84.63:9090
zdd: Existing app running 0 instances, new app running 3 instances
zdd: Waiting for 2 pids on http://10.136.114.49:9090
zdd: Waiting for 2 pids on http://10.136.104.102:9090
zdd: Waiting for 2 pids on http://10.136.84.63:9090
zdd: Timed out waiting for tasks to fully drain, find any draining tasks and continue with deployment...

@h0tbird

h0tbird commented Sep 20, 2016

Is this fixed in HAProxy 1.6.9? I have updated to af45a72 and I can't reproduce the issue anymore. Will you cut a new release soon?

@h0tbird

h0tbird commented Sep 20, 2016

Sorry, the issue seems to reproduce after all :(

brndnmtthws added a commit that referenced this issue Sep 26, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 26, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 26, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 29, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 29, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 29, 2016
This is to address issues #5, #71, #267, #276, and #318.
@JinsYin

JinsYin commented Mar 4, 2017

You can try adding the following volume mapping:

"volumes": [
  {
    "containerPath": "/dev/log",
    "hostPath": "/dev/log",
    "mode": "RW"
  }
]
