Memory exhaustion using haproxy image #194

Closed
killianmuldoon opened this issue Oct 4, 2022 · 3 comments

Comments

@killianmuldoon

When I start the haproxy image, it gets OOM-killed almost immediately after using up all the memory on my system (32GB RAM + 8GB swap).

I'm running it with the command below, where the config file is this one:

docker run -v $(pwd)/images/haproxy:/usr/local/etc/conf:ro -m 1000000000 haproxy:2.6 -f /usr/local/etc/conf/haproxy.cfg

(Note: I've set a 1GB memory limit in the command above to demonstrate the problem, so that anyone trying to replicate it doesn't have their memory completely exhausted.)

I've tested this behaviour on all versions back to the haproxy 2.2 image.

This issue can be resolved by setting ulimits as below:

docker run --ulimit nofile=8053:8053 -v $(pwd)/images/haproxy:/usr/local/etc/conf:ro -m 1000000000 haproxy:2.6 -f /usr/local/etc/conf/haproxy.cfg

Or by setting the connection limit, e.g.:

docker run -v $(pwd)/images/haproxy:/usr/local/etc/conf:ro -m 1000000000 haproxy:2.6 -f /usr/local/etc/conf/haproxy.cfg -n 1000000
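
A third option along the same lines (just a sketch on my part; the maxconn value below is an arbitrary example, not taken from the linked config) would be to cap maxconn in the config's global section, so the derived limit no longer depends on the file-descriptor limit the container inherits:

global
    maxconn 1024

With an explicit maxconn, haproxy should size its per-connection structures from that value rather than from the inherited "ulimit -n".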

The issue looks similar to the one investigated and closed in haproxy/haproxy#1751.

I'm wondering why the haproxy docker image might use up so much memory on startup when those limits aren't set, and whether this is just a docker issue or something in the binary itself. I wasn't able to replicate this using the 2.4 version of haproxy on the same system.


SYSTEM INFORMATION

Docker version

docker version
Client: Docker Engine - Community
 Version:           20.10.18
 API version:       1.41
 Go version:        go1.18.6
 Git commit:        b40c2f6
 Built:             Thu Sep  8 23:12:30 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.18
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.6
  Git commit:       e42327a
  Built:            Thu Sep  8 23:10:10 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.8
  GitCommit:        9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Fedora version

cat /etc/os-release
NAME="Fedora Linux"
VERSION="36 (Workstation Edition)"
ID=fedora
VERSION_ID=36
VERSION_CODENAME=""
PLATFORM_ID="platform:f36"
PRETTY_NAME="Fedora Linux 36 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:36"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f36/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=36
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=36
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation

Kernel version

5.19.12-200.fc36.x86_64

yosifkit commented Oct 4, 2022

This sounds identical to docker-library/rabbitmq#545. The cause is that Fedora and other RPM-based distros set an astronomically large value for open files (1073741816 vs 65536). So, if you are running on an RPM-based OS that sets an extremely high open-files limit, you need to set --ulimit nofile= to a more reasonable value.
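
For anyone who wants to confirm what the container actually inherits, a quick check (assuming the image's shell; using --entrypoint bypasses the haproxy entrypoint):

docker run --rm --entrypoint sh haproxy:2.6 -c 'ulimit -n'

If you don't want to pass --ulimit on every run, the same default can be set daemon-wide via the standard default-ulimits option in /etc/docker/daemon.json (values here are just an example; restart the docker daemon after editing):

{
  "default-ulimits": {
    "nofile": { "Name": "nofile", "Soft": 65536, "Hard": 65536 }
  }
}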

@dlipovetsky

I think the root cause is that HAProxy allocates resources for each connection up to the maximum, and derives that maximum (maxconn) from the (very high) kernel default file descriptor limit, which becomes the effective limit when the container runtime's file limit is set to infinity.

If your platform only supports select and reports "select FAILED" on startup, you need to reduce maxconn until it works (slightly below 500 in general). If this value is not set, it will automatically be calculated based on the current file descriptors limit reported by the "ulimit -n" command, possibly reduced to a lower value if a memory limit is enforced, based on the buffer size, memory allocated to compression, SSL cache size, and use or not of SSL and the associated maxsslconn (which can also be automatic).
-- https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#maxconn
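
A rough back-of-the-envelope illustration (the per-descriptor overhead here is an assumption on my part; the real figure depends on the haproxy version and build options): with the inherited limit of 1073741816 file descriptors, even ~64 bytes of bookkeeping per descriptor comes to roughly

1073741816 fds x 64 bytes ≈ 64 GB

which is already more than the 32GB RAM + 8GB swap reported above before a single connection is accepted, whereas 65536 descriptors at the same rate is only a few MB.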


lknite commented Apr 14, 2023

Making a note for folks who end up here via kubernetes:

Looks like Kubernetes relies on this being fixed at the container runtime level; in my case that runtime is containerd, fixed like this:

# sed -i 's/LimitNOFILE=infinity/LimitNOFILE=65535/' /usr/lib/systemd/system/containerd.service
# systemctl daemon-reload
# systemctl restart containerd
# k delete deployment <asdf>
  • Note: Once, after a yum update, this fix was reset and nodes started crashing again. The fix was easy to reapply once identified, but it's something to keep in mind (a more update-proof variant is sketched below).
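
One way to make that change survive package updates (a sketch assuming a standard systemd setup; the drop-in path is my own choice, not from this thread) is a systemd drop-in instead of editing the packaged unit file in place:

# mkdir -p /etc/systemd/system/containerd.service.d
# printf '[Service]\nLimitNOFILE=65535\n' > /etc/systemd/system/containerd.service.d/override.conf
# systemctl daemon-reload
# systemctl restart containerd

Drop-ins under /etc/systemd/system/<unit>.d/ take precedence over the unit shipped in /usr/lib and are not overwritten when the package is updated.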
