When aiming to run many allocations on a single client, it is easy to overlook adjusting the OS resource limits, so Nomad ends up hitting them. As of 0.10.0, Nomad may start user allocations but partially crash before persisting their state, resulting in many leaked processes that Nomad no longer manages. When a client gets into that state, the best option is to destroy the client.
Some sample limits on Linux (a combined sketch for inspecting and raising them follows this list):
File descriptor limits, configured by ulimit. In production environments, they need to be set high, e.g. 65K. A sample relevant issue is Too many open files #3686 (comment).
Kernel max PID and thread counts, configured via /proc/sys/kernel/pid_max and /proc/sys/kernel/threads-max, or via sysctl. These are global limits on how many PIDs/threads can run in the whole system. Nomad will fail to spin up more tasks, or worse, may succeed at spinning up the tasks but then crash when spinning up its monitoring threads.
Cgroup/systemd limits: Especially in systemd environments, user sessions and services may be started inside a PIDs cgroup with a restricted maximum number of tasks (TasksMax, see https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html; or in the cgroups filesystem, e.g. /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max). This restricts running PIDs/threads in a similar fashion to the kernel setting, but targets individual sessions/services rather than the whole system.
Memory and swap limits: If processes use more memory than the system affords, the OOM killer may arbitrarily kill important tasks (including Nomad agents).
TODO (any other ones?!)
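Here is a minimal sketch of inspecting and raising the limits above. The specific values (65536 file descriptors, ~4M PIDs, infinity for TasksMax) are illustrative examples rather than recommendations, and the drop-in path assumes the agent runs as a systemd unit named nomad.service:

```sh
# Inspect the current limits (values vary per distro).
ulimit -n                                  # per-process open file limit
cat /proc/sys/kernel/pid_max               # system-wide PID limit
cat /proc/sys/kernel/threads-max           # system-wide thread limit
systemctl show -p TasksMax nomad           # systemd PIDs-cgroup cap for the unit

# Raise the kernel-wide limits via sysctl (persist them in /etc/sysctl.d/).
sysctl -w kernel.pid_max=4194303
sysctl -w kernel.threads-max=4194303

# Raise the per-unit limits with a systemd drop-in, e.g.
# /etc/systemd/system/nomad.service.d/limits.conf:
#   [Service]
#   LimitNOFILE=65536        # file descriptor limit for the agent
#   TasksMax=infinity        # lift the PIDs-cgroup cap
#   OOMScoreAdjust=-1000     # make the OOM killer prefer other processes
# then: systemctl daemon-reload && systemctl restart nomad
```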
Hitting these limits is damaging to many other critical services, even if Nomad remains healthy. Docker has some known issues and some guidance for tweaking these values [1][2].
We can try to address this in several ways; here is a sample of possible actions:
Documentation:
Highlight the above limits and any other relevant ones in the guides for productionizing Nomad
Possibly highlight sample error messages and logs related to the above
Issue guidance for monitoring resource exhaustion on the system and how best to set alerts
Preflight checks:
Have Nomad issue warnings/alerts if limits are low or if a client is close to exhausting them (see the sketch after this list)
Resiliency patterns:
Before running a task, reserve the resources it needs and fail fast if there are not enough
Handle resource exhaustion and ensure that no started tasks get leaked
Ensure that a client that consistently fails to start tasks is marked unhealthy and is no longer assigned work
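As a rough illustration of the preflight idea above, here is a hypothetical shell sketch of the kind of check a client could run. The 90% threshold and the use of /proc are assumptions for illustration, not an existing Nomad feature:

```sh
#!/bin/sh
# Hypothetical preflight check: warn when the client is close to the
# kernel PID/thread limits. Threshold and method are illustrative only.
WARN_PCT=90

pid_max=$(cat /proc/sys/kernel/pid_max)
pids_now=$(ls /proc | grep -c '^[0-9][0-9]*$')   # numeric dirs = live PIDs
if [ $((pids_now * 100 / pid_max)) -ge "$WARN_PCT" ]; then
  echo "WARN: ${pids_now}/${pid_max} PIDs in use" >&2
fi

threads_max=$(cat /proc/sys/kernel/threads-max)
# Fourth field of /proc/loadavg is running/total scheduling entities.
threads_now=$(awk '{split($4, a, "/"); print a[2]}' /proc/loadavg)
if [ $((threads_now * 100 / threads_max)) -ge "$WARN_PCT" ]; then
  echo "WARN: ${threads_now}/${threads_max} threads in use" >&2
fi
```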
Another example I just documented in #6607 is kernel tunables for bridge networking, which took a lot of time to figure out in #6580 (comment).
In that scenario we want a particular set of tunables for network namespace / Connect support, but probably don't want to enforce them because that would break the QEMU task driver:
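Presumably the tunables in question are the bridge netfilter sysctls that Nomad's bridge networking requires, along these lines:

```sh
# Bridge netfilter tunables for Nomad bridge networking / Connect
# (per Nomad's install requirements); the br_netfilter module must be
# loaded for these keys to exist. Persist them in /etc/sysctl.d/, e.g.:
#   net.bridge.bridge-nf-call-arptables = 1
#   net.bridge.bridge-nf-call-ip6tables = 1
#   net.bridge.bridge-nf-call-iptables  = 1
modprobe br_netfilter
sysctl -w net.bridge.bridge-nf-call-arptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
sysctl -w net.bridge.bridge-nf-call-iptables=1
```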
(These aren't strictly "resource limitations" but can be set at runtime on clients and change outside of Nomad's control, so I suspect the solutions are going to be similar.)
[1] https://success.docker.com/article/how-to-reserve-resource-temporarily-unavailable-errors-due-to-tasksmax-setting
[2] docker/for-linux#73