
nomad > 0.9.x causing locking contention on servers with > 100 containers and journald #6759

Closed
42wim opened this issue Nov 22, 2019 · 5 comments
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/logging type/enhancement

Comments

@42wim
Contributor

42wim commented Nov 22, 2019

Nomad version

0.9.6

Operating system and Environment details

Linux 3.10.0-957.27.2.el7.x86_64 / centos 7.7

Issue

We noticed some very high load on some of our cluster nodes.

[graph: load average spike on the affected cluster nodes]

Looking at top on such a system shows that it is mostly idle with almost no wait states, yet the load average keeps climbing:

top - 14:13:42 up 2 days, 19:55,  1 user,  load average: 62.34, 54.50, 55.67
Tasks: 2719 total,   2 running, 2715 sleeping,   0 stopped,   2 zombie
%Cpu(s):  3.5 us,  5.1 sy,  0.0 ni, 87.9 id,  3.3 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 16334451+total, 10182440+free, 44146440 used, 17373664 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 11339288+avail Mem

Some further debugging pointed to the docker threads as the culprit; their kernel stacks show they are stuck in call_rwsem_down_read_failed and call_rwsem_down_write_failed:

# cat /proc/`pidof dockerd-current`/task/*/stack|grep -i call|sort|uniq -c
     51 [<ffffffffb83888d8>] call_rwsem_down_read_failed+0x18/0x30
     23 [<ffffffffb8388907>] call_rwsem_down_write_failed+0x17/0x30
    137 [<ffffffffb8776ddb>] system_call_fastpath+0x22/0x27

These are apparently minor page faults, which can also be seen with ps:

# ps -eo min_flt,maj_flt,cmd,args,uid,gid|sort -n| tail -n 3
50461617    1 /usr/sbin/journalbeat -c /e /usr/sbin/journalbeat -c /e     0     0
1383481165  7 /usr/lib/systemd/systemd-jo /usr/lib/systemd/systemd-jo     0     0
10429516026 2262 /usr/bin/dockerd-current -- /usr/bin/dockerd-current --  0     0

So docker, together with systemd, is causing lots of page faults. After going down some rabbit holes, this appears to be caused by the mmap() calls in systemd-journald:

305788 mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 2401</var/log/journal/0e18c7d54b7f443da8457ca1cdfc066f/system@0af9ce60fd5b401da31bf70686f02374-0000000022ace036-0005979905d59b87.journal>, 0x4986000 <unfinished ...>
233349 mmap(NULL, 8388608, PROT_READ, MAP_SHARED, 6664</run/log/journal/0e18c7d54b7f443da8457ca1cdfc066f/system@d5a96e452e9842289ad7679c5f22ef9e-00000000002e0c5d-000597d3403943b9.journal>, 0xb4b000 <unfinished ...>

A first attempt at a solution was moving our journald logs from disk (persistent) to memory (volatile), but this didn't fix it.
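(For reference, that change boils down to the Storage= setting in journald's configuration; a minimal sketch, assuming the default config path, with a restart of systemd-journald afterwards:)

# /etc/systemd/journald.conf (sketch)
[Journal]
Storage=volatile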

We also tried newer kernels up to 5.3.11, but fixing this issue in the kernel itself still seems to be a work in progress. For those who want to dive into a deeper rabbit hole: https://lwn.net/Articles/786105/

Since there doesn't seem to be a fix for the symptoms, we tried to fix the cause instead.
The process causing this is Nomad's docker_logger, which constantly reads the log output of every container.

We now have a quick fix (disabling part of this plugin) by adding an early return at line 81 in docker_logger.go:

Original:

go func() {
	defer close(d.doneCh)
	stdout, stderr, err := d.openStreams(ctx, opts)

Patched:

go func() {
	defer close(d.doneCh)
	d.logger.Info("no log streaming please")
	return
	stdout, stderr, err := d.openStreams(ctx, opts)
It seems we can't just easily remove the whole logger / docker_logger plugin because it's hardcoded.

Afterwards we get normal load averages again

[graph: load average back to normal after the change]

Reproduction steps

  • Run 100+ containers with Nomad on one host, using the journald logging driver (see the job sketch after this list).
  • Make sure those containers output a lot of logging to stdout.
  • The load should increase after a couple of hours.
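
A minimal sketch of a task that reproduces this setup (image, names, and the log-spam loop are illustrative; it assumes the journald log driver is selected per task via the docker driver's logging block):

task "noisy" {
  driver = "docker"

  config {
    image   = "busybox"                                  # illustrative image
    command = "sh"
    args    = ["-c", "while true; do echo spam; done"]   # generate lots of stdout
    logging {
      type = "journald"                                  # route container logs through journald
    }
  }
}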

Suggested solution

Add an option to make docker_logger optional.
When you're using journalbeat to send the journald logs to an ELK stack, there's no need for Nomad's own log collection output (as in 0.8.x).

@tgross
Member

tgross commented Nov 22, 2019

Hi @42wim! Thanks for submitting this issue!

I've seen this exact sort of behavior with dockerd before with other applications that tail the docker socket for the logs, but I agree it's a problem we don't want to perpetuate with Nomad. We're looking into some design improvements to how we manage logs with docker for this reason. I'm going to mark this as an enhancement and I'll try to update here when I have a better idea of what that's going to look like.

@tgross tgross added this to the unscheduled milestone Nov 22, 2019
@notnoop notnoop modified the milestones: unscheduled , 0.10.3 Dec 4, 2019
@tgross tgross modified the milestones: 0.10.3, near-term, unscheduled Jan 6, 2020
@42wim
Contributor Author

42wim commented Feb 12, 2020

Seems like PR #6820 will fix this issue (I haven't tested it yet).

@tgross tgross added the stage/accepted Confirmed, and intend to work on. No timeline committment though. label Aug 24, 2020
@cb22

cb22 commented Jan 6, 2021

In case it's useful for anyone else, we avoid this problem in our cluster by:

  • running with Nomad's Docker log collection turned off;
  • setting all of our Docker containers to log via journald (in non-blocking mode, since the data is not critical and 99% of people wouldn't want a logging failure to cause a service interruption!);
  • running a simple system job that continuously streams filtered data off journald and stores it in the exact same files and locations Nomad expects.

This works quite well, makes both nomad logs and the UI work, and we haven't encountered any issues with it in production yet.
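
Roughly, the per-task logging piece of that looks something like the following (a sketch with illustrative names, not our exact config; the system job that copies journald entries into the files Nomad expects is omitted):

task "app" {
  driver = "docker"

  config {
    image = "example/app:latest"     # illustrative image
    logging {
      type = "journald"
      config {
        mode = "non-blocking"        # don't let a logging stall block the container
        tag  = "app"                 # used by the system job to filter journald entries
      }
    }
  }
}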

@42wim
Contributor Author

42wim commented Jan 6, 2021

I forgot about this; the "disable log collection" option in #6820 fixed this issue for me, so I'm going to close it.
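
For anyone finding this later: that option is set in the client's docker plugin configuration; a minimal sketch (assuming the option name is disable_log_collection, as added by #6820):

# client agent configuration (sketch)
plugin "docker" {
  config {
    disable_log_collection = true   # stop Nomad's docker_logger from tailing container logs
  }
}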

@42wim 42wim closed this as completed Jan 6, 2021
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 23, 2022