
Add nomad.client.allocs.memory.mapped_file metric #10067

Closed

optiz0r opened this issue Feb 22, 2021 · 4 comments
Labels
good first issue · stage/accepted (Confirmed, and intend to work on. No timeline commitment though.) · theme/metrics · type/enhancement
Milestone
1.2.4

Comments


optiz0r commented Feb 22, 2021

Nomad version

Tested on 0.12.x, but the metrics documentation does not suggest any differences on newer versions.

Operating system and Environment details

CentOS 7.7, docker 19.03

Issue

We see oomkills happen when nomad.client.allocs.memory.rss is well below the nomad.client.allocs.memory.allocation limit. nomad.client.allocs.memory.usage may be either above or below the allocation limit prior to the oomkiller being invoked.

cgroup memory stats include:

  • rss - anonymous memory (not reclaimable)
  • mapped_file - bytes of mapped files (inc tmpfs, shm) (not reclaimable)
  • cache - page cache, including mapped_file (partially reclaimable)
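
For reference, these counters come from the cgroup's memory.stat file. Below is a minimal Go sketch (not Nomad code, just an illustration) of reading them on a cgroup v1 host; the Docker cgroup path is an assumption for this environment and the container id is a placeholder.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readMemoryStat parses a cgroup v1 memory.stat file into a name -> value map.
func readMemoryStat(path string) (map[string]uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	stats := make(map[string]uint64)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text()) // e.g. "mapped_file 2065711104"
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			stats[fields[0]] = v
		}
	}
	return stats, sc.Err()
}

func main() {
	// Example path for a Docker-managed cgroup; the container id is a placeholder.
	stats, err := readMemoryStat("/sys/fs/cgroup/memory/docker/<container-id>/memory.stat")
	if err != nil {
		panic(err)
	}
	// Values are in bytes.
	fmt.Println("rss:", stats["rss"], "cache:", stats["cache"], "mapped_file:", stats["mapped_file"])
}
```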

I believe nomad presents these as:

  • rss as nomad.client.allocs.memory.rss
  • cache as nomad.client.allocs.memory.cache
  • usage (rss + cache) as nomad.client.allocs.memory.usage

From the kernel's cgroup v1 memory documentation (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt):

Note:
Only anonymous and swap cache memory is listed as part of 'rss' stat.
This should not be confused with the true 'resident set size' or the
amount of physical memory used by the cgroup.
'rss + mapped_file" will give you resident set size of cgroup.
(Note: file and shmem may be shared among other cgroups. In that case,
mapped_file is accounted only when the memory cgroup is owner of page
cache.)

The oomkiller is invoked when the cgroup's resident set (rss + mapped_file) exceeds the cgroup memory limit (nomad.client.allocs.memory.allocation). Since the cache and usage metrics contain both reclaimable and non-reclaimable memory, studying them alone is not sufficient to determine how close an allocation is, or was, to being oomkilled. The available metrics are therefore insufficient to judge whether a memory limit needs increasing to avoid a future oomkill.

If mapped_file were made available, it would be possible (by adding rss and mapped_file, or equivalently by subtracting the reclaimable part of cache, i.e. cache minus mapped_file, from usage) to determine whether usage is close to the allocation limit because of non-reclaimable memory (and thus in danger of being oomkilled) or because of reclaimable memory (which can burst above the allocation limit and is therefore in no danger of being oomkilled).
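
To make that arithmetic concrete, here is a minimal sketch (again, not Nomad code) of the check that exposing mapped_file would enable. The two expressions are equivalent under the accounting above (usage = rss + cache, with mapped_file being the non-reclaimable part of cache); the numbers in main are made-up illustrative values, not from the incident below.

```go
package main

import "fmt"

// nonReclaimableKB returns the cgroup's resident set size, i.e. the memory
// the oom-killer cares about, computed two equivalent ways.
func nonReclaimableKB(rss, cache, usage, mappedFile uint64) uint64 {
	fromRSS := rss + mappedFile               // rss + mapped_file
	fromUsage := usage - (cache - mappedFile) // usage minus reclaimable cache
	if fromRSS != fromUsage {
		// Only possible if the counters were not sampled at the same instant.
		fmt.Println("warning: inconsistent sample")
	}
	return fromRSS
}

func main() {
	// Hypothetical values in KB: a task whose usage is near its 9216000KB
	// allocation, but mostly because of reclaimable page cache.
	limit := uint64(9216000)
	resident := nonReclaimableKB(7000000 /*rss*/, 2000000 /*cache*/, 9000000 /*usage*/, 200000 /*mapped_file*/)
	fmt.Printf("resident=%dKB, headroom=%dKB\n", resident, limit-resident)
	// resident=7200000KB, headroom=2016000KB: close to the limit on paper,
	// but not actually in danger of an oomkill.
}
```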

It looks like these stats are being pulled from docker (func DockerStatsToTaskResourceUsage(s *docker.Stats) *cstructs.TaskResourceUsage), and mapped_file is already available as a separate statistic (https://docs.docker.com/config/containers/runmetrics/); it is just not being exposed by nomad.
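
For illustration only, a self-contained sketch of the shape of that change, using hypothetical stand-in structs rather than Nomad's actual cstructs.MemoryStats or the Docker client's stats types (see the linked function and PR for the real field names): copy the cgroup's mapped_file counter through to the memory stats that nomad reports.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Docker stats payload and Nomad's memory stats.
type dockerMemStats struct {
	RSS, Cache, MappedFile, Usage, MaxUsage uint64
}

type taskMemStats struct {
	RSS, Cache, MappedFile, Usage, MaxUsage uint64
	Measured                                []string
}

// toTaskMemStats mirrors what a Docker-stats-to-task-resource-usage conversion
// does for rss, cache and usage today, with mapped_file carried across as well.
func toTaskMemStats(s dockerMemStats) taskMemStats {
	return taskMemStats{
		RSS:        s.RSS,
		Cache:      s.Cache,
		MappedFile: s.MappedFile, // the counter this issue asks nomad to expose
		Usage:      s.Usage,
		MaxUsage:   s.MaxUsage,
		Measured:   []string{"RSS", "Cache", "Mapped File", "Usage", "Max Usage"},
	}
}

func main() {
	// Illustrative values only.
	out := toTaskMemStats(dockerMemStats{RSS: 7183932, Cache: 2032068, MappedFile: 2017296, Usage: 9216000})
	fmt.Printf("%+v\n", out)
}
```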

From an example oomkill with the job memory limit set to 9GB, a task was killed when rss + mapped_file (7183932KB + 2017296KB) had effectively filled the 9GB limit. Our metric graphs of the task leading up to the oomkill event showed rss to be well below the limit, while usage was regularly spiking above the allocation limit. It is likely that prior to the oomkill, cache contained reclaimable memory, but at the point of the oomkill this had been almost completely reclaimed and so a kill was unavoidable (the difference between cache and mapped_file is approx 15MB at the point of oomkill). However, it is impossible to tell what the relative levels of reclaimable memory versus mapped-file memory were leading up to the crash.

[Graph of the task's memory metrics (rss and usage against the allocation limit) leading up to the oomkill]

[Mon Feb 22 16:22:28 2021] qtp808417649-16 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
[Mon Feb 22 16:22:28 2021] Task in /docker/c1fc236fd0d45a32849a9160f9081571c9922bf0384593a4f6b9014e2021ff5c killed as a result of limit of /docker/c1fc236fd0d45a32849a9160f9081571c9922bf0384593a4f6b9014e2021ff5c
[Mon Feb 22 16:22:28 2021] memory: usage 9216000kB, limit 9216000kB, failcnt 326614
[Mon Feb 22 16:22:28 2021] memory+swap: usage 9216000kB, limit 9216000kB, failcnt 17
[Mon Feb 22 16:22:28 2021] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[Mon Feb 22 16:22:28 2021] Memory cgroup stats for /docker/860bf75154d1b2acac9184500d2ce53cab35eb31ef74c5209c9a0e6072c4a56b: cache:2032068KB rss:7183932KB rss_huge:0KB mapped_file:2017296KB swap:0KB inactive_anon:1961656KB active_anon:7253880KB inactive_file:256KB active_file:196KB unevictable:0KB
[Mon Feb 22 16:22:28 2021] Killed process 20367 (java), UID 12345, total-vm:17197659108kB, anon-rss:7115792kB, file-rss:14412kB, shmem-rss:3867124kB
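
Working the numbers from that log: rss + mapped_file = 7183932KB + 2017296KB = 9201228KB, which is 14772KB (roughly 15MB) under the 9216000KB limit, while rss + cache = 7183932KB + 2032068KB = 9216000KB, exactly the limit. In other words, the reclaimable slack (cache minus mapped_file) was only about 15MB at the moment of the kill.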

shoenig commented Feb 23, 2021

Thanks for the issue @optiz0r, this is super descriptive!

Seems like adding the mapped_file metric should be reasonable. Would you like to try making a PR for this?

shoenig added the stage/accepted, good first issue, theme/metrics and type/enhancement labels Feb 23, 2021
deblasis added a commit to deblasis/nomad that referenced this issue Nov 12, 2021
- added `mapped_file` in DockerStatsToTaskResourceUsage
- added `mapped_file` in LibcontainerExecutor.handleStats

Fixes hashicorp#10067

Signed-off-by: Alessandro De Blasis <[email protected]>
deblasis commented

Hi there!

Many thanks to @optiz0r for the fantastic issue description. 👍
Basically, my PR #11500 was served on a silver platter and I couldn't resist 😄


tgross commented Jan 10, 2022

Closed by #11500 and will ship in the upcoming release.

tgross closed this as completed Jan 10, 2022
tgross added this to the 1.2.4 milestone Jan 10, 2022
github-actions bot commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2022