Add nomad.client.allocs.memory.mapped_file metric #10067
Labels
good first issue
stage/accepted
theme/metrics
type/enhancement
Nomad version
Tested on 0.12.x, but the metrics documentation does not suggest any differences on newer versions.
Operating system and Environment details
CentOS 7.7, docker 19.03
Issue
We see OOM kills happen when nomad.client.allocs.memory.rss is well below the nomad.client.allocs.memory.allocation limit. nomad.client.allocs.memory.usage may be above or below the allocation limit prior to the OOM killer being invoked.
cgroup memory stats include:
I believe nomad presents these:
From the cgroup kernel memory documentation (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt)
Note:
Only anonymous and swap cache memory is listed as part of 'rss' stat.
This should not be confused with the true 'resident set size' or the
amount of physical memory used by the cgroup.
'rss + mapped_file" will give you resident set size of cgroup.
(Note: file and shmem may be shared among other cgroups. In that case,
mapped_file is accounted only when the memory cgroup is owner of page
cache.)
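For reference, mapped_file is already reported per cgroup alongside rss and cache in the cgroup v1 memory.stat file; an illustrative excerpt (values invented, not from the incident described below) looks like:

```
cache 2147483648
rss 7247757312
rss_huge 0
mapped_file 2132803584
swap 0
```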
The OOM killer is invoked when the cgroup's resident set (rss + mapped_file) exceeds the cgroup memory limit (nomad.client.allocs.memory.allocation). Since the cache and usage metrics contain both reclaimable and non-reclaimable memory, studying these alone is not sufficient to determine how close an allocation is, or was, to being OOM killed. The available metrics are therefore insufficient for judging whether memory limits need increasing to avoid a future OOM kill.
If mapped_file were made available, it would be possible (by adding rss + mapped_file, or by subtracting mapped_file from cache) to determine whether usage is close to the allocation limit because of non-reclaimable memory (and thus in danger of being OOM killed), or because of reclaimable memory (which can burst above the allocation limit without any danger of an OOM kill). A rough sketch of the two derived values is shown below.
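As an illustration only (not part of Nomad; the cgroup path is hypothetical and the field names come from the cgroup v1 docs quoted above), the two derived values could be computed directly from memory.stat like this:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readMemoryStat parses a cgroup v1 memory.stat file into a map of
// counter name -> value (bytes for byte counters such as rss and cache).
func readMemoryStat(path string) (map[string]uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
			stats[fields[0]] = v
		}
	}
	return stats, scanner.Err()
}

func main() {
	// Hypothetical cgroup path for a docker-driver task; adjust to your host.
	stats, err := readMemoryStat("/sys/fs/cgroup/memory/docker/CONTAINER_ID/memory.stat")
	if err != nil {
		panic(err)
	}

	// Resident set of the cgroup per the kernel docs quoted above; this is
	// what the OOM killer weighs against the cgroup limit.
	residentSet := stats["rss"] + stats["mapped_file"]

	// Reclaimable page cache: the part of "cache" that is not mapped and can
	// be dropped under memory pressure instead of triggering an OOM kill.
	reclaimable := stats["cache"] - stats["mapped_file"]

	fmt.Printf("resident set: %d bytes, reclaimable cache: %d bytes\n", residentSet, reclaimable)
}
```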
It looks like these stats are being pulled from docker (nomad/drivers/docker/util/stats_posix.go, line 18 at 9a36ebe).
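For illustration, a minimal sketch of the kind of change being requested is below. It assumes the Docker client library already decodes mapped_file from the cgroup stats payload; the struct and field names are assumptions modelled on that file, not necessarily Nomad's actual types.

```go
// Sketch only: MemoryStats and the docker stats field names are assumptions,
// not Nomad's actual types. The point is that the Docker stats payload already
// carries the cgroup's mapped_file counter, so exposing it would mostly mean
// copying one more field and declaring it as measured.
package util

import docker "github.com/fsouza/go-dockerclient"

// MemoryStats mirrors the memory fields Nomad reports today, plus the
// proposed MappedFile field.
type MemoryStats struct {
	RSS        uint64
	Cache      uint64
	Swap       uint64
	MappedFile uint64 // proposed: cgroup mapped_file, enabling rss+mapped_file graphs
	Usage      uint64
	MaxUsage   uint64
	Measured   []string
}

func dockerMemoryStats(s *docker.Stats) *MemoryStats {
	return &MemoryStats{
		RSS:        s.MemoryStats.Stats.Rss,
		Cache:      s.MemoryStats.Stats.Cache,
		Swap:       s.MemoryStats.Stats.Swap,
		MappedFile: s.MemoryStats.Stats.MappedFile, // assumed to be exposed by the client library
		Usage:      s.MemoryStats.Usage,
		MaxUsage:   s.MemoryStats.MaxUsage,
		Measured:   []string{"RSS", "Cache", "Swap", "Mapped File", "Usage", "Max Usage"},
	}
}
```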
From an example OOM kill with the job memory limit set to 9GB, a task was killed when rss + mapped_file (7183932KB + 2017296KB, roughly 9.2GB) exceeded 9GB. Our metric graphs of the task leading up to the OOM kill showed rss well below the limit, while usage was regularly spiking above the allocation limit. It is likely that prior to the OOM kill the cache contained reclaimable memory, but at the point of the kill this had been almost completely reclaimed, so the kill was unavoidable (the difference between cache and mapped_file was approximately 15MB at that point). However, it is impossible to tell what the relative levels of reclaimable memory versus mapped file memory were leading up to the kill.