Nomad exec driver leaks cgroups, causing host system running out of memory #6823

Closed
fho opened this issue Dec 9, 2019 · 4 comments · Fixed by #6839

Comments

@fho
Contributor

fho commented Dec 9, 2019

Nomad version

Reproduced with:

Operating system and Environment details

Reproduced with Linux kernels:

  • Ubuntu 4.15.0-1050-gcp
  • ArchLinux 4.19.87-1-lts

Issue

Nomad does not remove cgroups for terminated exec tasks.
As a result, the kernfs_node_cache and task_struct SLAB caches use more and more memory on the host system.
Eventually the host becomes unstable: it runs out of memory, starts to swap, and then page allocation failures occur.
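To put a number on the slab growth described above, the following is a minimal sketch that estimates the memory held by the two caches from the text of /proc/slabinfo. The function name slab_usage is made up for illustration, and the sketch assumes the slabinfo version 2.1 column layout (name, active_objs, num_objs, objsize, ...). Reading /proc/slabinfo usually requires root.

```python
def slab_usage(slabinfo_text, names=("kernfs_node_cache", "task_struct")):
    """Return {cache_name: approximate bytes} for the given slab caches.

    Assumes the slabinfo 2.1 layout, where each data line starts with:
    name <active_objs> <num_objs> <objsize> ...
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip the version line, the "# name ..." header, and other caches.
        if len(fields) < 4 or fields[0] not in names:
            continue
        num_objs, objsize = int(fields[2]), int(fields[3])
        # Allocated objects times object size; ignores per-slab overhead.
        usage[fields[0]] = num_objs * objsize
    return usage


def read_slab_usage(path="/proc/slabinfo"):
    """Convenience wrapper that reads the file from the running kernel."""
    with open(path) as f:
        return slab_usage(f.read())
```

Calling read_slab_usage() repeatedly while the periodic job runs should show both values increasing if the leak is present.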

Reproduction steps

1.) Start a batch job via nomad that:

  • runs a command that is available in the exec chroot and finishes fast, e.g. /bin/ls
  • runs periodically every 1 second (optionally with prohibit_overlap = true)

2.) Monitor the system:

  • Monitor the number of cgroups on the system created by nomad,
    e.g. via watch -n 1 'find $(ls /sys/fs/cgroup/*/nomad -d) -type d| wc -l'; the number grows continuously
  • Monitor slab caches via slabtop -s c -d1; the kernfs_node_cache and task_struct caches grow continuously

Eventually the system runs out of available memory, starts to swap, and page allocation failures happen.
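The find one-liner from the steps above prints a single total. A small sketch like the following breaks the same count down per cgroup controller, which helps to see which hierarchies keep stale directories. The function name count_nomad_cgroups and its parameters are hypothetical; the layout /sys/fs/cgroup/<controller>/nomad is taken from the command in this report.

```python
import glob
import os


def count_nomad_cgroups(cgroup_root="/sys/fs/cgroup", parent="nomad"):
    """Count directories under <cgroup_root>/<controller>/<parent>, per controller."""
    counts = {}
    for base in sorted(glob.glob(os.path.join(cgroup_root, "*", parent))):
        if not os.path.isdir(base):
            continue
        controller = os.path.basename(os.path.dirname(base))
        # os.walk yields one entry per directory, including base itself,
        # which matches what `find <base> -type d` would list.
        counts[controller] = sum(1 for _ in os.walk(base))
    return counts
```

Printing count_nomad_cgroups() once per second while the job runs gives the same signal as the watch command, split by controller.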

Fix: Remove cgroups when an exec task terminates
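As an illustration of what such a cleanup involves, here is a minimal sketch, not Nomad's actual code. On cgroup v1, a cgroup directory can be removed with rmdir once it has no processes and no child cgroups, so children have to be removed before their parent. The function name remove_cgroup_tree is hypothetical.

```python
import os


def remove_cgroup_tree(path):
    """Remove a cgroup directory and its children, deepest first.

    Returns the list of directories that were actually removed.
    """
    removed = []
    # topdown=False visits children before their parent.
    for dirpath, _dirnames, _filenames in os.walk(path, topdown=False):
        try:
            os.rmdir(dirpath)
            removed.append(dirpath)
        except OSError:
            # Directory could not be removed (for example, it still
            # contains processes); leave it and let the caller retry.
            pass
    return removed
```

A caller would run this for the task's directory in every controller hierarchy after the task has exited and compare the result with the directories it expected to remove.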

Job file (if appropriate)

job "example" {
  periodic {
    cron = "*/1 * * * * * *"
    prohibit_overlap = true
  }
  datacenters = ["sandbox"]
  type = "batch"
  group "cache" {
    count = 1

    task "cgroupleak" {
      driver = "exec"
      config {
        command = "/bin/ls"
      }
      resources {
        cpu    = 20 # 20 MHz
        memory = 10 # 10 MB
      }
      service {
        name = "cgroupleak"
      }
    }
  }
}
@notnoop notnoop self-assigned this Dec 9, 2019
@notnoop notnoop added this to the 0.10.3 milestone Dec 9, 2019
@notnoop
Contributor

notnoop commented Dec 9, 2019

Thanks @fho . I'll investigate this and update you very soon!

@fho
Contributor Author

fho commented Dec 13, 2019

Thanks a lot for the fast response and fix!

@notnoop
Contributor

notnoop commented Dec 13, 2019

@fho anytime! It'll go out in 0.10.3. Thank you so much for reporting it.

For context, Nomad has been leaking cgroups since 0.9.0 because of a regression :(. If an exec task exited with a zero exit code, Nomad 0.9 didn't clean up its cgroups. Nomad 0.10.2 fixed this in #6722. But the systemd cgroup was special, and we didn't properly clean it up; we addressed that in #6839.

Let us know if you have any questions or further observations!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022