
nomad-driver-lxc compatibility with LXC 4.0 #30

Open
kwianeck opened this issue Sep 8, 2021 · 5 comments

@kwianeck

kwianeck commented Sep 8, 2021

According to https://discuss.linuxcontainers.org/t/lxc-4-0-lts-has-been-released/7182, the cgroup specification for a container and its monitor process have been separated.

My observation is that the driver does not ask LXC to fully clean up the container. Instead, when a task is stopped, LXC removes the container's cgroup but leaves the lxc.monitor cgroups under /sys/fs/cgroup//
This way, over time, we can expect hundreds of leftover lxc.monitor cgroups for containers that were removed long ago.

Example:

Nomad job definition:
job "alpine2" {
  datacenters = ["DC1"]
  type = "service"

  group "lxc-alpine2" {
    count = 1
    
    task "lxc-alpine2" {
      driver = "lxc"
      config {
        log_level = "trace"
        verbosity = "verbose"
        template = "/usr/share/lxc/templates/lxc-alpine"
      }
      resources {
        cpu      = 500
        memory   = 256
      }
    }
  }
}

Output after removing alpine2 (nomad job stop alpine2):

nomad-lxc-client:/sys/fs/cgroup/devices# lxc-ls
alpine1
nomad-lxc-client:/sys/fs/cgroup/devices# ls | grep lxc.
lxc.monitor.alpine1
lxc.monitor.lxc-alpine2-b88164f4-99ce-c0d8-d8a1-d68df8762bab
lxc.monitor.lxc-container-58c0cd07-beae-5638-317f-34bde2622e06
lxc.monitor.lxc-container-992811d2-3ec6-995f-69f6-08bbfc4d1521
lxc.payload.alpine1
lxc.pivot
nomad-lxc-client:/sys/fs/cgroup/devices# 

As you can see, the lxc.payload directory (the container's own cgroups) is gone; however, the lxc.monitor cgroups stay.
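As a stopgap on the host (a sketch of my own, not anything from the driver), empty leftover lxc.monitor.* cgroup directories can be removed like any empty cgroup; the kernel only allows removing a cgroup directory that has no tasks and no child cgroups, so live monitors are skipped automatically:

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// The devices controller is used here only because that is where the
	// leftover directories show up in the listing above.
	matches, err := filepath.Glob("/sys/fs/cgroup/devices/lxc.monitor.*")
	if err != nil {
		panic(err)
	}
	for _, dir := range matches {
		// os.Remove (rmdir) only succeeds for a cgroup with no tasks and
		// no child cgroups, so a still-running monitor is left alone.
		if err := os.Remove(dir); err != nil {
			fmt.Printf("kept %s: %v\n", dir, err)
			continue
		}
		fmt.Println("removed", dir)
	}
}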

Using LXC version 4.0.6:
...
--- Control groups ---
Cgroups: enabled

Cgroup v1 mount points: 
/sys/fs/cgroup/systemd
/sys/fs/cgroup/pids
/sys/fs/cgroup/blkio
/sys/fs/cgroup/perf_event
/sys/fs/cgroup/cpu,cpuacct
/sys/fs/cgroup/net_cls,net_prio
/sys/fs/cgroup/freezer
/sys/fs/cgroup/cpuset
/sys/fs/cgroup/memory
/sys/fs/cgroup/rdma
/sys/fs/cgroup/devices
/sys/fs/cgroup/hugetlb

Cgroup v2 mount points: 
/sys/fs/cgroup/unified

Cgroup v1 clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled
...
@kwianeck
Author

kwianeck commented Sep 9, 2021

From what I can see, nomad-driver-lxc does not release its handles to the most recently created container's /sys/fs/cgroup//lxc.monitor... entry, as shown below:

├─lxc.monitor.b1-b74fc4d5-5dc6-715f-f4ab-9024b01ba7a6
│ ├─25364 /opt/nomad/data/plugins/nomad-driver-lxc
│ └─28310 [lxc monitor] /var/lib/lxc b1-b74fc4d5-5dc6-715f-f4ab-9024b01ba7a6
└─lxc.monitor.b2-c0881597-2453-9836-a739-c362bb2dd990
  └─27746 [lxc monitor] /var/lib/lxc b2-c0881597-2453-9836-a739-c362bb2dd990

b1 was created after b2. As you can see, b2 is no longer occupied by the driver's process. To properly clean up a container (remove all its artifacts from the Nomad client), I need to either restart Nomad, which also restarts the driver's process, or create another container so that the driver releases its handles to the container that gets removed later. I am not a programmer, so I cannot see how this could be fixed in the driver's code.
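For reference for whoever looks at this, a rough sketch of the kind of explicit stop/destroy/release teardown being described here, assuming gopkg.in/lxc/go-lxc.v2 (the binding the driver builds against); destroyContainer is a hypothetical name and the flow is assumed, not taken from the driver's code:

package lxcutil

import (
	"time"

	lxc "gopkg.in/lxc/go-lxc.v2"
)

// destroyContainer stops, destroys, and releases a container so that the
// calling process keeps no handle to it afterwards.
func destroyContainer(name, lxcPath string) error {
	c, err := lxc.NewContainer(name, lxcPath)
	if err != nil {
		return err
	}
	// Release drops the reference to the container; a held *lxc.Container
	// is the kind of handle described above.
	defer lxc.Release(c)

	if c.Running() {
		if err := c.Stop(); err != nil {
			return err
		}
		// Give the container (and its monitor) time to shut down.
		c.Wait(lxc.STOPPED, 30*time.Second)
	}
	return c.Destroy()
}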

I used Nomad 1.1.3 and 1.1.4. I recompiled the driver for both versions against two different versions of gopkg.in/lxc/go-lxc.v2 (one from 2018, one from 2021). No difference, so I guess the issue is in the driver's code and not in any dependency library.

@kwianeck
Author

Hey,

Is there anyone who has run into a similar issue and found a solution?

@mccaddon

mccaddon commented Oct 14, 2021

@kwianeck are you using CentOS/Red Hat/VzLinux? I had encountered recurring issues with LXC/LXD and cgroups on CentOS (using the LXD snap package), but they went away on Ubuntu 20.04.

We are using LXD/LXC 4.0.7 on Ubuntu 20.04, and during my testing I encountered issues that I couldn't resolve. The error complains about the network type configuration; I am guessing it should use the default LXC profile by default? I've included my default profile and other details below. I hope this helps resolve the issues with this plugin, as I'd love to start using Nomad for all my LXD/LXC containers!

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"

$ lxc --version
4.0.7

$ lxc profile show default
config: {}
description: Default LXD profile
devices:
  root:
    path: /
    pool: default
    type: disk
name: default
used_by: []


$ cat test.nomad
job "example-lxc" {
  datacenters = ["dc1"]
  type        = "service"

  group "example" {
    task "example" {
      driver = "lxc"

      config {
        log_level = "info"
        verbosity = "verbose"
        template  = "/usr/share/lxc/templates/lxc-busybox"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}


# nomad agent -dev -bind 0.0.0.0 -log-level INFO -plugin-dir /opt/nomad/data/plugins
    2021-10-14T17:47:59.345Z [INFO]  client.driver_mgr.nomad-driver-lxc: starting lxc task: driver=lxc @module=lxc driver_cfg="{Template:/usr/share/lxc/templates/lxc-busybox Distro: Release: Arch: ImageVariant: ImageServer: GPGKeyID: GPGKeyServer: DisableGPGValidation:false FlushCache:false ForceCache:false TemplateArgs:[] LogLevel:info Verbosity:verbose Volumes:[]}" timestamp=2021-10-14T17:47:59.345Z
    2021-10-14T17:47:59.691Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=073fb260-00f9-3ae7-704b-c9a3f7a777d9 task=example error="rpc error: code = Unknown desc = error setting network type configuration: setting config item for the container failed"
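For what it's worth, one guess on my part (not confirmed anywhere in this repo): LXC 3.0 and later dropped the legacy lxc.network.* configuration keys, so if the driver still sets lxc.network.type, the call fails exactly like the log above; the current spelling is lxc.net.N.type. A rough go-lxc sketch of the difference, with a hypothetical helper name:

package lxcutil

import lxc "gopkg.in/lxc/go-lxc.v2"

// configureNetwork shows the legacy key versus the key current LXC accepts.
// It is a hypothetical helper, not code from nomad-driver-lxc.
func configureNetwork(name string) error {
	c, err := lxc.NewContainer(name, lxc.DefaultConfigPath())
	if err != nil {
		return err
	}
	defer lxc.Release(c)

	// Legacy key, removed in LXC 3.0, so it fails on LXC 4.x with
	// "setting config item for the container failed":
	//   c.SetConfigItem("lxc.network.type", "none")

	// Key accepted by current LXC releases:
	return c.SetConfigItem("lxc.net.0.type", "none")
}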

Thank you.

@h0tw1r3
Contributor

h0tw1r3 commented May 26, 2022

It looks like go-lxc only supports cgroups v1 (not cgroup v2). I found a few other incompatibilities and bugs while testing LXC 4 support. Will create a merge request and tag this issue "soon".
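For anyone checking which hierarchy a host is actually on, a small standalone sketch (not from go-lxc or the driver) that detects a pure cgroup v2 setup by the filesystem magic of /sys/fs/cgroup:

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// cgroup2SuperMagic is CGROUP2_SUPER_MAGIC from linux/magic.h.
const cgroup2SuperMagic = 0x63677270

func main() {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		panic(err)
	}
	// On a unified (v2-only) host /sys/fs/cgroup is a cgroup2 mount; on a
	// hybrid or v1 host it is a tmpfs holding the per-controller mounts,
	// as in the lxc-checkconfig output earlier in this issue.
	fmt.Println("cgroup v2 unified hierarchy:", st.Type == cgroup2SuperMagic)
}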

h0tw1r3 mentioned this issue Jun 5, 2022
@h0tw1r3
Contributor

h0tw1r3 commented Jun 24, 2022

#37 was enough to bring up containers, but there were a few remaining problems, which are addressed in #38.
