cgroup mount detection is not robust to unusual cgroup configurations #19735
Hi @googol, can you describe the Cgroups Mount output more? At a passing glance it seems like there are two mounts layered over each other.
What sort of info would be useful? Any useful commands to post the output of? It does look odd to me as well, but I don't know why it's set up that way; that's how it comes in the OS. Nomad 1.6.4 does manage to work with it, though. The snippet in the issue body shows the relevant lines.
The same error occurs with 1.7.3 (see the attached Nomad 1.7.3 startup log).
Yeah, sorry @googol, I don't think I'll be able to debug this until I have a chance to load up Slackware in a VM and poke around. It's unclear what the cgroup mount configuration is, and Nomad 1.7 makes some assumptions about how that should look.
OK, that piece of code clearly explains why I'm seeing this problem, and raises some new questions.
This looks like an unraid-specific thing, from what I've looked up now. The init script:

```sh
# Mount Control Groups filesystem interface:
if grep -wq cgroup /proc/filesystems ; then
  # Check if unraidcgroup1 is passed over in command line
  if grep -wq unraidcgroup1 /proc/cmdline ; then
    if [ -d /sys/fs/cgroup ]; then
      # See linux-*/Documentation/cgroups/cgroups.txt (section 1.6)
      # Check if we have some tools to autodetect the available cgroup controllers
      if [ -x /bin/cut -a -x /bin/tail ]; then
        # Mount a tmpfs as the cgroup filesystem root
        mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
        # Autodetect available controllers and mount them in subfolders
        controllers="$(/bin/cut -f 1 /proc/cgroups | /bin/tail -n +2)"
        for i in $controllers; do
          mkdir /sys/fs/cgroup/$i
          mount -t cgroup -o $i $i /sys/fs/cgroup/$i
        done
        unset i controllers
        # Eric S. figured out this needs to go here...
        echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
      else
        # We can't use autodetection so fall back mounting them all together
        mount -t cgroup cgroup /sys/fs/cgroup
      fi
    else
      mkdir -p /dev/cgroup
      mount -t cgroup cgroup /dev/cgroup
    fi
  else
    if [ -d /sys/fs/cgroup ]; then
      # See https://docs.kernel.org/admin-guide/cgroup-v2.html (section Mounting)
      # Mount a tmpfs as the cgroup2 filesystem root
      mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
      mount -t cgroup2 none /sys/fs/cgroup
    else
      mkdir -p /dev/cgroup
      mount -t cgroup2 none /dev/cgroup
    fi
  fi
fi
```

The upstream slackware64-15 sources seem to have a slightly simpler script, though:

```sh
# Mount Control Groups filesystem interface:
if [ -z "$container" ]; then
  if grep -wq cgroup /proc/filesystems ; then
    if [ -d /sys/fs/cgroup ]; then
      # See linux-*/Documentation/cgroups/cgroups.txt (section 1.6)
      # Check if we have some tools to autodetect the available cgroup controllers
      if [ -x /bin/cut -a -x /bin/tail ]; then
        # Mount a tmpfs as the cgroup filesystem root
        mount -t tmpfs -o mode=0755,size=8M cgroup_root /sys/fs/cgroup
        # Autodetect available controllers and mount them in subfolders
        controllers="$(/bin/cut -f 1 /proc/cgroups | /bin/tail -n +2)"
        for i in $controllers; do
          mkdir /sys/fs/cgroup/$i
          mount -t cgroup -o $i $i /sys/fs/cgroup/$i
        done
        unset i controllers
      else
        # We can't use autodetection so fall back mounting them all together
        mount -t cgroup cgroup /sys/fs/cgroup
      fi
    else
      mkdir -p /dev/cgroup
      mount -t cgroup cgroup /dev/cgroup
    fi
  fi
fi
```

I'll raise an issue with unraid to verify.
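The controller autodetection step in these scripts (`cut -f 1 /proc/cgroups | tail -n +2`) can be sketched in Python. This is only an illustration of what the shell pipeline does; the sample `/proc/cgroups` content below is made up, not taken from the reporter's machine:

```python
# Sketch: extract the available cgroup v1 controller names from /proc/cgroups
# content, mirroring the init script's `cut -f 1 /proc/cgroups | tail -n +2`.
# SAMPLE_PROC_CGROUPS is illustrative sample data, not real system output.
SAMPLE_PROC_CGROUPS = (
    "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
    "cpuset\t0\t1\t1\n"
    "cpu\t0\t1\t1\n"
    "memory\t0\t1\t1\n"
)

def list_controllers(proc_cgroups_text):
    controllers = []
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):  # skip the header row (tail -n +2)
            continue
        fields = line.split("\t")  # first tab-separated column (cut -f 1)
        if fields and fields[0]:
            controllers.append(fields[0])
    return controllers

print(list_controllers(SAMPLE_PROC_CGROUPS))  # → ['cpuset', 'cpu', 'memory']
```

The script then mounts each returned controller in its own subdirectory under the tmpfs root, which is exactly the layout Nomad sees on such a system.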
Update:
I think the cgroup detection logic should be changed from the current model to something a bit more robust. Since it is valid to mount cgroup v2 on top of a tmpfs, checking only the first listed mount is fragile.
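To make the point concrete, here is a hedged sketch of more robust detection. This is not Nomad's actual code; the function name and sample mount table are invented for illustration. It scans every mount entry and lets later mounts at the same path shadow earlier ones, so cgroup2 mounted over a tmpfs (as on this unraid box) is still detected as v2:

```python
# Sketch (illustrative, not Nomad's implementation): determine the cgroup
# version by scanning *all* mount entries for the cgroup root and keeping the
# *last* one, since later mounts shadow earlier ones at the same mountpoint.
# SAMPLE_PROC_MOUNTS models the unraid layout: tmpfs first, cgroup2 on top.
SAMPLE_PROC_MOUNTS = (
    "cgroup_root /sys/fs/cgroup tmpfs rw,mode=755,size=8192k 0 0\n"
    "none /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec 0 0\n"
)

def cgroup_version(proc_mounts_text, mountpoint="/sys/fs/cgroup"):
    fstype = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[1] == mountpoint:
            fstype = fields[2]  # keep overwriting: the last mount wins
    if fstype == "cgroup2":
        return "v2"
    if fstype in ("cgroup", "tmpfs"):
        return "v1"  # tmpfs root with per-controller cgroup mounts underneath
    return None

print(cgroup_version(SAMPLE_PROC_MOUNTS))  # → v2
```

A first-entry check against the same sample would see the tmpfs line and conclude v1, which matches the misdetection described in this issue.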
Doing some issue board cleanup and noticed this got left in limbo. I've re-titled it to reflect the current state and marked it for roadmapping.
Thanks, Tim! Looks like I forgot to report back as I said I would in my last comment, but unraid released their changes, which fixed my problem as expected. So my original immediate problem is solved (running the Nomad client on unraid), but of course this could come up on other systems for you. Thanks for the help on this; @shoenig pointing to the relevant bit of the code helped me get this fixed on unraid's side!
Nomad version
The affected client version:
This is also the version on the server.
Version details for v1.6.4, used as a comparison in the logs below
Operating system and Environment details
Unraid Version 6.12.6 2023-12-01 (based on slackware-64 version 15). Kernel 6.1.64.
Using a prebuilt Nomad binary downloaded from HashiCorp, with the custom packaging and startup scripts required by unraid
Issue
All allocations fail with the following error messages:
Reproduction steps
Start nomad v1.7.2 client, try to run a job on it.
Expected Result
Job runs as normal
Actual Result
No jobs can be allocated
Nomad Client logs (if appropriate)
logs from startup of nomad client v1.7.2:
Nomad 1.6.4 log on the same machine
Node status
The interesting thing here is that Nomad v1.7.2 reports cgroups v1 even though the system has cgroups v2 (Nomad 1.6.4 reports it correctly).
Nomad v1.6.4 on the same machine
Cgroups mount
Cgroup controllers