Nomad 1.3.0-rc.1: jobs hang / won't restart. cgroups v2? #12863
Labels
stage/accepted
Confirmed, and intend to work on. No timeline commitment though.
theme/cgroups
cgroups issues
type/bug
Nomad version
Nomad v1.3.0-rc.1 (31b0a18)
Operating system and Environment details
Ubuntu 22.04 Jammy Jellyfish 5.15.0-27-generic #28-Ubuntu SMP Thu Apr 14 04:55:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Issue
Something seems to go wrong with cgroups v2 support. If I create a job that exits immediately, Nomad stops restarting it. Nomad 1.2.3 (the last version I can use because of the plugin breakage in #12071) seems to work fine, although I plan to downgrade Nomad again to double-check.
Given the job file included:
The Nomad UI for the allocation shows:
It is currently 18:26 and no other restart attempts have been made. The logmon process for the alloc is still running, but according to lsof -n +D there are no processes underneath it and nothing is using the allocation dir.
If I change the constraint to target an Ubuntu 20.04 host, the job restarts roughly every second, as expected.
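Ubuntu 22.04 boots with the unified cgroups v2 hierarchy by default, while Ubuntu 20.04 defaults to v1/hybrid, which would fit the difference between the two hosts. A minimal sketch for confirming which mode a host is in, assuming the cgroup filesystem is mounted at the usual /sys/fs/cgroup (the function name and return labels are made up for illustration and are not part of Nomad):

```python
import os


def cgroup_mode(root="/sys/fs/cgroup"):
    """Classify the cgroup mount at `root` (hypothetical helper, not a Nomad API)."""
    # On a unified (v2) hierarchy the mount root contains a cgroup.controllers file.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return "v2"
    # A directory without that file is assumed to be a v1 or hybrid layout.
    if os.path.isdir(root):
        return "v1-or-hybrid"
    return "unknown"


if __name__ == "__main__":
    print(cgroup_mode())
```

Running this on both the 22.04 and the 20.04 client would show whether the hosts really differ in cgroup mode, or whether something else separates them.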
Other issues I have not been able to reproduce with any success:
Also, this might be a bug in the job, but /dev/null seems to disappear. Edit: this happens sometimes, for some jobs, but not all the time; it is how I noticed that restarts were not happening. I'm still working to nail this down, but it feels like it might be related. These are raw_exec jobs that make their own restricted mount namespace, which includes /dev/null, and it is writable. This seems to work fine on Nomad 1.2.3 on the same host.
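To catch the moment /dev/null goes missing or gets replaced by something else, a small check like the following could be run from inside the job's mount namespace. This is only a sketch: the helper name and the returned keys are hypothetical, and it relies on the Linux convention that the null device is character device major 1, minor 3.

```python
import os
import stat


def check_dev_null(path="/dev/null"):
    """Report whether `path` looks like a usable null device (hypothetical helper)."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return {"exists": False, "is_char_device": False,
                "is_null_device": False, "writable": False}
    is_chr = stat.S_ISCHR(st.st_mode)
    return {
        "exists": True,
        "is_char_device": is_chr,
        # On Linux the null device is character device 1:3.
        "is_null_device": is_chr
        and (os.major(st.st_rdev), os.minor(st.st_rdev)) == (1, 3),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    print(check_dev_null())
```

A result with "exists" true but "is_char_device" false would suggest /dev/null was overwritten with a regular file rather than removed, which is a different failure from the node vanishing.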
Job file (if appropriate)