Nomad can no longer launch commands with raw_exec if /sys/fs/cgroup does not exist (old kernels) #8565
Comments
Hi @dposton80! The error you're seeing is bubbling up from the third-party
That support is going to be dependent on which task drivers you have enabled, so that makes it a little tricky to state a specific version. But unfortunately it looks like we don't even document that (or at least not anywhere I would expect to see it), so I'm going to mark this as a documentation bug at least.
As a potential workaround, you can disable cgroup usage in the raw_exec driver with the no_cgroups option:

    plugin "raw_exec" {
      config {
        no_cgroups = true
      }
    }

The raw_exec driver uses cgroups to improve process tracking for metric collection and shutdown purposes, so you may notice some odd behavior, with child processes not being tracked or killed if they don't clean up properly.
Thanks for the responses. I already tried the no_cgroups option, but I'm afraid it doesn't seem to make any difference. It seems the call to configureResourceContainer in drivers/shared/executor/executor.go (line 283) should be skipped if command.BasicProcessCgroup is false?
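To illustrate the guard being suggested, here is a minimal sketch in Go. It is not Nomad's actual executor code: the `execCommand` and `executor` types and the `launch` method are hypothetical stand-ins, and only the names `configureResourceContainer` and `BasicProcessCgroup` come from this thread.

```go
package main

import "fmt"

// execCommand is a hypothetical stand-in for the executor's command struct.
// Only the BasicProcessCgroup field name is taken from the discussion.
type execCommand struct {
	BasicProcessCgroup bool
}

// executor is a hypothetical stand-in that records whether a cgroup was set up.
type executor struct {
	cgroupConfigured bool
}

// configureResourceContainer stands in for the real cgroup setup call.
func (e *executor) configureResourceContainer() error {
	e.cgroupConfigured = true
	return nil
}

// launch only touches cgroups when the command asks for them, so a host
// without /sys/fs/cgroup never reaches the cgroup code path.
func (e *executor) launch(cmd execCommand) error {
	if cmd.BasicProcessCgroup {
		if err := e.configureResourceContainer(); err != nil {
			return fmt.Errorf("failed to configure resource container: %w", err)
		}
	}
	return nil
}

func main() {
	e := &executor{}
	_ = e.launch(execCommand{BasicProcessCgroup: false})
	fmt.Println(e.cgroupConfigured) // prints "false"
}
```

With this shape, setting no_cgroups would be expected to translate into `BasicProcessCgroup: false`, and the cgroup setup is skipped entirely.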
Well, that's unfortunate. This is a bug that we should fix - no cgroup operation should occur when
Is there any progress on this? I have the same issue with the latest Nomad, 0.12.7.
When raw_exec is configured with [`no_cgroups`](https://www.nomadproject.io/docs/drivers/raw_exec#no_cgroups), raw_exec shouldn't attempt to create a cgroup. Prior to this change, we accidentally always required the freezer cgroup to do stats PID tracking. We already have the proper fallback in place for metrics, so we only need to ensure that we don't create a cgroup for the task. Fixes #8565
@qianglchina Thanks for your patience. I have just merged a fix to be included in the next Nomad release.
Nomad version
0.12.0
Operating system and Environment details
RHEL6, kernel version 2.6.32-754.30.2.el6.x86_64
Issue
Appreciate that older kernel versions may not be supported, in which case please close this. However it may be useful for others.
I was able to run Nomad v0.11.3 on RHEL6 by running it against a newer version of glibc (set via patchelf --set-interpreter glibc-2.19/lib/ld-linux-x86-64.so.2 --set-rpath glibc-2.19/lib). It launched commands successfully and generally worked well.
However, with Nomad v0.12.0 this no longer works; the exec driver fails with:
Without tracing on, this would just appear as a task 'Driver Failure' with error 'failed to launch command with executor: rpc error: code = Unavailable desc = transport is closing'
The problem seems to be line 45 of /vendor/github.com/opencontainers/runc/libcontainer/cgroups/utils.go in IsCgroup2UnifiedMode
where unifiedMountpoint is "/sys/fs/cgroup". Seems the code now panics if this doesn't exist.
If this were just to return false instead (or the panic were avoided some other way), I think everything would work (seems the code otherwise tolerates cgroups initialization returning an error).
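As a rough sketch of the "return false instead of panicking" idea, the check could look something like the following. This is not the vendored runc code: the function name `isCgroup2Unified` and the path parameter are assumptions made for illustration, and the magic number is the kernel's `CGROUP2_SUPER_MAGIC` value.

```go
package main

import (
	"fmt"
	"syscall"
)

// cgroup2SuperMagic is the filesystem magic number the kernel reports
// for a cgroup v2 mount (CGROUP2_SUPER_MAGIC in linux/magic.h).
const cgroup2SuperMagic = 0x63677270

// isCgroup2Unified is a hypothetical, non-panicking variant of the check:
// if the mountpoint cannot be statfs'd (for example because it does not
// exist on an old kernel), it reports false instead of panicking.
func isCgroup2Unified(mountpoint string) bool {
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountpoint, &st); err != nil {
		return false
	}
	return st.Type == cgroup2SuperMagic
}

func main() {
	// The result for the real mountpoint depends on the host.
	fmt.Println(isCgroup2Unified("/sys/fs/cgroup"))
	// A missing mountpoint is treated as "not unified" rather than fatal.
	fmt.Println(isCgroup2Unified("/nonexistent/cgroup/root"))
}
```

Treating a failed statfs as "not unified" lets callers fall through to their existing error handling for cgroup initialization, which is the behavior the issue suggests would be sufficient.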
If it's not intended to support certain kernel versions, it might be good to have an error on startup.