Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark CGroups as off when missing essential controllers #19176

Merged
merged 8 commits into from
Dec 15, 2023

Conversation

DavidVentura
Copy link
Contributor

@DavidVentura DavidVentura commented Nov 26, 2023

On Linux systems that have been built without all necessary CGroup controllers enabled, you may get a non-descriptive error when initializing cgroupslib:

[ERROR] client.proclib.cg2: failed to create nomad cgroup: error="write /sys/fs/cgroup/cgroup.subtree_control: invalid argument"

A similarly-opaque error is logged on every run of cpuparts_hook, but I could not find a simple way of clarifying the error logs.

This PR checks whether all expected controllers are available, and will log an error if they are not.
On top of that, the controllers will be enabled one at a time, giving a more clear error when one of them is not supported.

You can validate this PR by appending cgroup_disable=cpuset to your kernel's command line arguments (/etc/default/grub or similar)

@hashicorp-cla
Copy link

hashicorp-cla commented Nov 26, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DavidVentura! The thing that jumps out at me with this PR is that we're now potentially making a partially-applied update to the cgroups because we're not writing updating a single write.

I'd like to pull in @shoenig on this, because it's an area he's been working in a lot recently. But my gut tells me that having the check for the controllers by reading them is a nice UX improvement but we should maybe leave the writes alone. But I think I might be missing something about the setup.

A few housekeeping items aside:

  • You'll need to take the PR out of draft for CI to run.
  • Can you run make cl to add a changelog entry?

@DavidVentura
Copy link
Contributor Author

I wasn't entirely convinced on whether the writes should be partially applied or not; the behavior for systems with missing support for a controller changes:

When applying a single write for all controllers, (echo "cpuset cpu pids" > /sys/fs..), if any controller is missing, none of them will be applied.
When applying a write per controller (echo "+cpuset" > /sys/fs/..; echo "+cpu" > /sys/fs..), only the missing controllers will not be applied.

I can also attempt to apply them all, but I'm not sure what that'd achieve, as anything dependent on the actual controller is bound to fail regardless.

@DavidVentura DavidVentura marked this pull request as ready for review December 4, 2023 10:38
@shoenig
Copy link
Member

shoenig commented Dec 4, 2023

@DavidVentura can you describe more about your environment? I would expect pretty much any distro with cgroups v2 enabled at all to also have the basic controllers available.

I'm hesitant to be supportive of an environment where this isn't true - Nomad relies a great deal on these controllers being enabled to provide the resource isolation gauntness that it does.

@DavidVentura
Copy link
Contributor Author

DavidVentura commented Dec 4, 2023

I am experimenting with nomad on risc-v, the default kernel shipped in most VisionFive2 distros comes with the cgroup controllers disabled (see PR to enable it).

It took me a while to figure out that the controllers being disabled was causing the preexec_fn to fail for raw_exec, so I wanted to make Nomad more clearly log whenever there's an unsupported cgroup configuration.

I couldn't figure out how to completely disable raw_exec in a case where the cgroup config was not supported, but that'd also be fine with me 😁

@shoenig
Copy link
Member

shoenig commented Dec 4, 2023

Ahh, I have been blissfully unaware of RISC-V development so far, kinda neat if you can already get Nomad running on it!

Thinking about this a little more, it might make sense to simply check for the enabled controllers at the point in time we switch on cgroups v1 or v2 (or off - which is already a thing). If we can detect the necessary controllers are not enabled we can return OFF here and Nomad should already do the "right" thing for a system without cgroups.

https://github.com/hashicorp/nomad/blob/main/client/lib/cgroupslib/mount.go#L41-L43

@DavidVentura
Copy link
Contributor Author

I think your idea is a better way to implement this functionality.
I started making the necessary changes and realized that the callers for GetMode() do not really respect OFF -- they will default to CG2. (see 1, 2)

Making this change would require propagating errors back through quite a few places; I'm happy to do it, but want to double check with you before starting

@tgross
Copy link
Member

tgross commented Dec 14, 2023

In #19481 we've got a report with the identical symptoms but in this case the cpuset controller is activated. The results is that they've got /sys/fs/cgroup/nomad.slice/cpuset.cpus, but not /sys/fs/cgroup/nomad.slice/share.slice/cpuset.cpus. My read of the cgroupslib suggests that path only works if we're in the OFF case, as @DavidVentura is saying. Which I wouldn't expect to be the case if cgroups v2 is mounted, which it is.

I'm following up on this report and hopefully I'll be able to come back here with some insights.

@DavidVentura DavidVentura changed the title Log a more descriptive error message when failing to enable CGroup controllers Mark CGroups as off when missing essential controllers Dec 14, 2023
@tgross tgross self-assigned this Dec 14, 2023
@DavidVentura
Copy link
Contributor Author

DavidVentura commented Dec 14, 2023

I've updated this per @shoenig's recommendation; in the end, I didn't need to update all the paths which behave incorrectly on the OFF case, as they are not called at all.
This change results in the java and exec drivers getting disabled if you are missing some of the essential controllers.
The message is not super clear in the UI and there are currently no logs to indicate what the problem may be.

image

@tgross
Copy link
Member

tgross commented Dec 14, 2023

Thanks @DavidVentura! I'll review this tomorrow.

@tgross
Copy link
Member

tgross commented Dec 15, 2023

Follow-up on my comment in #19176 (comment) is that the memory cgroup controller was disabled in that case. So this PR should help that as well.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @DavidVentura!

@tgross tgross added this to the 1.7.x milestone Dec 15, 2023
@tgross
Copy link
Member

tgross commented Dec 15, 2023

Once CI is complete, I'll merge this and get it backported so it can ship in the next release of Nomad 1.7.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.7.x backport to 1.7.x release line theme/cgroups cgroups issues type/enhancement
Projects
Development

Successfully merging this pull request may close these issues.

pre-run hook "cpuparts_hook" failed when memory cgroup controller disabled
4 participants