Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Add cgroup v2 support to CoreCLR #25906

Closed
wants to merge 1 commit into from
Closed

Conversation

omajid
Copy link
Member

@omajid omajid commented Jul 26, 2019

Upstream cgroup v2 documentation is available at: https://www.kernel.org/doc/Documentation/cgroup-v2.txt

Some notable differences between cgroup v1 and v2, from a CoreCLR point of view, include:

  • cgroup v2 has a single hierarchy, so we just look for a single cgroup2 entry in /proc/self/mountinfo (without looking for a subsystem match).

  • Since cgroup v2 has a single hierarchy, /proc/self/cgroup generally has a single line 0::/path. There's no need to match subsystems or hierarchy ids here.

  • memory.limit_in_bytes is now memory.max. It can contain the literal max to indicate no limit.

  • memory.usage_in_bytes is now memory.current

  • cpu.cfs_quota_us and cpu.cfs_period_us have been combined into a single cpu.max file with the format $MAX $PERIOD. The max value can be a literal max to indicate a limit is not active.

AFAIK no container runtimes support cgroupv2 yet. I have tested this on a cgroup2 using host, outside of containers, only. Tested on Fedora 31 with podman.

Fixes #25821

@omajid
Copy link
Member Author

omajid commented Jul 26, 2019

cc @tmds

@omajid
Copy link
Member Author

omajid commented Jul 30, 2019

@jerboaa (thanks!) pointed out that crun can be used to run containers with memory limits:

Linking the code below with coreclr lets me verify the memory limits, at least:

$ sudo podman run --runtime=/usr/bin/crun --volume $(pwd):/clrcg2 --memory=30M --rm -ti fedora:30 /clrcg2/test
initialized cgroup
31457280
1 724992
0 0
cleaned up cgroups

Unfortunately, cpus isn't supported by podman/crun with cgroup v2 yet.

Code:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#include "cgroup.h"

size_t GetRestrictedPhysicalMemoryLimit();
bool GetPhysicalMemoryUsed(size_t* used);
bool GetCpuLimit(uint32_t* cpu);

int main(int, char *[])
{
    InitializeCGroup();
    printf("initialized cgroup\n");

    size_t physical_memory_limit = GetRestrictedPhysicalMemoryLimit();
    printf("%lu\n", physical_memory_limit);

    size_t used_memory = 0;
    bool okay_memory_used = GetPhysicalMemoryUsed(&used_memory);
    printf("%d %lu\n", okay_memory_used, used_memory);

    uint32_t cpu_limit = 0;
    bool okay_cpu_limit = GetCpuLimit(&cpu_limit);
    printf("%d %u\n", okay_cpu_limit, cpu_limit);

    CleanupCGroup();
    printf("cleaned up cgroups\n");
    return 0;
}

@omajid omajid closed this Aug 15, 2019
@omajid omajid reopened this Aug 15, 2019
@jerboaa
Copy link

jerboaa commented Aug 16, 2019

@jerboaa (thanks!) pointed out that crun can be used to run containers with memory limits:
[...]
Unfortunately, cpus isn't supported by podman/crun with cgroup v2 yet.

That's not quite accurate. One can run with the cpu controller enabled on cgroups v2 only using podman and crun:

$ cat /sys/fs/cgroup/user.slice/cgroup.subtree_control
cpu io memory pids

$ podman --runtime=/usr/bin/crun run --memory=300M --cpu-quota=10000 --cpu-period=10000 -ti fedora:30 /bin/bash
[root@c6561b50c547 /]# cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cpu.max
10000 10000

Or using --cpu-shares:

$ podman --runtime=/usr/bin/crun run --memory=300M --cpu-shares=1024 -ti fedora:30 /bin/bash
[root@e18f01fca65a /]# cat /sys/fs/cgroup$(cat /proc/self/cgroup | cut -d':' -f3)/cpu.weight
39

src/gc/unix/cgroup.cpp Outdated Show resolved Hide resolved
Upstream cgroup v2 documentation is available at:
https://www.kernel.org/doc/Documentation/cgroup-v2.txt

Some notable differences between cgroup v1 and v2, from a CoreCLR point
of view, include:

- cgroup v2 has a single hierarchy, so we just look for a single "cgroup2"
  entry in /proc/self/mountinfo (without looking for a subsystem match).

- Since cgroup v2 has a single hierarchy, /proc/self/cgroup generally
  has a single line "0::/path". There's no need to match subsystems or
  hierarchy ids here.

- "memory.limit_in_bytes" is now "memory.max". It can contain the
  literal "max" to indicate no limit.

- "memory.usage_in_bytes" is now "memory.current"

- "cpu.cfs_quota_us" and "cpu.cfs_period_us" have been combined into a
  single "cpu.max" file with the format "$MAX $PERIOD". The max value
  can be a literal "max" to indicate a limit is not active.
@omajid omajid closed this Oct 30, 2019
@omajid omajid reopened this Oct 30, 2019
@omajid
Copy link
Member Author

omajid commented Oct 31, 2019

Anyone willing to review this?

cc @Maoni0 @janvorli @jkotas

@Maoni0
Copy link
Member

Maoni0 commented Oct 31, 2019

cc @lpereira too

@@ -211,11 +345,15 @@ class CGroup

if (strncmp(filesystemType, "cgroup", 6) == 0)
Copy link
Member

@tmds tmds Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the existing code doesn't support v2, it passes this condition.
Maybe we want to make this more strict?

if (strcmp(filesystemType, "cgroup") == 0 ||
    strcmp(filesystemType, "cgroup2") == 0)

Since we're comparing with a literal using strncmp doesn't add much benefit over strcmp. Or maybe the compiler flags don't allow it...

if (strncmp(filesystemType, "cgroup2", 8) == 0)
{
*is_cgroupv2 = true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this assignment unconditional, so is_cgroupv2 is also assigned when filesystemType is "cgroup":
*is_cgroupv2 = strcmp(filesystemType, "cgroup2") == 0;

public:
static void Initialize()
{
s_memory_cgroup_path = FindCgroupPath(&IsMemorySubsystem);
s_cpu_cgroup_path = FindCgroupPath(&IsCpuSubsystem);
s_memory_cgroup_path = FindCGroupPath(&IsCGroup1MemorySubsystem);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should change this so it stores the cgroup version in a static int s_cgroup_version?
Then we can switch on the static when the limits are retrieved?
like

    static bool GetCpuLimit(uint32_t *val)
    {
        if (s_cgroup_version == 1)
        {
            return GetCGroup1CpuLimit(val);
        }
        else if (s_cgroup_version == 2)
        {
            return GetCGroup2CpuLimit(val);
        }
        return false;
    }

Copy link
Member

@tmds tmds Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also allow to drop the SearchForFile function and pick the filename based on the s_cgroup_version.

Copy link

@lpereira lpereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just a first pass here.

Comment on lines +168 to +176
size_t filename_len = strlen(s_cpu_cgroup_path) + strlen(CGROUP2_CPU_MAX_FILENAME);
filename = (char*)malloc(filename_len + 1);
if (filename == nullptr)
{
goto done;
}

strcpy(filename, s_cpu_cgroup_path);
strcat(filename, CGROUP2_CPU_MAX_FILENAME);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use asprintf() here and make this much cleaner:

if (asprintf(&filename, "%s/%s", s_cpu_group_path, CGROUP2_CPU_MAX_FILENAME) < 0)
    goto done;

Alternatively, just declare a char[PATH_MAX] on the stack and use snprintf() instead, especially if this is called frequently.

Comment on lines +193 to +209
max_quota_string = (char*) malloc(lineLen + 1);
if (max_quota_string == nullptr)
{
goto done;
}
period_string = (char*) malloc(lineLen + 1);
if (period_string == nullptr)
{
goto done;
}

sscanRet = sscanf(line, "%s %s", max_quota_string, period_string);
if (sscanRet != 2)
{
assert(!"Unable to parse " CGROUP2_CPU_MAX_FILENAME " file contents with sscanf.");
goto done;
}
Copy link

@lpereira lpereira Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dangerous: you're writing an unbounded amount of characters to both max_quota_string and period_string. Consider just slicing line instead by looking for ' ' with memchr() and writing a '\0' there. Saves two malloc() calls too:

char *space = memchr(line, ' ', lineLen);
if (!space) goto done;

max_quota_string = line;
*space = '\0';
period_string = space + 1;

Comment on lines +217 to +229
errno = 0;
max_quota = atoll(max_quota_string);
if (errno != 0)
{
goto done;
}

errno = 0;
period = atoll(period_string);
if (errno != 0)
{
goto done;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atoll() doesn't detect errors, so it's moot to set errno here as if you were to use strtol() instead. Prefer to use strtol(), even though you're reading a file from the system.

Comment on lines +184 to +187
if (getline(&line, &lineLen, file) == -1)
{
goto done;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long these lines can be? Might be worth considering using fgets() and allocating line in the stack instead to save a malloc().

Copy link
Member Author

@omajid omajid Nov 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. The format of this line is $MAX $PERIOD. The values are in microseconds, and the first value allows the special case of a max literal. A default is max 10000. For more information, grep for cpu.max here. But there doesn't seem to be any explicit hard limits on the values here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's probably safe to say that you need the space for two longs in base 10 plus a space?

Unfortunately you can't use std::numeric_limits here, but you can employ a trick to calculate an approximation of the buffer size based on the integer width: 3 * sizeof(integer_type) will give the size for an integer of type integer_type, with space for the terminating NUL, when converted to string in base 10 (it may be larger, but never smaller than needed).

So 1 /* for the space */ + 2 * sizeof(long) */ should be sufficient for a line in this file (and to avoid a malloc roundtrip).

if (strncmp(filesystemType, "cgroup", 6) == 0)
{
if (strncmp(filesystemType, "cgroup2", 8) == 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also match "cgroup2foo". Is this OK, future-wise?

Comment on lines +435 to +437
int sscanfRet = sscanf(line,
"0::%s",
cgroup_path);
Copy link

@lpereira lpereira Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use scanf() with "%s" conversion specifier. It's an unbounded read. There are ways to limit the number of bytes read from it with precision modifiers; the man page mentions how it works. (Or, alternatively, check if the string starts with "0::" and advance the pointer if it does. No need to copy the string.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@omajid, @lpereira It is not a problem here. The string storage was allocated to be sufficient (size of the whole line we are scanning). The same for the comment in the sscanf below.

Comment on lines +446 to +449
int sscanfRet = sscanf(line,
"%*[^:]:%[^:]:%s",
subsystem_list,
cgroup_path);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. But I only moved this code around. Is it better to fix it up now or do that cleanup in a separate change?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually a good idea to leave the code better than you found it.

return strcmp("cpu", strTok) == 0;
}

static bool GetCGroup2CpuLimit(UINT *val)
Copy link

@lpereira lpereira Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to GetCGroup1CpuLimit()... can't you use the same function for both, maybe passing a parameter, or reading s_cgroup_version inside to differentiate between versions?

Comment on lines +483 to +489
char* full_filename = (char*)PAL_malloc(len+1);
if (full_filename == nullptr)
{
return nullptr;
}
strcpy_s(full_filename, len+1, search_root);
strcat_s(full_filename, len+1, possible_file);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to use snprintf() on a buffer declared with size PATH_MAX, and strdup() when found. Saves a roundtrip to malloc.

@jkotas jkotas added the area-PAL label Nov 2, 2019
@maryamariyan
Copy link
Member

Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:

  1. In your coreclr repository clone, create patch by running git format-patch origin
  2. In your runtime repository clone, apply the patch by running git apply --directory src/coreclr <path to the patch created in step 1>

@janvorli
Copy link
Member

@omajid are you planning to update this PR to reflect some of the feedback to which you've responded with thumbs up before we switch to the dotnet/runtime repo tomorrow?

@omajid
Copy link
Member Author

omajid commented Nov 12, 2019

@janvorli Hi! I definitely intend to update this PR and address all the feedback. Unfortunately, I dont think I can get it finished and tested by tomorrow. Is it okay if I create a new PR in dotnet/runtime with an updated fix based on the comments here?

@janvorli
Copy link
Member

Is it okay if I create a new PR in dotnet/runtime with an updated fix based on the comments here?

Of course!

@janvorli
Copy link
Member

I'm closing the PR in this repo.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

adopting cgroup2 in coreclr
8 participants