Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pal cgroup v2 implementation #44990

Merged
merged 2 commits into from
Nov 20, 2020
Merged

Fix pal cgroup v2 implementation #44990

merged 2 commits into from
Nov 20, 2020

Conversation

tmds
Copy link
Member

@tmds tmds commented Nov 20, 2020

Fixes two issues in src/pal/src/misc/cgroup.cpp:

  • No subsystem match must be performed for cgroup v2.
  • Incorrect arguments for sscanf_s when reading cgroup path.

The src/gc/unix/cgroup.cpp implementation doesn't have these issues.

cc @janvorli @omajid

Fixes two issues in src/pal/src/misc/cgroup.cpp:

* No subsystem match must be performed for cgroup v2.
* Incorrect arguments for sscanf_s when reading cgroup path.

The src/gc/unix/cgroup.cpp implementation doesn't have these issues.
Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the nit.

src/coreclr/src/pal/src/misc/cgroup.cpp Outdated Show resolved Hide resolved
@janvorli
Copy link
Member

@tmds thank you for the fix!

@tmds
Copy link
Member Author

tmds commented Nov 20, 2020

@janvorli can/should we backport this to 5.0?

@stephentoub
Copy link
Member

can/should we backport this to 5.0?

What is the observable impact of the bug? Who will be affected? Are there workarounds?

@stephentoub
Copy link
Member

@tmds
Copy link
Member Author

tmds commented Nov 20, 2020

What is the observable impact of the bug? Who will be affected? Are there workarounds?

On systems with cgroup v2, .NET isn't aware of the cpu/memory limit that is configured for the container.
As a workaround you could manually specify the limit with the gc configuration settings. It wouldn't be practical to do this in a large container deployment. This allows to set the memory limit, not the cpu limit.

cgroup v2 is not wide-spread yet, but it gets adopted:

If this isn't backported, I think we'll be patching the packages we build for RHEL. Though we prefer to not do that.

There's related code in https://github.com/dotnet/runtime/blob/master/src/libraries/Common/src/Interop/Linux/cgroups/Interop.cgroups.cs; anything similar need fixing there?

This is ok, only for cgroup v1 the subsystem check is applied:

if (cgroupVersion == CGroupVersion.CGroup1)
{
bool validCGroup1Entry = ((postSeparatorlineParts[0] == "cgroup") &&
(Array.IndexOf(postSeparatorlineParts[2].Split(','), subsystem) >= 0));
if (!validCGroup1Entry)
{
continue;
}
}
else if (cgroupVersion == CGroupVersion.CGroup2)
{
bool validCGroup2Entry = postSeparatorlineParts[0] == "cgroup2";
if (!validCGroup2Entry)
{
continue;
}
}

@janvorli
Copy link
Member

I believe we should backport it.

@stephentoub stephentoub merged commit 70c51bb into dotnet:master Nov 20, 2020
@stephentoub
Copy link
Member

/backport to release/5.0

@janvorli, I'll let you fill in the details :)

@github-actions
Copy link
Contributor

Started backporting to release/5.0: https://github.com/dotnet/runtime/actions/runs/374520406

@janvorli
Copy link
Member

@stephentoub thank you :-)

@janvorli
Copy link
Member

@tmds, can you please share what level of testing have you done for the change? I need to summarize it in the porting request.

@tmds
Copy link
Member Author

tmds commented Nov 23, 2020

@janvorli I ran this program in a container with --memory set to 500M.

using System;

namespace console
{
    class Program
    {
        static void Main(string[] args)
        {
            while (true)
            {
                var buffer = new byte[1024];
            }
        }
    }
}

When I run it like this:

.dotnet/dotnet /tmp/console/bin/Debug/net5.0/console.dll

which doesn't use the change in ths PR, strace shows:

[pid  6818] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 18
[pid  6818] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 18
[pid  6818] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 18
[pid  6818] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 18
[pid  6818] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 18
[pid  6818] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 18
[pid  6818] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 18
[pid  6818] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 18

When I run it like this:

./artifacts/bin/testhost/net6.0-Linux-Debug-x64/shared/Microsoft.NETCore.App/6.0.0/corerun /tmp/console/bin/Debug/net5.0/console.dll

The change is included, and strace looks like:

[pid  6885] openat(AT_FDCWD, "/sys/fs/cgroup//memory.current", O_RDONLY) = 21
[pid  6885] openat(AT_FDCWD, "/sys/fs/cgroup//memory.current", O_RDONLY) = 21
[pid  6885] openat(AT_FDCWD, "/sys/fs/cgroup//memory.current", O_RDONLY) = 21
[pid  6885] openat(AT_FDCWD, "/sys/fs/cgroup//memory.current", O_RDONLY) = 21
[pid  6885] openat(AT_FDCWD, "/sys/fs/cgroup//memory.current", O_RDONLY) = 21
[pid  6885] openat(AT_FDCWD, "/sys/fs/cgroup//memory.current", O_RDONLY) = 21
[pid  6885] openat(AT_FDCWD, "/sys/fs/cgroup//memory.current", O_RDONLY) = 21
[pid  6885] openat(AT_FDCWD, "/sys/fs/cgroup//memory.current", O_RDONLY) = 21

When I don't set a memory limit, both look like the first strace.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants