.NET Core application running in docker gets OOMKilled if swapping is disabled #851

Closed
devKlausS opened this issue May 18, 2018 · 43 comments

@devKlausS

We are running .NET Core applications inside docker containers on a kubernetes cluster. Docker containers get OOMKilled because GC is not executed.

To test this, we created a test program that allocates arrays in a loop. These arrays should be collected by the GC. The program runs in a docker container with a 100 MB memory limit. Kubernetes is not involved in the tests.

When running the test with swap enabled, everything works as expected and the GC gets triggered when the total memory reaches 100 MB. The test never gets killed.

When running the test with swap disabled, the test gets OOMKilled when the total memory reaches 100 MB. We have tested this behaviour with ServerGC=true|false and with .NET Core 2.0 and 2.1.

Enabling swap is not an option because it is neither recommended nor supported by Kubernetes.

Code

You can find our test program here: https://github.com/devKlausS/dotnet_mem_allocator

var rand = new Random();
List<byte[]> m_arrays = new List<byte[]>();

int iLoop = 0;
while (true)
{
    var array = new byte[iAllocBytes];
    var value = (byte)rand.Next();
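    // Note: 'item' is a copy of each element, so this assignment does not
    // modify the array. The NextBytes call below is what writes to the memory.
    Parallel.ForEach(array, (item) =>
    {
        item = value;
    });
    rand.NextBytes(array);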

    if(!bFree)
        m_arrays.Add(array);

    if(iCollect > 0 && iLoop % iCollect == 0)
    {
        Console.WriteLine("run GC");
        GC.Collect();
    }

    Console.Write("AllocatedBytesForCurrentThread: " + GC.GetAllocatedBytesForCurrentThread() / 1000000 + "Mb " +
        "TotalMemory: " + GC.GetTotalMemory(bForceFullCollection) / 1000000 + "Mb ");

    // Note: 'i < GC.MaxGeneration' reports every generation except the oldest one.
    for(int i = 0; i < GC.MaxGeneration; i++)
    {
        Console.Write("CollectionCount("+i+"): " + GC.CollectionCount(i) + " ");
    }

    Console.WriteLine("");

    Thread.Sleep(iSleepMs);
    iLoop++;
}

Docker images

ServerGC=false alloc/free .NET CORE 2.0

docker run -ti -m 100M schaefferlinkbit/dotnet_mem_allocator:1.0.3895_20_client_GC /alloc_mb=20 /free=false

ServerGC=true alloc/free .NET CORE 2.0

docker run -ti -m 100M schaefferlinkbit/dotnet_mem_allocator:1.0.3895_20_server_GC /alloc_mb=20

ServerGC=false alloc/free .NET CORE 2.1

docker run -ti -m 100M schaefferlinkbit/dotnet_mem_allocator:1.0.3895_21_client_GC /alloc_mb=20 /free=false

ServerGC=true alloc/free .NET CORE 2.1

docker run -ti -m 100M schaefferlinkbit/dotnet_mem_allocator:1.0.3895_21_server_GC /alloc_mb=20

cgroup limits

After reading multiple issues about the .NET Core GC and docker limits, we started to dig into the cgroup limits. We compiled https://github.com/dotnet/coreclr/blob/master/src/gc/unix/cgroup.cpp and executed the program inside the docker container. As shown in the following screenshot, the path of the memory.limit_in_bytes file points to a directory on the host machine, and the file open operation fails.
screenshot from 2018-05-18 11-27-29
The next screenshot shows the volume mounts of the container. The host machine path is mounted to /sys/fs/memory inside the container. When reading memory.limit_in_bytes we see ~300 MB (we passed -m 300M to the docker container).
screenshot from 2018-05-18 11-27-40

It seems that the CLR is not able to read the physical memory limit from the cgroup, and therefore the GC is not triggered correctly. As a result, the process is killed because of OOM. We guess that the CLR triggers a GC when it has to swap, which would explain why our process is not killed when swapping is enabled.
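As an illustration of the check described above (a minimal sketch, not the runtime's cgroup.cpp), the limit can be read directly from inside the container. The function names `parse_memory_limit` and `read_memory_limit` are hypothetical, and the caller has to supply the path of the limit file, which depends on where the memory controller is mounted.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Interpret the text of a memory.limit_in_bytes file as an unsigned byte count.
// Returns false if the text does not start with a decimal number.
bool parse_memory_limit(const std::string& text, std::uint64_t* limit)
{
    assert(limit != nullptr);
    std::istringstream in(text);
    std::uint64_t value = 0;
    if (!(in >> value))
        return false;
    *limit = value;
    return true;
}

// Hypothetical helper: read the first line of the limit file and parse it.
// Returns false when the file cannot be opened, which corresponds to the
// failing file open operation described in the comment above.
bool read_memory_limit(const std::string& path, std::uint64_t* limit)
{
    std::ifstream file(path);
    std::string line;
    if (!file.is_open() || !std::getline(file, line))
        return false;
    return parse_memory_limit(line, limit);
}
```

Separating the parsing from the file access makes it easy to tell a wrong path (open fails) apart from an unexpected file format (parse fails).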

@devKlausS
Author

output of the test tool

Unable to find image 'schaefferlinkbit/dotnet_mem_allocator:1.0.3895_20_server_GC' locally
1.0.3895_20_server_GC: Pulling from schaefferlinkbit/dotnet_mem_allocator
cc1a78bfd46b: Pull complete
cec1142d0aac: Pull complete
c0197406f002: Pull complete
df9acbd8ab89: Pull complete
7b75e776a304: Pull complete
c6bbfd1ea25b: Pull complete
Digest: sha256:6deeb873312637db1f80c24590f53e916fb99044bad8ecdb305fa3cc667a50a3
Status: Downloaded newer image for schaefferlinkbit/dotnet_mem_allocator:1.0.3895_20_server_GC
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
server GC: true
NETCOREAPP: 2.0
CLR version: 4.0.30319.42000
allocated bytes: 20000000
sleep ms: 1000
free: True
force full collection: False
run GC every: 0
AllocatedBytesForCurrentThread: 20Mb TotalMemory: 20Mb CollectionCount(0): 0 CollectionCount(1): 0
AllocatedBytesForCurrentThread: 40Mb TotalMemory: 40Mb CollectionCount(0): 0 CollectionCount(1): 0
AllocatedBytesForCurrentThread: 60Mb TotalMemory: 60Mb CollectionCount(0): 0 CollectionCount(1): 0
AllocatedBytesForCurrentThread: 80Mb TotalMemory: 80Mb CollectionCount(0): 0 CollectionCount(1): 0

OOMKilled

Output of docker inspect

root@kube1:~# docker inspect amazing_hopper
[
    {
        "Id": "9d9fe1e71155a794544770ad9ea90b9ec9f29fa3524475e3001815d504ac7dcb",
        "Created": "2018-05-18T11:07:44.137620021Z",
        "Path": "dotnet",
        "Args": [
            "dotnet_mem_allocator_20_server_GC.dll",
            "/alloc_mb=20"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": true,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2018-05-18T11:07:44.779168397Z",
            "FinishedAt": "2018-05-18T11:07:49.905754133Z"
        },
        "Image": "sha256:ad9dabc998e76037a7f5b0aabce223fea0889598990bfa59b2bfc57c36a92097",
        "ResolvConfPath": "/var/lib/docker/containers/9d9fe1e71155a794544770ad9ea90b9ec9f29fa3524475e3001815d504ac7dcb/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/9d9fe1e71155a794544770ad9ea90b9ec9f29fa3524475e3001815d504ac7dcb/hostname",
        "HostsPath": "/var/lib/docker/containers/9d9fe1e71155a794544770ad9ea90b9ec9f29fa3524475e3001815d504ac7dcb/hosts",
        "LogPath": "/var/lib/docker/containers/9d9fe1e71155a794544770ad9ea90b9ec9f29fa3524475e3001815d504ac7dcb/9d9fe1e71155a794544770ad9ea90b9ec9f29fa3524475e3001815d504ac7dcb-json.log",
        "Name": "/amazing_hopper",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 104857600,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": [],
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": -1,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/202220049d389eafade942074b96ac900c0f6a96de71e207bee0ee36e298f300-init/diff:/var/lib/docker/overlay2/be362a6bbca67018ae7228f1fc1d753ba86dce6bbdcd7a9f418808ca39e058a1/diff:/var/lib/docker/overlay2/0cdfa3adb38eba1ff6782a930f92493c58d7b84cae9e90bbf837089a7921c857/diff:/var/lib/docker/overlay2/67193b82fdc0f1c1966c0fc8f56698a0b2d5ddd9f05f9fd20490aa412b6b8f0a/diff:/var/lib/docker/overlay2/e5fb63cca3a4d9fbc88024a83cf4028859485a2023fe0b5ba87617f66c6d982b/diff:/var/lib/docker/overlay2/629aa25dfe8a23636b38dae27ffff60583785d9fa481f1c028bf5c600a532082/diff:/var/lib/docker/overlay2/6f6bf2e72ac23645820450470eed4e6df9929e1725598146f8c78391c72f3fd9/diff",
                "MergedDir": "/var/lib/docker/overlay2/202220049d389eafade942074b96ac900c0f6a96de71e207bee0ee36e298f300/merged",
                "UpperDir": "/var/lib/docker/overlay2/202220049d389eafade942074b96ac900c0f6a96de71e207bee0ee36e298f300/diff",
                "WorkDir": "/var/lib/docker/overlay2/202220049d389eafade942074b96ac900c0f6a96de71e207bee0ee36e298f300/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [],
        "Config": {
            "Hostname": "9d9fe1e71155",
            "Domainname": "",
            "User": "",
            "AttachStdin": true,
            "AttachStdout": true,
            "AttachStderr": true,
            "ExposedPorts": {
                "7777/tcp": {}
            },
            "Tty": true,
            "OpenStdin": true,
            "StdinOnce": true,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "DOTNET_RUNNING_IN_CONTAINER=true",
                "DOTNET_VERSION=2.0.7",
                "DOTNET_DOWNLOAD_URL=https://dotnetcli.blob.core.windows.net/dotnet/Runtime/2.0.7/dotnet-runtime-2.0.7-linux-x64.tar.gz",
                "DOTNET_DOWNLOAD_SHA=d8f6035a591b5500a8b81188d834ed4153c4f44f1618e18857c610d0b332d636970fd8a980af7ae3fbff84b9f1da53aa2f45d8d305827ea88992195cd5643027"
            ],
            "Cmd": [
                "/alloc_mb=20"
            ],
            "Image": "schaefferlinkbit/dotnet_mem_allocator:1.0.3895_20_server_GC",
            "Volumes": null,
            "WorkingDir": "/app",
            "Entrypoint": [
                "dotnet",
                "dotnet_mem_allocator_20_server_GC.dll"
            ],
            "OnBuild": null,
            "Labels": {}
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "196aaab5c7d5d9594811745678d9c100721185b552ea00313a9ed756427e7f0e",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/196aaab5c7d5",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "39a588820992e724461d180e009c7afe75373efcaf7f41b049fb683af3829b3a",
                    "EndpointID": "",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null
                }
            }
        }
    }
]
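The `ExitCode` of 137 in the inspect output follows the usual convention of reporting a signal death as 128 plus the signal number, and 137 - 128 = 9 is SIGKILL, the signal the kernel OOM killer sends. A small sketch of that arithmetic (the function names are hypothetical; note that the exit code alone does not prove an OOM kill, which is why the `OOMKilled` flag matters):

```cpp
#include <cassert>
#include <csignal>

// Convention: a process terminated by signal N is reported with exit code 128 + N.
// Returns the signal number, or 0 if the code does not look like a signal death.
int signal_from_exit_code(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);
    return exit_code > 128 ? exit_code - 128 : 0;
}

// True if the exit code corresponds to termination by SIGKILL.
bool looks_like_sigkill(int exit_code)
{
    return signal_from_exit_code(exit_code) == SIGKILL;
}
```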

@devKlausS changed the title from ".net core application running in docker get OOMKilled if swapping is disabled" to ".NET Core application running in docker gets OOMKilled if swapping is disabled" on May 18, 2018
@janvorli
Member

@devKlausS thank you very much for the detailed analysis and repro details! I will look into it.

@janvorli
Member

@devKlausS can you please tell me which Linux distro you are using? I have tried to repro it on my Ubuntu 16.04 and the behavior is different. My container doesn't get OOM killed, but hangs instead. I cannot stop or kill it (the docker stop / kill commands just run for a while and then terminate with no message, but the container keeps running).
I've tried:

docker run -ti -m 100M schaefferlinkbit/dotnet_mem_allocator:1.0.3895_21_server_GC /alloc_mb=20

And this is the last line of output before it hung:

AllocatedBytesForCurrentThread: 320Mb TotalMemory: 180Mb CollectionCount(0): 1 CollectionCount(1): 1

@dmpriso

dmpriso commented May 18, 2018

@janvorli I'm working together with Klaus on this issue.
We were testing on several machines with Ubuntu 16.04 and Ubuntu 18.04.
Did you turn off swap using swapoff -a?

It seems like your OOM killer is actually disabled. The kernel cgroup documentation states that a malloc call will hang (sleep) if the OOM killer is disabled (which isn't very useful, either). It is, however, not disabled by default, and I only know how to disable it for a single container.

Kernel docs regarding the OOM killer (section 10):
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
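Whether the OOM killer is disabled for a container can be checked from the cgroup v1 file memory.oom_control, which that section of the kernel documentation describes as containing `oom_kill_disable` and `under_oom` entries. A minimal sketch of parsing its contents (the function name `parse_oom_kill_disable` is hypothetical, and the caller is assumed to have read the file into a string already):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Scan the text of a cgroup v1 memory.oom_control file for the
// "oom_kill_disable <0|1>" entry. Returns false if the key is missing.
bool parse_oom_kill_disable(const std::string& text, bool* disabled)
{
    assert(disabled != nullptr);
    std::istringstream in(text);
    std::string key;
    long value = 0;
    // Each entry is a key followed by an integer value.
    while (in >> key >> value)
    {
        if (key == "oom_kill_disable")
        {
            *disabled = (value != 0);
            return true;
        }
    }
    return false;
}
```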

@hpbieker

This is similar to what I observed in dotnet/coreclr#16906. Docker does not respect the memory limit, so the GC is triggered too late. If swap is enabled it does not matter if the memory limit is violated, as the excess memory is just swapped out.

@dmpriso

dmpriso commented May 18, 2018

@janvorli I'd like to add that, to us, the root cause seems to be that the memory limit, although support for it was implemented by @rahku in dotnet/coreclr#10064, is not read correctly by cgroup.cpp.

@hpbieker we didn't find your issue actually although we were searching a lot :)
Unfortunately, enabling swap is not an option for our Kubernetes cluster. Under these circumstances I can't see a way to keep a .NET Core microservice running stably in a Linux k8s environment.

@janvorli
Member

@dmpriso Ah, that is likely the reason for the hang. I've completely forgotten about that behavior, but remembered now that you've mentioned that.

@janvorli
Member

Great, I've enabled the oom killer and disabled the swap and now I can repro it.

@janvorli
Member

While I can repro the issue, in my case, the limit is read correctly from the cgroups files and the path is also correct:

(gdb) p mem_limit_filename
$3 = 0x699600 "/sys/fs/cgroup/memory/memory.limit_in_bytes"

(gdb) p physical_memory_limit
$11 = 104857600

So the fact that it was trying to use a wrong path in your case most likely means that /proc/self/mountinfo on your host reports the info in a format that we don't parse correctly.

Could you please run /bin/bash in your container (add --entrypoint=/bin/bash to the docker options) and get me the output of the following command?

cat /proc/self/mountinfo | grep "cgroup"

The fact that we still get killed due to the OOM on my machine even though the limit is read correctly could be due to the fact that the GC is not the only part of coreclr that consumes memory. There is also memory consumed by native allocations in the runtime, which includes the JIT, the managed assemblies, the native binaries etc.
There is also native memory allocated outside of our control by 3rd party libraries, like openssl or even the C runtime itself.

Also, we are still missing low memory notification for GC on Unix (https://github.com/dotnet/coreclr/issues/5551), but even that would not prevent the OOM kill from happening in all cases (especially when memory is being allocated quickly).
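For reference, the mountinfo lookup mentioned above can be sketched as follows. This is an illustration of the documented /proc/self/mountinfo line format (fields before and after a " - " separator), not the parsing code used in cgroup.cpp; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Split a comma-separated option string and check for an exact token match.
static bool has_option(const std::string& options, const std::string& wanted)
{
    std::istringstream in(options);
    std::string token;
    while (std::getline(in, token, ','))
        if (token == wanted)
            return true;
    return false;
}

// Parse one /proc/self/mountinfo line. If it describes a cgroup v1 mount whose
// super options include "memory", store its root and mount point and return true.
bool parse_memory_cgroup_mount(const std::string& line,
                               std::string* root, std::string* mount_point)
{
    assert(root != nullptr && mount_point != nullptr);
    const std::string separator = " - ";
    const std::size_t sep = line.find(separator);
    if (sep == std::string::npos)
        return false;

    // Fields before the separator: id, parent id, major:minor, root, mount point, ...
    std::istringstream head(line.substr(0, sep));
    std::string id, parent, device, r, mp;
    if (!(head >> id >> parent >> device >> r >> mp))
        return false;

    // Fields after the separator: filesystem type, source, super options.
    std::istringstream tail(line.substr(sep + separator.size()));
    std::string fs_type, source, super_options;
    if (!(tail >> fs_type >> source >> super_options))
        return false;

    if (fs_type != "cgroup" || !has_option(super_options, "memory"))
        return false;

    *root = r;
    *mount_point = mp;
    return true;
}
```

Matching on the super options rather than on the mount point name avoids depending on the directory being called `memory`.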

@dmpriso

dmpriso commented May 19, 2018

This is on dotnet_mem_allocator, as per your instructions:

root@16b158554826:/app# cat /proc/self/mountinfo | grep cgroup
699 698 0:85 / /sys/fs/cgroup ro,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,mode=755
700 699 0:29 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,xattr,name=systemd
701 699 0:32 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,blkio
702 699 0:33 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,net_cls,net_prio
703 699 0:34 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,freezer
704 699 0:35 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,hugetlb
705 699 0:36 / /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,rdma
706 699 0:37 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime master:20 - cgroup cgroup rw,cpuset
707 699 0:38 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:21 - cgroup cgroup rw,memory
708 699 0:39 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime master:22 - cgroup cgroup rw,perf_event
709 699 0:40 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime master:23 - cgroup cgroup rw,pids
710 699 0:41 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:24 - cgroup cgroup rw,devices
711 699 0:42 /docker/16b15855482634de05f9eb0025212311366dc757621f0f93e41361585efc03c6 /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct

This is on ubuntu:latest, where we debugged cgroup.cpp:

root@d241960b7cbd:/sys/fs/cgroup/memory# cat /proc/self/mountinfo | grep cgroup
531 530 0:77 / /sys/fs/cgroup ro,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,mode=755
532 531 0:29 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,xattr,name=systemd
533 531 0:32 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,blkio
534 531 0:33 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,net_cls,net_prio
535 531 0:34 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,freezer
536 531 0:35 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,hugetlb
537 531 0:36 / /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,rdma
538 531 0:37 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime master:20 - cgroup cgroup rw,cpuset
539 531 0:38 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime master:21 - cgroup cgroup rw,memory
540 531 0:39 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime master:22 - cgroup cgroup rw,perf_event
541 531 0:40 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime master:23 - cgroup cgroup rw,pids
542 531 0:41 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime master:24 - cgroup cgroup rw,devices
543 531 0:42 /docker/d241960b7cbd301407b51fe95b35607221cc32751d06e4ba894b531ed567294e /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime master:25 - cgroup cgroup rw,cpu,cpuacct
545 526 259:5 /home/daniel/dev/cgroup_mem /test rw,relatime - ext4 /dev/nvme0n1p5 rw,errors=remount-ro,data=ordered

Our cgroup_test tool also fails with the .NET Core image:

daniel@daniel-ubuntu:~/dev/cgroup_mem$ docker run -ti -m 100M --entrypoint=/bin/bash -v $(pwd):/test schaefferlinkbit/dotnet_mem_allocator:1.0.3895_20_server_GC
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
root@2f2b32bd6792:/app# cd /test
root@2f2b32bd6792:/test# ls
cgroup.cpp  cgroup_test  mem_tester  mem_tester.cpp
root@2f2b32bd6792:/test# ./cgroup_test 
Failed

cgroup_test just consists of cgroup.cpp with a simple main function:

int main() 
{
    CGroup cg;
    size_t val;
    if (!cg.GetPhysicalMemoryLimit(&val)) 
    {
        std::cout << "Failed" << std::endl;
        return 1;
    }
    std::cout << "Memory limit bytes: " << val << std::endl;
    return 0;
}

Regarding the native allocations done by the runtime: Isn't code allocating unmanaged memory supposed to call GC.AddMemoryPressure()? Could it be a workaround to add some reserve using that method?

@dmpriso

dmpriso commented May 24, 2018

@janvorli could you spot any difference in our and your mountinfo output?

@janvorli
Member

janvorli commented May 24, 2018

@dmpriso this is weird - your dump matches mine and yet it gives me the correct limit. However, now I've realized something - you are testing code from https://github.com/dotnet/coreclr/blob/master/src/gc/unix/cgroup.cpp, which is used by the standalone GC (if you use it). I was debugging the copy of this code that's in PAL and is used by the GC embedded in the coreclr - https://github.com/dotnet/coreclr/blob/master/src/pal/src/misc/cgroup.cpp. While these should match (except for differences due to one using PAL types / functions and headers and the other using standard ones), there might be a subtle bug causing the malfunction of the one that you've tested in your little C++ test.
Let me try to do the same thing as you did and see how it behaves.

> Regarding the native allocations done by the runtime: Isn't code allocating unmanaged memory supposed to call GC.AddMemoryPressure()?

User code - yes. But even with user code, it would be complicated if you use code that you don't control - e.g. 3rd party libraries. While you could possibly install hooks for the malloc / free calls, you cannot track mmap, which some libraries may use. And even with mmap, it would be difficult to track how much physical memory it has allocated, since it can map memory lazily, with the physical pages allocated on the first access to each memory page.

> Could it be a workaround to add some reserve using that method?

The purpose of this method is to accumulate the allocated bytes and trigger a GC if we go over a threshold that we dynamically update each time we cross it. The idea is that you'd use it before doing native allocation and GC can free some memory if it seems it may be needed. But the accumulated pressure is not used by the GC itself.
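The accumulate-and-threshold behaviour described above can be modelled roughly like this. It is a toy model, not the runtime's implementation: the class name `PressureTracker`, the reset of the counter, and the doubling of the threshold after each crossing are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a memory pressure accumulator; NOT the runtime's implementation.
class PressureTracker
{
public:
    explicit PressureTracker(std::uint64_t initial_threshold)
        : threshold_(initial_threshold)
    {
        assert(initial_threshold > 0);
    }

    // Record `bytes` of native allocation. Returns true when the accumulated
    // total reaches the threshold, i.e. when the caller should trigger a GC.
    bool add(std::uint64_t bytes)
    {
        accumulated_ += bytes;
        if (accumulated_ < threshold_)
            return false;
        accumulated_ = 0;   // assumed: start a new accumulation window
        threshold_ *= 2;    // assumed update policy: double after each crossing
        return true;
    }

    std::uint64_t threshold() const { return threshold_; }

private:
    std::uint64_t accumulated_ = 0;
    std::uint64_t threshold_;
};
```

The model also shows why adding a fixed "reserve" does not act as a budget: the accumulated value only decides when a collection is triggered and is not subtracted from what the GC considers available.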

@janvorli
Member

Hmm, I've tried the same thing as you did - just compiling the https://github.com/dotnet/coreclr/blob/master/src/gc/unix/cgroup.cpp with your bit of code to test it added. And it still worked :-(.
I've instrumented the code a bit to dump the stuff it extracts, could you please compile and run the instrumented version? You can get it here: https://gist.github.com/janvorli/0614a42e612965ea3ad56d00743dc7a2

@dmpriso

dmpriso commented May 24, 2018

@janvorli I'm sorry, while digging into that issue I must have picked up the file from the original commit b2b4ea2. Your snippet as well as the current version read the limit correctly. My fault!

So while my analysis is wrong, the issue still exists but for the reason you mentioned.

Do you have any idea how to work around that?
What would actually fix the issue? Would a failing malloc() (without OOMKill) trigger GC in that case?

@janvorli
Member

One feature that is not implemented for Unix yet is the low memory notification (#5551). That would fix the issue as long as memory is not allocated at such an extremely high rate that the notification cannot take effect in time. What I mean is that if you allocate large blocks of memory in a tight loop and deplete the available memory in a fraction of a second, the OOM might happen in the interval between the regular polls of available physical memory. So one poll would find there is still plenty of memory, but we OOM before the next regular poll comes in.
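The race between allocation and polling comes down to simple arithmetic, sketched below. The function name is hypothetical and a steady allocation rate is assumed; the actual poll interval of the runtime is not stated in this thread, so any interval passed in is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a process that currently uses `used_bytes` and allocates at a
// steady `bytes_per_second` would exceed `limit_bytes` before the next poll,
// which happens `poll_interval_ms` from now.
bool exceeds_limit_before_next_poll(std::uint64_t limit_bytes,
                                    std::uint64_t used_bytes,
                                    std::uint64_t bytes_per_second,
                                    std::uint64_t poll_interval_ms)
{
    assert(used_bytes <= limit_bytes);
    const std::uint64_t headroom = limit_bytes - used_bytes;
    // Bytes allocated during one poll interval (integer arithmetic, rounds down).
    const std::uint64_t allocated = bytes_per_second * poll_interval_ms / 1000;
    return allocated > headroom;
}
```

With a 100 MiB limit, 80 MB already in use and 20 MB allocated per second, a poll every second would still come in time, while a poll every two seconds would not.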

@dmpriso

dmpriso commented Jun 4, 2018

What would be required for a 100% solution, on top of that?

@devKlausS
Author

devKlausS commented Oct 3, 2018

I tested with the .NET Core 2.1.5 release, but unfortunately the issue still exists. I created new docker images based on the 2.1.5 SDK and performed the tests as described above. Both test images get OOMKilled.

ServerGC=false alloc/free .NET CORE 2.1.5

docker run -ti -m 100M schaefferlinkbit/dotnet_mem_allocator:1.0.5993_215_client_GC /alloc_mb=20

image

ServerGC=true alloc/free .NET CORE 2.1.5

docker run -ti -m 100M schaefferlinkbit/dotnet_mem_allocator:1.0.5993_215_server_GC /alloc_mb=20

image

@tmds
Member

tmds commented Oct 9, 2018

Based on what @janvorli said, it is expected this test program still runs out of memory: the rate of allocation is high compared to the provisioned memory / poll interval.
Containers should be provisioned with extra memory to account for this (until a low memory notification mechanism is implemented).

Do your production applications run OOM with 2.1.5?

@janvorli
Member

@devKlausS I'll try to debug it locally again to see if I can pinpoint the issue you are hitting.

@devKlausS
Author

@janvorli thank you for your support
@tmds we have to perform more load tests

@hassancode

hassancode commented Nov 13, 2018

@devKlausS please try:

<PropertyGroup>
  <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>

I have also tried this; afterwards, memory utilization came down by more than half. By the way, I am running .NET Core on Linux in a docker container on AWS ECS.

related article: https://blog.markvincze.com/troubleshooting-high-memory-usage-with-asp-net-core-on-kubernetes/

@axelbodo

This is related to another long-running (closed) problem as well (dotnet/dotnet-docker#220). Java has actually solved this by looking at the cgroup rather than at the host machine/VM memory (or, with lxcfs, it works as intended), but .NET somehow doesn't use the information provided by /proc in its decisions under high memory pressure.
Either .netcore should use /proc, or it should provide some means of configuration to tell it the maximum allocatable memory for the heap and for the process as a whole. MaxWorkingSet cannot be used for two reasons: 1. it defines the maximum non-pageable memory; 2. as far as I know, it is not implemented in netcore.

So I would prefer an environment variable or a dotnet run CLI option, which could then be used in e.g. an ECS or Kubernetes deployment descriptor with a Helm chart, where we can substitute the same value for both the memory resource limit and this option.

@tmds
Member

tmds commented Nov 19, 2018

> Either .netcore should use /proc

.NET Core uses cgroup limits. Available memory is determined by polling the proc file system.

Because of the polling interval, it is possible to allocate so much you get yourself killed, before .NET Core sees it should perform a GC. That is what happens with the code in the top comment.

@kriskalish

> .NET Core uses cgroup limits. Available memory is determined by polling the proc file system.
>
> Because of the polling interval, it is possible to allocate so much you get yourself killed, before .NET Core sees it should perform a GC. That is what happens with the code in the top comment.

Is the polling used to check /fs/cgroup/memory/memory.limit_in_bytes? Is that ever expected to change after the CLR starts up? I'm trying to think of another workaround for this issue, which is affecting me. Is there a list somewhere of the COMPlus_ environment variables that the CLR looks for? I was trying to find anything that could be used to tune the GC and had no luck. I was hoping that setting the polling interval might have some positive outcome.

For now I am manually invoking garbage collection on a timer, but it seems fairly flaky.

@unsafecode

@kriskalish This might be relevant to your question: dotnet/coreclr@a25682cdcf

@tmds In your opinion, which approach is best suited then? As far as I understand, setting Workstation GC looks to be the "safest" option to constrain memory consumption, but it may hurt throughput-sensitive scenarios. Or is there a reliable way to prevent OOM kills while still preserving memory w/ Server GC (w.r.t. your point on cgroup polling time)?

/cc @gidifede @ruoccofabrizio

@axelbodo

@tmds cgroup and proc are not the same. So what is used by netcore? proc only reflects the cgroup if it is mounted via lxcfs or some other FUSE mechanism. In our microservice containers only one netcore process runs, so it doesn't compete for the cgroup limit with other processes, yet it continuously gets itself OOM killed. I've also tested 2.0 under memory pressure with meminfo under proc set to a relatively low value, and it got OOM killed. When I investigated the heap manager at that time, as far as I remember it used a kernel call to get the physical memory, not the proc file system. Can you tell me which version/release of netcore switched to polling proc?

@tmds
Member

tmds commented Nov 20, 2018

> In your opinion, which approach is best suited then? As far as I understand, setting Workstation GC looks to be the "safest" option to constrain memory consumption, but it may hurt throughput-sensitive scenarios.

  • low provision your container and use workstation gc
  • high provision your container and use server gc

This blog post series will give us some more guidelines.

> Or is there a reliable way to prevent OOM kills while still preserving memory w/ Server GC (w.r.t. your point on cgroup polling time)?

Switching to workstation GC will help. GCs will be triggered sooner because there is a single, small gen0+1 segment.
That said, if your app runs OOM because of the polling time, I think you should look at your allocation rate and probably it can be drastically reduced by using pooled objects.
Kestrel uses pooled memory, so I wonder if this polling interval is causing many real-world OOMs.
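To illustrate the pooling suggestion above, here is a minimal sketch that rents a buffer from `ArrayPool<byte>.Shared` instead of allocating a new array on every iteration. The class name `PooledBufferDemo`, the method `ProcessOnce` and the 1,000,000-byte size are made up for this example; they are not taken from the test program in this issue, and whether pooling helps depends on the allocation pattern of the real application.

```csharp
using System;
using System.Buffers;

public static class PooledBufferDemo
{
    // Hypothetical helper: fills a rented buffer with random bytes and
    // returns the sum of the first `size` bytes.
    public static long ProcessOnce(Random rand, int size)
    {
        // Rent may hand back an array larger than requested,
        // so only the first `size` bytes are used.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(size);
        try
        {
            Span<byte> span = buffer.AsSpan(0, size);
            rand.NextBytes(span);

            long sum = 0;
            foreach (byte b in span)
                sum += b;
            return sum;
        }
        finally
        {
            // Hand the array back so the next call can reuse it
            // instead of allocating a fresh one.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }

    public static void Main()
    {
        var rand = new Random(42);
        for (int i = 0; i < 100; i++)
            ProcessOnce(rand, 1_000_000);

        Console.WriteLine("Gen0 collections: " + GC.CollectionCount(0));
    }
}
```

Very large requests may not be served from the shared pool at all, so it is worth measuring the effect on your own workload rather than assuming it.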

> cgroup and proc are not the same. So which one is used by netcore?

.NET Core reads the cgroup limits and usage from the proc filesystem. This code is in the cgroup.cpp files in this repo.
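For anyone who wants to see what their container reports, below is a rough sketch of reading a cgroup memory limit and current usage from managed code. It is not the runtime's cgroup.cpp logic. The paths are assumptions based on a common Docker layout (cgroup v1 memory controller under /sys/fs/cgroup/memory, cgroup v2 under /sys/fs/cgroup) and may differ on your host; the class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class CgroupMemoryProbe
{
    // Parses the content of a cgroup memory file.
    // Returns null for "max" (cgroup v2: no limit) or unparsable text.
    public static long? ParseValue(string text)
    {
        if (text == null) return null;
        text = text.Trim();
        if (text == "max") return null;
        return long.TryParse(text, out long value) ? value : (long?)null;
    }

    // Returns the parsed value of the first file that exists, or null.
    public static long? ReadFirst(params string[] paths)
    {
        foreach (string path in paths)
        {
            try
            {
                if (File.Exists(path))
                    return ParseValue(File.ReadAllText(path));
            }
            catch (IOException) { }
            catch (UnauthorizedAccessException) { }
        }
        return null;
    }

    public static void Main()
    {
        // Assumed locations; adjust to how cgroups are mounted in your container.
        long? limit = ReadFirst(
            "/sys/fs/cgroup/memory/memory.limit_in_bytes", // cgroup v1
            "/sys/fs/cgroup/memory.max");                  // cgroup v2
        long? usage = ReadFirst(
            "/sys/fs/cgroup/memory/memory.usage_in_bytes", // cgroup v1
            "/sys/fs/cgroup/memory.current");              // cgroup v2

        Console.WriteLine("limit: " + (limit?.ToString() ?? "none/unknown"));
        Console.WriteLine("usage: " + (usage?.ToString() ?? "unknown"));
    }
}
```

Running this inside the container and comparing the printed limit with the limit you passed to docker is a quick way to check which values are visible to the process.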

> Can you tell me which version/release of netcore switched to polling proc?

You need to run at least 2.1.5.

@janvorli janvorli self-assigned this Feb 1, 2019
@danilobreda

Hello, I am having the same kind of issue on ASP.NET Core 2.2, in a Docker container (Ubuntu) on AWS ECS. I tried to run the container on my machine and it does not kill itself there.
What can I do to resolve the problem?

@Maoni0
Member

Maoni0 commented Apr 18, 2019

this is the one I wanted to ask you about, @janvorli - a while ago you mentioned you still wanted to do some work for this so just checking.

@saixiaohui

@janvorli and @Maoni0 we have released .net core 3.0. Has this been fixed?

Thanks!

@Maoni0
Member

Maoni0 commented Oct 2, 2019

@saixiaohui have you tried 3.0?

@DotNetRockStar

Having the same issue. Will upgrading to 3.0 fix this issue?

@grantzvolsky

As a workaround this issue can be solved by running ulimit -d 100000 in the runtime's parent process.

@danilobreda

danilobreda commented Nov 21, 2019

I upgraded from 2.2 to 3.0 and the same problem occurs.
My container on AWS ECS with ASP.NET Core 3.0:
[image]

With 20 requests, concurrency level of 2:
[images]

On 2.2, two clicks at the same time were enough to kill it; now it stays alive longer before dying.

@jkotas jkotas transferred this issue from dotnet/coreclr Dec 13, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 13, 2019
@richlander
Member

Same question as #852 (comment): is this issue still a problem with 3.x?

Info on 3.x for containers: https://devblogs.microsoft.com/dotnet/using-net-and-docker-together-dockercon-2019-update/

@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Jan 8, 2020
@jeffschwMSFT jeffschwMSFT added this to the 5.0 milestone Jan 8, 2020
@effyteva

effyteva commented Feb 2, 2020

We face the same issue. Switching from .NET core 2.2 to 3.1 made it even worse for our application.
Are there any recommendations for temporary solutions?
Waiting for .NET core 5.0 (November 2020) isn't a great solution...

@Gladskih

Related

@vladshestel

The issue is still relevant for the netcore 3.1.0-bionic image. Maybe my scenario will be useful for someone, or will bring some new information to the runtime team.

We are facing the problem while migrating an asp.net application with high memory pressure between two sets of servers. We use docker swarm mode to orchestrate services and cgroups for memory limits. The exact same image was used to start a container on both sets of servers, and it fails with out-of-memory on the new ones.

The servers were configured with ansible scripts, so they use the exact same configs and docker engine version, and both have swapping disabled. The only differences between the old servers and the new servers were the OS version and the environment type.
We have VMware virtual machines with Ubuntu 16.04.5 on the old nodes and bare metal machines with Ubuntu 18.04.4 on the new nodes.

So, an attempt to run a container with a memory limit leads to an out-of-memory error message.

An attempt to run a container without a memory limit leads to a "half-dead-working" application with really high memory consumption (4x above normal) that is unable to process requests (connection refused).

One of my hypotheses was about the different Ubuntu versions. To validate it I deployed a staging copy of the production node (with Ubuntu 18.04.4 on a virtualized machine) and ran the application with the production setup. The run was successful.

So maybe the problem with GC is somehow related to the system run mode (virtualized vs. bare metal)?

Temporary solution that makes it work: explicitly using Workstation GC mode solves the issue.

<PropertyGroup>
    <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>

@vladshestel

Whoa, I hope this will be helpful for some researchers.

After further investigation I noticed that there is a big difference between my servers in the number of available logical CPUs (80 vs 16). After some googling I came across dotnet/runtime#622, which led me to experiment with CPU/GC/thread settings.

I used the --cpus constraint in the stack file; explicitly set System.GC.Concurrent=true, System.GC.HeapCount=8, System.GC.NoAffinitize=true, System.Threading.ThreadPool.MaxThreads=16 in the runtimeconfig.template.json file; and updated the image to the 3.1.301-bionic sdk and 3.1.5-bionic asp.net runtime. I tried all of these in various combinations and none of it had any effect. The application just hangs until it gets OOMKilled.

The only thing that makes it work with Server GC is the --cpuset-cpus constraint. Of course, explicitly setting the available processors is not an option for docker swarm mode. But I experimented with the available cpus to find any regularity, and I found a few interesting facts.

Interestingly, I had previously migrated 3 other backend services to the new server cluster and they all run well with default settings. Their memory limit is set to 600 Mb but in fact they need about 400 Mb to run. Things go wrong only with memory-consuming applications (I have two of those); such an application requires 3 Gb to build in-memory structures and runs with a 6 Gb constraint.

It keeps working with any number of available cpus in the range [1, 35] and hangs when the cpu count is 36. ¯\_(ツ)_/¯

Of course, it's relevant only for my particular workload, but maybe someone can see a pattern.
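When comparing hosts like this, it can help to log what the runtime itself sees at startup. The sketch below prints the processor count, the GC mode and the memory figures reported by `GC.GetGCMemoryInfo()` (available from .NET Core 3.0). The class name `GcEnvironmentReport` is hypothetical, and the snippet only reports values; it does not change any GC behavior or explain the 35 vs. 36 CPU threshold described above.

```csharp
using System;
using System.Runtime;

public static class GcEnvironmentReport
{
    // Builds a one-line summary of what the runtime reports about
    // CPUs, GC mode and memory for the current process.
    public static string Describe()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        return "ProcessorCount=" + Environment.ProcessorCount
             + " ServerGC=" + GCSettings.IsServerGC
             + " LatencyMode=" + GCSettings.LatencyMode
             + " TotalAvailableMemoryBytes=" + info.TotalAvailableMemoryBytes
             + " HighMemoryLoadThresholdBytes=" + info.HighMemoryLoadThresholdBytes;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running the same image with different --cpus or --cpuset-cpus values and comparing the printed lines shows whether the constraint is actually visible to the process.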

@pnagori02

Hi, is there a plan to fix this sooner? We are also hitting this problem.

@janvorli
Member

I have finally tested this with .NET 6 preview 6. We have made some changes to how we detect memory load on Linux recently, so I wanted to give this a try. It seems that both the client and server cases now behave correctly:

server GC: false
allocated bytes: 20000000
sleep ms: 1000
free: True
force full collection: False
run GC every: 0
AllocatedBytesForCurrentThread: 20Mb TotalMemory: 20Mb CollectionCount(0): 0 CollectionCount(1): 0
AllocatedBytesForCurrentThread: 40Mb TotalMemory: 40Mb CollectionCount(0): 1 CollectionCount(1): 1
AllocatedBytesForCurrentThread: 60Mb TotalMemory: 60Mb CollectionCount(0): 1 CollectionCount(1): 1
AllocatedBytesForCurrentThread: 80Mb TotalMemory: 40Mb CollectionCount(0): 2 CollectionCount(1): 2
AllocatedBytesForCurrentThread: 100Mb TotalMemory: 60Mb CollectionCount(0): 2 CollectionCount(1): 2
AllocatedBytesForCurrentThread: 120Mb TotalMemory: 40Mb CollectionCount(0): 3 CollectionCount(1): 3
AllocatedBytesForCurrentThread: 140Mb TotalMemory: 40Mb CollectionCount(0): 4 CollectionCount(1): 4
AllocatedBytesForCurrentThread: 160Mb TotalMemory: 60Mb CollectionCount(0): 4 CollectionCount(1): 4
AllocatedBytesForCurrentThread: 180Mb TotalMemory: 40Mb CollectionCount(0): 5 CollectionCount(1): 5
AllocatedBytesForCurrentThread: 200Mb TotalMemory: 60Mb CollectionCount(0): 5 CollectionCount(1): 5
AllocatedBytesForCurrentThread: 220Mb TotalMemory: 40Mb CollectionCount(0): 6 CollectionCount(1): 6
AllocatedBytesForCurrentThread: 240Mb TotalMemory: 40Mb CollectionCount(0): 7 CollectionCount(1): 7
AllocatedBytesForCurrentThread: 260Mb TotalMemory: 60Mb CollectionCount(0): 7 CollectionCount(1): 7
AllocatedBytesForCurrentThread: 280Mb TotalMemory: 40Mb CollectionCount(0): 8 CollectionCount(1): 8
AllocatedBytesForCurrentThread: 300Mb TotalMemory: 60Mb CollectionCount(0): 8 CollectionCount(1): 8
AllocatedBytesForCurrentThread: 320Mb TotalMemory: 40Mb CollectionCount(0): 9 CollectionCount(1): 9
AllocatedBytesForCurrentThread: 340Mb TotalMemory: 40Mb CollectionCount(0): 10 CollectionCount(1): 10
AllocatedBytesForCurrentThread: 360Mb TotalMemory: 60Mb CollectionCount(0): 10 CollectionCount(1): 10
AllocatedBytesForCurrentThread: 380Mb TotalMemory: 40Mb CollectionCount(0): 11 CollectionCount(1): 11
AllocatedBytesForCurrentThread: 400Mb TotalMemory: 60Mb CollectionCount(0): 11 CollectionCount(1): 11
AllocatedBytesForCurrentThread: 420Mb TotalMemory: 40Mb CollectionCount(0): 12 CollectionCount(1): 12
AllocatedBytesForCurrentThread: 440Mb TotalMemory: 40Mb CollectionCount(0): 13 CollectionCount(1): 13
AllocatedBytesForCurrentThread: 460Mb TotalMemory: 60Mb CollectionCount(0): 13 CollectionCount(1): 13

The total memory keeps oscillating between 40 and 60 Mb even if I let it run for a long time.

Server GC version behaves the same way:

server GC: true
allocated bytes: 20000000
sleep ms: 1000
free: True
force full collection: False
run GC every: 0
AllocatedBytesForCurrentThread: 20Mb TotalMemory: 20Mb CollectionCount(0): 0 CollectionCount(1): 0
AllocatedBytesForCurrentThread: 40Mb TotalMemory: 40Mb CollectionCount(0): 0 CollectionCount(1): 0
AllocatedBytesForCurrentThread: 60Mb TotalMemory: 60Mb CollectionCount(0): 0 CollectionCount(1): 0
AllocatedBytesForCurrentThread: 80Mb TotalMemory: 40Mb CollectionCount(0): 1 CollectionCount(1): 1
AllocatedBytesForCurrentThread: 100Mb TotalMemory: 60Mb CollectionCount(0): 1 CollectionCount(1): 1
AllocatedBytesForCurrentThread: 120Mb TotalMemory: 40Mb CollectionCount(0): 2 CollectionCount(1): 2
AllocatedBytesForCurrentThread: 140Mb TotalMemory: 60Mb CollectionCount(0): 2 CollectionCount(1): 2
AllocatedBytesForCurrentThread: 160Mb TotalMemory: 40Mb CollectionCount(0): 3 CollectionCount(1): 3
AllocatedBytesForCurrentThread: 180Mb TotalMemory: 60Mb CollectionCount(0): 3 CollectionCount(1): 3
AllocatedBytesForCurrentThread: 200Mb TotalMemory: 40Mb CollectionCount(0): 4 CollectionCount(1): 4
AllocatedBytesForCurrentThread: 220Mb TotalMemory: 60Mb CollectionCount(0): 4 CollectionCount(1): 4
AllocatedBytesForCurrentThread: 240Mb TotalMemory: 40Mb CollectionCount(0): 5 CollectionCount(1): 5
AllocatedBytesForCurrentThread: 260Mb TotalMemory: 60Mb CollectionCount(0): 5 CollectionCount(1): 5
AllocatedBytesForCurrentThread: 280Mb TotalMemory: 40Mb CollectionCount(0): 6 CollectionCount(1): 6
AllocatedBytesForCurrentThread: 300Mb TotalMemory: 60Mb CollectionCount(0): 6 CollectionCount(1): 6
AllocatedBytesForCurrentThread: 320Mb TotalMemory: 40Mb CollectionCount(0): 7 CollectionCount(1): 7
AllocatedBytesForCurrentThread: 340Mb TotalMemory: 60Mb CollectionCount(0): 7 CollectionCount(1): 7
AllocatedBytesForCurrentThread: 360Mb TotalMemory: 40Mb CollectionCount(0): 8 CollectionCount(1): 8
AllocatedBytesForCurrentThread: 380Mb TotalMemory: 60Mb CollectionCount(0): 8 CollectionCount(1): 8
AllocatedBytesForCurrentThread: 400Mb TotalMemory: 40Mb CollectionCount(0): 9 CollectionCount(1): 9
AllocatedBytesForCurrentThread: 420Mb TotalMemory: 60Mb CollectionCount(0): 9 CollectionCount(1): 9
AllocatedBytesForCurrentThread: 440Mb TotalMemory: 40Mb CollectionCount(0): 10 CollectionCount(1): 10
AllocatedBytesForCurrentThread: 460Mb TotalMemory: 60Mb CollectionCount(0): 10 CollectionCount(1): 10

@janvorli
Member

@devKlausS these were tested with swap disabled. Do the results look good to you, so that we can close this issue?

@janvorli janvorli closed this as completed Aug 2, 2021
@devKlausS
Author

devKlausS commented Aug 11, 2021

@janvorli I finally tested it with .NET 6 preview 7 in our k8s cluster and it looks good. Thanks for your help!

https://calip.io/Har4ldTV#7yvAkTxP

@ghost ghost locked as resolved and limited conversation to collaborators Sep 10, 2021