
.NET Core applications get oom killed on Kubernetes/OpenShift #10739

Closed
tmds opened this issue Jul 20, 2018 · 9 comments

tmds (Member) commented Jul 20, 2018

We have been investigating why .NET Core applications get OOM-killed by OpenShift because they exceed their assigned memory limit.

OpenShift/Kubernetes exposes the memory limit to the app via the cgroup file memory.limit_in_bytes. .NET Core detects this limit:

https://github.com/dotnet/coreclr/blob/08d39ddf02c81c99bd49c19b808c855235cbabdc/src/pal/src/misc/cgroup.cpp#L25
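
For illustration, a minimal sketch of that detection, assuming the common cgroup v1 mount path /sys/fs/cgroup/memory (the actual coreclr code resolves the cgroup mount point at runtime rather than hard-coding it; the helper name here is hypothetical):

```cpp
// Sketch only: read the cgroup v1 memory limit. Assumes the memory
// controller is mounted at /sys/fs/cgroup/memory.
#include <cinttypes>
#include <cstdint>
#include <cstdio>

static bool ReadCGroupMemoryLimit(uint64_t* limit)
{
    FILE* file = fopen("/sys/fs/cgroup/memory/memory.limit_in_bytes", "r");
    if (file == nullptr)
        return false; // not running under a cgroup v1 memory controller
    bool ok = fscanf(file, "%" SCNu64, limit) == 1;
    fclose(file);
    return ok;
}
```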

The OOM killer, however, monitors memory based on the cgroup's memory.usage_in_bytes, while .NET Core measures its own usage from /proc/self/statm:

https://github.com/dotnet/coreclr/blob/08d39ddf02c81c99bd49c19b808c855235cbabdc/src/pal/src/misc/cgroup.cpp#L24
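
For contrast, a sketch of the statm-based measurement (per proc(5), the second field of /proc/self/statm is the resident set size in pages; the helper name is hypothetical):

```cpp
// Sketch only: read the process RSS from /proc/self/statm.
// statm reports sizes in pages; field 2 is the resident set size.
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <unistd.h>

static bool ReadRssFromStatm(uint64_t* rss)
{
    FILE* file = fopen("/proc/self/statm", "r");
    if (file == nullptr)
        return false;
    uint64_t totalPages, residentPages;
    bool ok = fscanf(file, "%" SCNu64 " %" SCNu64,
                     &totalPages, &residentPages) == 2;
    fclose(file);
    if (ok)
        *rss = residentPages * (uint64_t)sysconf(_SC_PAGESIZE);
    return ok;
}
```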

usage_in_bytes includes both RSS and page cache, while statm reports only RSS. Memory held in the page cache can therefore push the cgroup over its limit and trigger an OOM kill, yet .NET Core doesn't take it into account when deciding when to do a GC.

We should change the implementation so it is also aware of usage_in_bytes when measuring the memory load of the system.
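
A hedged sketch of that direction: read the cgroup's usage_in_bytes, the same counter the OOM killer compares against limit_in_bytes, and derive the memory load from it (same cgroup v1 path assumption as above; helper name hypothetical):

```cpp
// Sketch only: measure memory load from the cgroup's usage_in_bytes
// (RSS + page cache) rather than from statm's RSS alone.
#include <cinttypes>
#include <cstdint>
#include <cstdio>

static bool ReadCGroupMemoryUsage(uint64_t* usage)
{
    FILE* file = fopen("/sys/fs/cgroup/memory/memory.usage_in_bytes", "r");
    if (file == nullptr)
        return false;
    bool ok = fscanf(file, "%" SCNu64, usage) == 1;
    fclose(file);
    return ok;
}

// The GC could then compute the load as usage / limit, matching what
// the OOM killer sees, instead of statm-RSS / limit.
```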

CC @janvorli

jkotas (Member) commented Jul 20, 2018

Related to https://github.com/dotnet/coreclr/issues/18971.

cc @MichaelSimons @richlander

janvorli (Member) commented

@tmds thank you for the investigation! I just got back from my vacation and I'll look into fixing it soon.

janvorli self-assigned this Aug 13, 2018
tmds (Member, Author) commented Aug 14, 2018

@janvorli I am also back from vacation :) If you want, you can assign the issue to me.

janvorli (Member) commented

@tmds thank you, I gladly accept your offer :-)

tmds (Member, Author) commented Aug 17, 2018

@janvorli can we backport this to 2.1? Should I do a PR targeting the dotnet:release/2.1 branch?

kierenj commented Aug 24, 2018

Yes please, @tmds! And if I may be so bold, I'm looking for assistance here: https://stackoverflow.com/questions/51983312/net-core-on-linux-lldb-sos-plugin-diagnosing-memory-issue. Would this be a worthy issue on dotnet/coreclr just yet?

tmds (Member, Author) commented Aug 24, 2018

PR to backport to 2.1: dotnet/coreclr#19650

@kierenj, yes, you can create an issue for that in the coreclr repo.

kierenj commented Aug 24, 2018

Excellent, this will be great for me in 2.1. On that issue: I was actually using 2.0, and memory usage is way, way down on 2.1, so no need there. Thank you!

chrisgilbert commented Aug 31, 2018

I tested the recent PR for 2.1 (dotnet/coreclr#19650) in our application and saw a significant reduction in memory use. The charts here are from Amazon ECS, and are relative to the soft memory limit of 384MB (which is why they can show more than 100%). The hard memory limit for the cgroup is 1024MB.

[chart: before_and_after_gc_fix]

Background memory use has remained stable at around 300MB for the last 12 hours or so, compared to around 420MB for the unpatched application.

The difference is more pronounced under load: in production we are regularly bouncing close to the 2048MB cgroup limit at the moment (we do significant logging and other I/O, so roughly half our production memory use is page cache).

For ages we thought we had a memory leak, but after scratching our heads for some time trying to find one, I finally found this ticket, which seems to fix our issue. 🍾 🎆

Thanks very much for your work! 👍 👍 👍

msftgits transferred this issue from dotnet/coreclr on Jan 31, 2020
ghost locked as resolved and limited conversation to collaborators on Dec 16, 2020