Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[release/2.1] Determine memory load based on cgroup usage. #19650

Merged
merged 1 commit into from
Aug 31, 2018

Conversation

tmds
Copy link
Member

@tmds tmds commented Aug 24, 2018

Port #19518 to 2.1

CC @janvorli

cgroup usage is used to trigger oom kills. It includes rss and file cache
of the cgroup.

The implementation was only using the process rss to determine memory load.
This is less than the cgroup usage and leads to oom kills due to GC not
being triggered soon enough.
@janvorli janvorli added this to the 2.1.x milestone Aug 24, 2018
@janvorli janvorli added the Servicing-consider Issue for next servicing release review label Aug 24, 2018
Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@janvorli
Copy link
Member

Description

.NET core applications are killed by OpenShift because they exceed their assigned memory. OpenShift/Kubernetes informs the app via the sysfs limit_in_bytes. Then memory is monitored by the oom killer based on sysfs usage_in_bytes. .NET Core is using /proc/self/statm instead, which includes only RSS and not memory in cache as the usage_in_bytes.

Customer Impact

Customer’s applications are killed by OpenShift because they exceed their assigned memory.

Regression?

No
 

Risk

No

Original issue: #19060

@chrisgilbert
Copy link

Thanks for fixing this! We think we've been hitting this issue in ECS, so I'll give a +1 for having it fixed in a 2.1 patch.

@jamshedd jamshedd added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Aug 30, 2018
@jamshedd
Copy link
Member

Approved for 2.1.5

@danmoseley danmoseley merged commit a25682c into dotnet:release/2.1 Aug 31, 2018
@danmoseley
Copy link
Member

This is expected to go out in October patch.

@danmoseley danmoseley modified the milestones: 2.1.x, 2.1.5 Sep 13, 2018
@devKlausS
Copy link

will this solve https://github.com/dotnet/coreclr/issues/18044 as well?

@tmds
Copy link
Member Author

tmds commented Oct 3, 2018

@devKlausS it should, .NET Core now uses the same limit as the Linux OOM killer.

@devKlausS
Copy link

@tmds thanks for the quick reply! I did a retest with 2.1.5 and I am afraid the issued is not solved yet. See my update https://github.com/dotnet/coreclr/issues/18044#issuecomment-426688885 If you need more details please let me know.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants