
Is it possible to run dotnet-docker based containers with memory limits? #220

Closed
geoffpatehutch opened this issue Mar 14, 2017 · 25 comments

@geoffpatehutch

I'm trying to use a dotnet-docker based container in a limited memory environment. I am passing --memory 134217728 (128MB) and --memory-swap=0 on the docker command line via DC/OS running on Ubuntu in the Azure Container Service.

The problem I have is that the .NET garbage collector is not doing what I expect. I have a simple timer that reloads data from a database creating small amounts of garbage regularly. I can see the memory usage increase until the container is killed due to OOM constraints. I would have expected the garbage collector to free up memory before this happened.

I am wondering if the container believes it has access to as much memory as the container host (3.5GB) and therefore isn't scheduling garbage collection as it feels under no pressure?

@richlander
Member

Good question. We have talked about this and I can see why it would be a good idea. We will look into it. I expect we have some tests that will re-create this situation so that we can validate it ourselves.

/cc @Maoni0 @MichaelSimons @glennc @kendrahavens

@geoffpatehutch
Author

Just a little more info - I ran the same container with increased memory allocation (2GB), and the memory climbed steadily until it hit 160MB. At that point there was a big drop back down to ~86-88MB and it has stayed there since. Not sure why 160MB would be a magic number, but thought it might be of some use in your investigations!

@MichaelSimons
Member

FWIW I created a simple .NET Core console application that continuously allocates memory (arrays of strings). When I run it in a container and monitor it with docker stats, it behaves as I expect. By that I mean if I hold onto the references, the memory usage continues to grow until it hits the hard limit set by --memory (w/ --memory-swap=0) and the process automatically gets killed by Docker. If I set --memory-swap=-1, the program continues running. Also, if I release the references, I can see the GC kicking in and freeing up memory.
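A minimal sketch of that kind of allocation test (illustrative, not the original program; the sizes and names are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class MemoryPressure
{
    static void Main()
    {
        // Hold references so the GC cannot reclaim anything: usage should
        // climb until Docker's --memory limit kills the process.
        var retained = new List<string[]>();
        while (true)
        {
            var block = new string[100000];           // array of references
            for (int i = 0; i < block.Length; i++)
                block[i] = new string('x', 64);       // plus string payloads
            retained.Add(block);

            // Comment out the Add above (releasing the references) and the
            // GC reclaims the garbage instead of usage growing unbounded.
            Console.WriteLine($"Total managed bytes: {GC.GetTotalMemory(false):N0}");
            Thread.Sleep(100);
        }
    }
}
```

Running this under `docker run --memory 128m` (while watching `docker stats`) reproduces the two behaviors described: growth to the hard limit when references are retained, and steady-state when they are released.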

@Maoni0
Member

Maoni0 commented Mar 16, 2017

Handling limited-memory situations on Linux is something that @rahku is currently working on. However, could we please first make sure it is the GC that would help in this case, i.e., that it's memory on the GC heap that keeps growing, not something else? You could either induce a GC yourself, or use SOS to look at the GC heap. If it's indeed the GC heap that seems to be unaware of the limit you specified, then we can certainly try out @rahku's changes.
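Inducing a full collection as suggested can be as small as this (an illustrative sketch; if the number reported here stays small while the process RSS keeps growing, the growth is likely native memory rather than the GC heap):

```csharp
using System;

class GcProbe
{
    static void Main()
    {
        // Force a full, blocking collection, let finalizers run, then
        // collect again to reclaim anything the finalizers released.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // Passing true forces one more full collection before measuring.
        Console.WriteLine($"Managed heap after full GC: {GC.GetTotalMemory(true):N0} bytes");
    }
}
```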

@rahku

rahku commented Mar 16, 2017

It would be super helpful if you could try the changes in PR dotnet/coreclr#10064 and provide feedback on whether they work as you would expect.

@geoffpatehutch
Author

I tried adding a call to GC.Collect and the problem disappeared, which led me to believe it was a GC problem. I'm happy to try @rahku's change - how would I go about doing that? At the moment I build my container using "FROM microsoft/dotnet:1.1-runtime". Is there an alternative base image somewhere I can use to incorporate these changes?

@Maoni0
Member

Maoni0 commented Mar 16, 2017

@rahku could you please help @geoffpatehutch get a current build to try (or whatever is easiest for him to pick up your changes)?

@rahku

rahku commented Mar 16, 2017

@geoffpatehutch the changes are in .NET Core 2.0, which is not released yet. The easiest way to try this would be to wait for the PR to merge; then you can get setup files to install from the core-setup repo. I should be able to merge this early next week.

@MichaelSimons
Member

@geoffpatehutch, once this gets checked in and picked up by the CLI, you can use the microsoft/dotnet-nightly:2.0-sdk image from the dotnet-nightly repo to verify the change.

@rahku

rahku commented Mar 27, 2017

@geoffpatehutch could you try the nightly container images? They include my changes now.

@MichaelSimons
Member

@geoffpatehutch - did you ever get a chance to try this? 2.0 preview images are now also available on microsoft/dotnet.

@geoffpatehutch
Author

I haven't yet, as it's not quite as simple as tweaking the Dockerfile to use the .NET 2.0 runtime; I need to set some time aside to install the new version and compile with it. I assume you can install 1.1 and 2.0 side by side and just switch targets at compile time?
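For context, the SDKs and runtimes do generally install side by side, and the target runtime is chosen per project. A hypothetical project file showing the switch (names are illustrative):

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Switch between netcoreapp1.1 and netcoreapp2.0 here;
         both runtimes can be installed side by side. -->
    <TargetFramework>netcoreapp2.0</TargetFramework>
  </PropertyGroup>
</Project>
```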

@axelbodo

There is another issue and a referenced link on GitHub: moby/moby#20688
For our Java applications we can temporarily solve the issue by injecting the cgroup limit as a variable and using it as the max heap size on the java command line. Unfortunately .NET Core doesn't have such an option, so its decisions are still based on what is reported by /proc and/or sysinfo, which report host resources, not cgroup limits. I think the temporary solution would be for dotnet (run) to also have an option for setting the max heap size. The final solution, which is under wide discussion, is for the docker container to report container limits, not host values, via /proc and sysinfo.

@rahku

rahku commented Jun 10, 2017

> dotnet core is still based on what is reported by /proc and/or sysinfo

@axelbodo this is not correct for .NET Core 2.0. .NET Core 2.0 reads memory limits from cgroups if they are set.
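As a sanity check, the limit the runtime is expected to pick up can be read directly from the cgroup filesystem (an illustrative sketch; the path assumes the cgroup v1 layout in use at the time):

```csharp
using System;
using System.IO;

class CgroupLimit
{
    static void Main()
    {
        // cgroup v1 memory controller path; inside a container this reflects
        // the --memory value rather than the host's physical RAM.
        const string path = "/sys/fs/cgroup/memory/memory.limit_in_bytes";

        if (File.Exists(path) &&
            ulong.TryParse(File.ReadAllText(path).Trim(), out ulong limit))
        {
            Console.WriteLine($"cgroup memory limit: {limit:N0} bytes");
        }
        else
        {
            Console.WriteLine("No cgroup v1 memory limit found.");
        }
    }
}
```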

@axelbodo

axelbodo commented Jun 10, 2017

Thanks for the info. We use .NET Core 1.1 now, and I don't know when we will be able to move to 2.0. Am I right that it only has a preview release at the moment?

@kendrahavens
Contributor

@axelbodo Yes, .NET Core 2.0 is still in Preview.

@MichaelSimons
Member

Closing as this is fixed with .NET Core 2.0

@axelbodo

axelbodo commented Jun 26, 2017

@rahku, @MichaelSimons I've opened a ticket on moby (moby/moby#20688) which aims to solve this problem, and I think it can be solved in the official dotnet-core image as well, though it would depend on lxcfs being properly set up on the host. It would solve the problem without using the cgroup-based heap setup in 2.0 Preview, which I think is only half of the solution, as it doesn't reflect the cgroup hierarchy; that, I think, is solved in lxcfs.

I mean, what about the case where the image is compiled with a 4g hard limit, or is started with -m 4g, on a machine that only has 1g? I've tested that lxcfs is aware of this, and I even think it is aware when the host has 64g, a container is started with a 1g hard limit, and another is started inside that container with a 4g hard limit; in that case lxcfs will report 1g, not 4g.

```
root@somewhere-on-aws-with-1g-memory:/var/run/runc/d7634b7fb73de4256d5ccd6284b3ad038e44c9e474afd3533c304f2ed5e8c078# docker run -it -m 4g --name ubuntu1 -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rprivate ubuntu
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
root@86cb4da87816:/# cat /proc/meminfo
MemTotal:       1014352 kB
...
root@86cb4da87816:/# ls /sys/fs/cgroup/memory/
cgroup.clone_children  memory.force_empty              memory.kmem.slabinfo                memory.kmem.tcp.usage_in_bytes  memory.move_charge_at_immigrate  memory.soft_limit_in_bytes  memory.use_hierarchy
cgroup.event_control   memory.kmem.failcnt             memory.kmem.tcp.failcnt             memory.kmem.usage_in_bytes      memory.numa_stat                 memory.stat                 notify_on_release
cgroup.procs           memory.kmem.limit_in_bytes      memory.kmem.tcp.limit_in_bytes      memory.limit_in_bytes           memory.oom_control               memory.swappiness           tasks
memory.failcnt         memory.kmem.max_usage_in_bytes  memory.kmem.tcp.max_usage_in_bytes  memory.max_usage_in_bytes       memory.pressure_level            memory.usage_in_bytes
root@86cb4da87816:/# cat /sys/fs/cgroup/memory/memory.limit_in_bytes
4294967296
root@86cb4da87816:/#
```

@axelbodo

axelbodo commented Mar 2, 2018

@rahku It turns out that .NET Core 2.0 still has the OOM issue for me, even when it is the only process in the container. I started a process in a container with a 2G limit that allocates 100MB every 2 seconds, and after the 20th allocation .NET Core definitely tried to allocate beyond the limit.

@Maoni0
Member

Maoni0 commented Mar 2, 2018

Is this a Windows container or a Linux container? In a Windows container you should not be able to allocate more than 2GB (I am assuming you are holding onto all the memory you allocated), and that's enforced by the OS, not even the GC. I am not familiar with Linux containers. @janvorli do you have sufficient knowledge there?

@janvorli
Member

janvorli commented Mar 2, 2018

@axelbodo are you sure your test is not holding references to the allocated blocks in some way?

@rahku

rahku commented Mar 2, 2018

@axelbodo work was done to ensure that when you go beyond the memory limit set in cgroups, the GC is notified to start a collection. .NET is not going to limit your allocations; if you keep allocating beyond the limit, on Linux your process will get killed.

@axelbodo

axelbodo commented Mar 3, 2018

I'm sure I am holding references, as that was the intention. However, it would be good if, when the GC knows we would go beyond the limit, instead of a deterministic, sudden kill, the runtime could throw an OutOfMemoryException to give the application a chance to make a safe recovery or shutdown.

@Maoni0
Member

Maoni0 commented Mar 3, 2018

You specified a memory limit on the container; it's the container's responsibility to give you that OOM. The question to ask is why the container isn't giving you the OOM.

@axelbodo

axelbodo commented Mar 5, 2018

Yes, I specified 2G of memory. When I'm at ~1.9G+ of allocation, the GC tried to allocate the next 100M in the loop without throwing an OutOfMemoryException. In this case the kernel, as per the cgroup documentation, triggers the OOM kill (with signal 9) exactly when the commit happens, causing the application a sudden death. So Linux handles OOM as usual, but the situation is deterministic: the application went beyond the limit.
In the case where we run only one process in the container, that process can know exactly how much memory it has allocated itself, since it is the only contributor to the control group, and it could refuse memory allocation requests that would overcommit the requested hard limit in the appropriate memory-manager way (i.e., by throwing an OutOfMemoryException).

As I see in https://github.com/dotnet/coreclr/issues/14991#issuecomment-344867841, there is an effort to solve the OOM issue (like in the provided syslog at 3690.531632). I'll take a look inside 2.0.3.
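The behavior being asked for, sketched as an application would have to write it today (an illustrative example; on Linux the catch block only helps if the runtime refuses the allocation before the cgroup limit is hit):

```csharp
using System;
using System.Collections.Generic;

class SafeAllocator
{
    static void Main()
    {
        var retained = new List<byte[]>();
        try
        {
            while (true)
            {
                retained.Add(new byte[100 * 1024 * 1024]); // 100 MB per step
            }
        }
        catch (OutOfMemoryException)
        {
            // Only reached if the runtime itself refuses the allocation.
            // If the cgroup OOM killer fires first, the process dies with
            // SIGKILL and this handler never runs -- which is exactly the
            // behavior being discussed above.
            retained.Clear();
            Console.WriteLine("Allocation refused; recovered safely.");
        }
    }
}
```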

MichaelSimons added a commit to MichaelSimons/dotnet-docker that referenced this issue Aug 23, 2019
8 participants