Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asp.net Core not Collecting Garbage #3409

Closed
sebastienros opened this issue Aug 10, 2018 · 66 comments
Closed

Asp.net Core not Collecting Garbage #3409

sebastienros opened this issue Aug 10, 2018 · 66 comments
Assignees

Comments

@sebastienros
Copy link
Member

Copied from #1976

@sinapis commented on Dec 19, 2017

Hi,

I have the same issue with ASP.NET Core 2. I've took a memory dump and tried analyzing. From what I see the problem is exactly as the OP said. My application starts with allocation about 75 MB, and very quickly it goes all the way to ~750MB, out of it 608MB is "Unused memory allocated to .NET".

First snapshot at app start:
image

Second snapshot after 3 minutes and 100 requests:
image

@sebastienros commented on Apr 5

  • We only measure this on 2.1, I will look into adding the same thing for 2.0
  • All comments seem to be related to to the LOH issue that was mentioned, which should be taken into account in the application and pool large arrays as much as possible

@sinapis commented on Apr 8

@sebastienros, several questions:

  1. I used Ants profiler to measure memory usage, according to it, no LOH fragmentation was detected. Can you advise how can I verify if my application suffers from LOH fragmentation issues?
  2. What are the results on .net core 2.1? Is the issue resolved because Kestrel is using Span?
  3. What if we can't pool arrays - can you provide a workaround? Should we use GCSettings.LargeObjectHeapCompactionMode.CompactOnce?

@davidfowl commented on Apr 8

What are the results on .net core 2.1? Is the issue resolved because Kestrel is using Span?

We personally haven't seen any evidence that the issue is in Kestrel. It still looks like an application problem.

What if we can't pool arrays - can you provide a workaround? Should we use GCSettings.LargeObjectHeapCompactionMode.CompactOnce?

@sebastienros commented on Apr 10

@sinapis
Would any of you be able to share an app that shows the issue? If I could repro it we would be able to find the reason, or at least remove some code piece by piece and isolate a minimal repro.

@sinapis commented on Apr 10

@sebastienros I can't share the app, but I can share the session from PerfView session + memory dump.
Some description: I have a ASP.NET Core 2 Web API, I've created a load test of 200 users all sending the same request over 10 seconds. Overall 775 requests were processed.

This app jumped to almost 1 GB memory usage in task manager and stayed like that. Looking at the dump I can count about 18 MB:

image

So the questions is where did almost 1 GB go?

@sebastienros commented on Apr 10

@sinapis Thanks

The behavior you are describing is not unexpected, the GC will allocated some memory as necessary on the peak load, and just release it over time. It's the GC Server mode, and usually wait for idle periods to release it and not affect your app perf. The amount of memory it will reserve depends of the total memory available on the system.

We would definitely see an issue if it kept increasing. I assume that if you don't send anymore requests and let your app run you will see the memory usage going down.

Could you run the same thing until it consumes most of your system memory? Or at least long enough with the same load that it will show it growing continuously? I will still get a look at your current dumps.

@sebastienros commented on Apr 10

Also can you take dumps during and at the end of the jobs, so we can see the detals.

@Maoni0 commented on Apr 11

@sinapis I looked at your ETW trace. it is puzzling to me - you survived very little in the last induced gen2 GC yet we still chose to keep that much memory committed. your application seems edge case (you mostly just did only background GCs due to LOH allocations) - I wonder if we have some accounting errors there (another possibility is errors in the numbers reported but if you already verified that you have that much committed that's a smaller possibility). if you could repro with something you can share with me, that'd be fantastic; otherwise if it's possible to run your app with some logging from GC (I can give you a commit that does that) that'd be helpful too.

@sinapis commented on Apr 11

@Maoni0 please share how should I enable GC logging
If there is some other data you would like me to provide in order to disprove accounting error please let me know what should I provide you with and how (maybe tell perfview to collect more data?)
I'll try creating a minimum repro, but not sure I'll succeed since I don't know where the problem is.

@sebastienros hopefully I will provide another dump with more memory consumption today

@sebastienros
Copy link
Member Author

@sinapis

Are you still waiting from us to make progress on this issue?

@Gorthog
Copy link

Gorthog commented Aug 12, 2018

Hi,

That project was abandoned but it's very important for us to get to the bottom of this issue. I could use the info how to record more info on GC.

Also, please describe what exactly I can provide that will help got to the bottom of this issue.

@Maoni0
Copy link
Member

Maoni0 commented Aug 13, 2018

@sinapis are you trying to find out the issue you described above or just interested in recording more info in general? for the former, are you in a position to build a private coreclr.dll and use that in your project that's showing the symptom you described? if so I can just give you a commit with proper logging enabled.

@Gorthog
Copy link

Gorthog commented Aug 13, 2018

@Maoni0 Yes, I can build a private coreclr.dll and use it. If there is are additional logging you would like me to add, I'd be more than happy to do so. In general, tell me what you need. I'll do whatever I can.

@NickCraver
Copy link
Member

Environment

We are currently on .NET Core, version 2.1.300, but are installing 2.2 preview 3 for testing likely this week. I'll re-visit this issue then to confirm or show changes.

Details

We are seeing this at Stack Overflow with our canary .NET Core web socket server as well. We are observing 2 behaviors:

  1. GC does not appear to be running on it's own. We are seeing Gen 0 grow to gigabytes (at which point we have to kill it).
  2. When GC is forced to run, overall process memory is not freed.

Here's a before and after of a forced all-generations GC run:

Before After
before after

Note that gen 0 is cleared, the long-lived websocket object trees (expected) are moved to gen 2, and overall memory usage drops to 530MB. However, overall process memory was at 2.8GB and remains at 2.8GB. Even though only 530MB is used internally.

We currently have a timer running inside the application forcing GC to get by for other testing, but that's not a great idea and really shouldn't be necessary. I currently consider this a blocker to deploying Stack Overflow on .NET Core. What can we do to help? We have memory dumps and can readily do analysis and testing to assist.

@Tratcher
Copy link
Member

Tratcher commented Nov 7, 2018

How's the fragmentation? That can cause symptoms like this when there's lots of pinned buffers.

@sebastienros
Copy link
Member Author

How much memory is on the system?

@davidfowl
Copy link
Member

There should be no new pinned buffers when using ASP.NET Core. The only pinned buffers are the ones Kestrel uses.

@sebastienros
Copy link
Member Author

What we'll need:

  • Something we can repro consistently, and ideally locally
  • Memory dumps
  • Perfview traces under load

@NickCraver
Copy link
Member

NickCraver commented Nov 7, 2018

@Tratcher Attaching !dumpheap -stat and !eeheap-gc here: SocketServerDbg.txt (happy to do any other breakdowns as well) - this is on the 530MB dump (assuming you wanted highest fragmentation).

@sebastienros Specs:

  • 24 Cores, 48 logical processors
  • 64GB of RAM (43.3GB in use)

I don't think we can provide memory dumps here as they contain user cookies due to the nature of the application and that's a security issue, but we're more than happy to assist as best we can. On the perfview front: send instructions and happy to help.

@benaadams
Copy link
Member

How's the fragmentation? That can cause symptoms like this when there's lots of pinned buffers.

Looking at the dump doesn't look huge?

Total 5432168 objects
Fragmented blocks larger than 0.5 MB:
           Addr     Size      Followed by
00000218476b92e8    1.2MB 00000218477e12d8 System.Byte[]

@sebastienros
Copy link
Member Author

GC does not appear to be running on it's own. We are seeing Gen 0 grow to gigabytes (at which point we have to kill it).

When you say "grow to gigabytes", this is not something that is unexpected. I understand you are aware that the Server GC will be triggered by memory pressure. If there is no pressure (you have 64GB) it doesn't have to be triggered. So it's all about how much available memory was left when you decided to kill it. Are these numbers the ones you shared before or were they higher that that?

Quoting the documentation:

Garbage collection occurs when one of the following conditions is true:

  • The system has low physical memory. This is detected by either the low memory notification from the OS or low memory indicated by the host.
  • The memory that is used by allocated objects on the managed heap surpasses an acceptable threshold. This threshold is continuously adjusted as the process runs.

These explanations are very fuzzy: "acceptable threshold". It doesn't tell what is acceptable or not, and you might have other acceptance thresholds ;)

When GC is forced to run, overall process memory is not freed.

Same. I assume that's an expected behavior, though it doesn't mean it's what should have happened, but nothing tells me it shouldn't behave this way from the data you shared.

@Maoni0 is this is an expected behavior based on the numbers that were provided?

I have never tried to run an ASP.NET app and experience how the GC behaves with such an amount of RAM to see what these thresholds look like.

@NickCraver
Copy link
Member

Are these numbers the ones you shared before or were they higher that that?

Oh no, it just takes a while to grow (due to the long-lived nature of sockets, connections ramp up slowly). It grows well beyond what's here.
I've let it run since posting ~4 hours ago and I just took another dump. It's at 5GB from the Windows view. Here's the breakdown:
image

I understand being lazy and using available memory...but that's way beyond anything I've ever seen in Gen 0.

With every .NET full framework application we have today: GC runs as expected, and memory is freed instantaneously or very shortly after a run. This is critical behavior for a shared web tier. If every app only runs GC when there's memory pressure, you're lighting 8 sticks of dynamite at once all to go off at the same time and pegging CPU when they do get a pressure signal. I'm really hoping this is a GC bug, because if it's expected behavior we have legitimate concerns and blockers with such behavior.

Standing by to run whatever diagnostics will help. And we're more than happy to run a custom build with better logging, etc. Whatever we can do - this is my highest priority to fix. I'm happy to force a GC and get numbers, but will await feedback here before doing so - lest we lose data you'd like to see.

@benaadams
Copy link
Member

benaadams commented Nov 7, 2018

My assumption; perhaps incorrect, was a Gen0 would be triggered if it needed to allocate more from the OS (or after xN OS allocations)?

@sebastienros
Copy link
Member Author

Would you like to try the workstation GC as a mitigation for now?

@NickCraver
Copy link
Member

@sebastienros we have a mitigation (GC on a timer), but are more concerned with solving the root issue. I'm loading the latest dump up now to see what state the GC itself is in and see if I can find anything. Potentially relevant: we're using libuv as the base layer here, which may or may not be common to the other issues? I'll update with more info as I dig here.

@NickCraver
Copy link
Member

Thread dump for the curious: SocketServerThreads.txt

@Maoni0
Copy link
Member

Maoni0 commented Nov 7, 2018

@NickCraver are you on Windows?

@Maoni0
Copy link
Member

Maoni0 commented Nov 7, 2018

and @sinapis I apparently completely dropped the ball on the logging part...really sorry about that! I dunno if it still helps to give you a commit to log now...

@NickCraver
Copy link
Member

@Maoni0 Yep, Windows Server 2016 - happy to test a custom build if it'll give you more data.

@Maoni0
Copy link
Member

Maoni0 commented Nov 7, 2018

@NickCraver could you please collect some GC ETW traces? you can use this commandline for perfview:

perfview /GCCollectOnly /nogui /accepteula /NoV2Rundown /NoNGENRundown /NoRundown /merge:true /zip:true /BufferSize:3000 /CircularMB:3000 /MaxCollectSec:7200 collect

this collects for 7200s. you can start it right before you start your process. if 2 hours isn't enough feel free to replace it with longer. please share the resulting trace.

@NickCraver
Copy link
Member

@Maoni0 A few qeustions:

  • It's a shared web tier - is there a switch to restrict it to a single process/PID to eliminate lots of noise? (Last I'm aware, it's only machine-wide, but maybe there's a filter on the logging side?)
  • Does the process need to be restarted, or should I capture as it's running now? Or both?

@sebastienros
Copy link
Member Author

You can't restrict to a single process, but share the process id and/or the startup dll name.
You don't have to restart the process.

@NickCraver
Copy link
Member

Alrighty - capture is running now, I'll have results in a few hours - can we please arrange a secure/NDA (we already have NDA in place) upload in the mean time? craver at stackoverflow on my end

@Maoni0
Copy link
Member

Maoni0 commented Nov 7, 2018

capturing it running while it already has a big gen0 is not going to help.

GC is not a point in time thing, in order to figure out why you are in a situation, we need to see the history. so we need a trace to capture enough history that led up to that point. so if it takes 2 hours before the gen0 gets to be a couple of GB, we need those 2 hours captured. or you could capture at least a while before it got to that point.

@NickCraver
Copy link
Member

NickCraver commented Nov 7, 2018

Understood - restarting the process and capture here.

Edit: Now restarted. I set the capture to 3 hours for a few GB (lower traffic at night...takes longer), and I received the folder invite to upload to - thanks!

@NickCraver
Copy link
Member

@sebastienros @Maoni0 The .etl output is in the dropbox.

Process details:

  • PID: 14704
  • Command line: dotnet.exe c:\SocketServer\SocketServer2.dll --environment=NY-Prod --service
  • ~2.02GB windows memory at the end of the capture

@sebastienros
Copy link
Member Author

is in the dropbox

Well, OneDrive you mean? ;)

@Maoni0
Copy link
Member

Maoni0 commented Nov 9, 2018

@sebastienros or you can run it in a container if you simply want to limit that process's memory usage. on Windows this is a job object.

@NickCraver
Copy link
Member

I don't understand what you mean by "the free memory". could you please elaborate?

As in windows still sees the overall memory usage, it doesn't go down any.

The GC behavior here would be good if it was a dedicated machine. But it's not great behavior in a shared server situation (which we have a lot of). This includes local development on Windows 10 (our only option for Visual Studio). As servers get more and more cores (Intel is going to double the amount of cores soon), are users expected to absorb double gen0 budgets as a result? It seems at some point we need this to be adjustable or the inherent defaults to account for overall limits. Core counts are adjusting fast, and with that variable in play without a corresponding control...things will get hairy. It's increasingly difficult to manage things on shared systems as a result, and the more cores that are in play, the more sharing that's likely...or likely to be more complex at least.

On the dumps front: we have sensitive data in there, but I'm happy to pull the data from the existing dumps if you've got any WinDBG pointers or other such that that's reasonable to convey (understandable if not).

@chrisnas
Copy link

chrisnas commented Nov 9, 2018

@NickCraver: increasing the number of cores should come with a RAM increase. The sizing of servers stays a complicated problem. At work, we are going to linux containers managed by Mesos and I'm not sure that terminating a process when a threshold is detected is the right solution (it would be the same with a Windows Job).
If, as @Maoni0 said, you have pinned objects, feel free to use the tool I mentionned earlier: you will see if red vertical lines are near the end of the gen0 segments + full of white space: it would explain the difference between the size of "used" objects and the size of allocated segments. In addition, you will get the class name of these pinned instances as an hint of possible area of improvement.

@Maoni0: do you have a timeframe for having in .NET Core the settings you described in https://blogs.msdn.microsoft.com/maoni/2018/10/02/middle-ground-between-server-and-workstation-gc/?
The could be a solution to limit the gen0 growth and be part of the sizing of apps on a server.

At Windows level, the memory manager is able to deal with a lot of applications consuming "memory" beyond the available RAM; especially if the heaps contain objects that are no more accessed (would be easily paged out on very fast SSD). I don't have exact numbers about how many complete 4KB pages of gen0 are not accessed at all before a collection but could be important.

@NickCraver
Copy link
Member

We have very few pinned objects (only what Kestrel pins) - and my demo app above demonstrates that this happens with or without. Assuming servers come with RAM increases is not safe. Intel releases X performance with new cores (or AMD, whatever floats your boat!) but we don't need more RAM. We get no benefit from it. As an SRE and the person who specs most of our server hardware, I'm going to have a hard time justifying "because the GC doesn't run" as the cause for needing to spend thousands more per box on memory.

Intel is planning on doubling the number of cores coming up, we don't need double the RAM. At least, not in the foreseeable future. I think any assumption that RAM goes with cores is not a valid one we should be basing technical decisions on here, FWIW.

@chrisnas
Copy link

chrisnas commented Nov 9, 2018

@NickCraver: my point was more related to the number of applications you want to run on a machine (hence my Mesos example). The more apps, the more RAM.
However, I agree with you that each app should be able to "control" its high end memory consumption (the only way/workaround I know today with the CLR is to use the settings I mentionned). I don't believe in the hard limit that would either trigger an OOM on Windows with Jobs and process termination on Mesos.
This is a kind of new (and stronger) requirement for .NET (maybe @Maoni0 could comment). In addition, with the raise of containers, it would really help to avoid the "terminator" threshold. BTW, one of Criteo developers pushed a PR to take into account the per container (via cgroup) core count in a previous version of Core CLR for this exact same reason. IMHO, memory consumption limit should not be bound to number of CPU cores .

@NickCraver
Copy link
Member

Maybe I'm just being obtuse in this case, but is there any reason the GC doesn't just run every so often? Not running in hours is our case. Is there a way to tell it to run on an interval, regardless of usage? That's effectively what our workaround way, a timer with a GC call...but that seems pretty silly for every application to do. It seems like a sanity check of:

Let's say once every 5 minutes: "have I run in 30 minutes? No? Okay run", otherwise do nothing. Is that an insane idea? I'd be surprised if such a thing hadn't been discussed before...I'm curious what the thinking is there.

@benaadams
Copy link
Member

benaadams commented Nov 9, 2018

Can bring down the SOH segment size via environment variable; prior to startup?

COMPlus_GCSegmentSize

Is hex value in bytes and > 4MB & multiple of 1MB, so 1000000, 2000000, 4000000, etc

According to docs it defaults are the following:

64bit: > 8 CPUs its 1GB on server; whereas 256MB on workstation
32bit: > 8 CPUs its 16MB on server; whereas 16MB on workstation

[Credits to @kevingosse on twitter, for working out units and that its hex.]

That base level will be then multiplied by the number of heaps? Which leads also to the other knobs:

do you have a timeframe for having in .NET Core the settings you described in "Middle Ground between Server and Workstation GC"

Should be able to set them via the environment variables; prior to execution

COMPlus_GCHeapCount // starts at core count
COMPlus_GCNoAffinitize 
COMPlus_GCHeapAffinitizeMask 

@rwkarg
Copy link

rwkarg commented Nov 9, 2018

@NickCraver We've implemented basically that in our base framework for WAS hosted services. There's some periodic handle usage by WAS or IIS to monitor the service (something like 3 Event, 1 Thread, and 1 something else handle if I remember correctly) that will eventually get released when the finalizer runs so it eventually gets cleaned up when GC happens.
But, we have some services that receive virtually no traffic and so never get GC run. The process handle count just keeps increasing by 5 every few seconds. After a few days the handle count and associated memory usage start getting absurd (10+GB on a shared web host).
We just check every hour to see if GC has run since the last check, and if not, request a collection.

Having the GC run "every so often" would eliminate the need for us to have this timer code.

@benaadams
Copy link
Member

benaadams commented Nov 9, 2018

Thinking about what's different...

Running raw Kestrel (without a Job object or cglmits) is going to behave to the GC defaults; though I assume @NickCraver is using IIS.

So question is does IIS use Job objects; where the GC maximums would be obeyed; however a difference with ASP.NET Core 2.x is it spawns a second dotnet process to run (2.2 will allow in-proc).

Does that second process then still obey the limits set by IIS?

@NickCraver
Copy link
Member

In this case - no IIS (though we will be elsewhere). The web socket server is netcoreapp2.1 via dotnet.exe listening on a port directly with Kestrel (we can open source this if we need...it doesn't even depend on Stack Overflow code). I'm curious about all this all the same though, and how it'll apply to both this and our next applications (which will be in-proc 2.2).

@jkotalik
Copy link
Contributor

jkotalik commented Nov 9, 2018

ANCM uses job objects when creating a new dotnet.exe process: https://github.com/aspnet/AspNetCore/blob/master/src/IISIntegration/src/AspNetCoreModuleV2/OutOfProcessRequestHandler/serverprocess.cpp#L74.

@benaadams I'm confused about:

So question is does IIS use Job objects; where the GC maximums would be obeyed;

IIS doesn't have a GC. How would these maximums be set by IIS? Environment variables would be respected if WAS is restarted.

@benaadams
Copy link
Member

So question is does IIS use Job objects; where the GC maximums would be obeyed;

IIS doesn't have a GC. How would these maximums be set by IIS?

I'd assume it would use these settings?

image

@CJHarmath
Copy link

Hitting those app pool limits just cause an app pool recycle ( spawns a new w3wp.exe while the existing finishes serving the current requests). IIS worker processes don't use job objects, the ASP.NET Core IIS module seems to be, but that's different.

I've seen Job objects mentioned here and at the parent issue as well and did a quick test and it does not seem that setting lets say a 100Mb memory limit on dotnet.exe via adding it to a job object is triggering GC's either. ( I could be doing it wrong).
Did a dummy web api and started hammering it and the memory usage kept climbing up until dotnet.exe died when hitting the limit.
I've used https://github.com/nccgroup/WindowsJobLock as below

.\Win.JobLock.exe -p 21780 -n dotnet -m $(100Mb)

@arishlabroo
Copy link

Wanted to share my related experience, Hopefully it is relevant to this discussion.

Over the last one year we moved to a more micro servicey architecture and have 20+ containerized applications deployed on many XXL instances in AWS ECS.

Each deployable unit based on microsoft/dotnet:2.1-aspnetcore-runtime image

We do a lot of caching, and work offloading via queues and workers, hence more than 50% of our http services, except for the user facing ones , are hit at avg ~200 requests per minute.

Keeping this is mind we had set our resource allocation for these sparsely hit services very modestly. 128mb memory soft limit and 512mb hard limit.

As soon as we released, we noticed that the memory graph was consistently at 400%, and the CPU was extremely spikey, perf was also erratic. Since we have hard limits set on our containers, unlike what OP observed, we consistently ran into memory pressure which caused GC, which caused CPU spikes, alerts, restarts, hairloss.

After noticing that other users have had a similar experience , we decided to allocate each service with 1024MB memory limit. Which felt too excessive given what these services do. However, as soon as we did that, everything stabilized. Stable lines, no spikes, no restarts. Even if we turned off caching, the services handled the load flawlessly 👏 .

However, this is not sustainable. We are enjoying the small small many independent services, but giving each service minimum 1GB becomes a bottleneck. Our solution so far has been to add more instances to our cluster to accommodate new services. Its getting too expensive :( .

I love C# and .Net, so please forgive me for calling this out, but we recently deployed 3 of our new services in Go, and they have been chugging along fine with just 128mb hard memory limit. For this reason primarily, it is becoming hard to justify using aspnet core for our newer services.

@kierenj
Copy link

kierenj commented Nov 12, 2018

@arishlabroo could that be this bug - https://github.com/dotnet/coreclr/issues/19060 and corresponding PR dotnet/coreclr#19518 ?

@arishlabroo
Copy link

@kierenj 🤔, seems possible, but wouldn't the latest microsoft/dotnet:2.1-aspnetcore-runtime image include that fix? Since the bug was closed 3 months ago.

@wanton7
Copy link

wanton7 commented Nov 12, 2018

@arishlabroo I think that fix landed in 2.1.5 that was released month ago October 2, 2018.

@sebastienros
Copy link
Member Author

@arishlabroo IMO this kind of many small services that are limited in memory should use the workstation GC. The processes would then hardly go over 100MB, and most probably less than 50MB.

@Maoni0
Copy link
Member

Maoni0 commented Nov 12, 2018

@arishlabroo as @sebastienros pointed out, have you tried Workstation GC? if not, would you mind trying it out?

@Maoni0
Copy link
Member

Maoni0 commented Nov 12, 2018

also I've seen this WindowsJobLock tool used by folks - this is NOT going to work for limiting GC heap usage because the physical limit is read when the runtime is loaded. if you wait till the process is already running and then specify the limits they are not going to get read, period.

I'm actually looking at processes running within a job with small memory limits this week so I will publish a blog entry to talk about my findings. for folks who want to specify a small limit, Workstation GC is certainly a good starting point if you are currently using Server GC.

@tmds
Copy link
Member

tmds commented Nov 12, 2018

@Maoni0 @sebastienros is it possible to provide some recommendations?
If I'm running Kestrel with 1 cpu assigned. At a low memory limit I should be using workstation GC. Is there a higher limit (what?), at which it makes sense to change to server GC?
If I'm running Kestrel on N cpus with server GC. Is there a recommended amount of RAM I should provision per core?

@Maoni0
Copy link
Member

Maoni0 commented Nov 12, 2018

@tmds for sure...I will be providing guidelines in my blog entry. thanks for your patience!

@sebastienros
Copy link
Member Author

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests