Dotnet-gcdump collect causes OutOfMemoryException #2038

Closed
iliamosko opened this issue Mar 2, 2021 · 5 comments · Fixed by dotnet/docs#29248

Comments

@iliamosko

Description

When we use gcdump to troubleshoot a server running a dockerized application, the application throws an OutOfMemoryException while the gcdump collect command is executing. The container has a hard memory limit and no swap file is allowed. Once the application consumes ~90% of the allocated memory, we run “dotnet gcdump collect -p ” inside the container. The application's memory spikes when the collect command runs, and the application crashes shortly afterward with an OutOfMemoryException.

Configuration

.NET version: .NET 5.0
OS: Linux, Docker container
Architecture: ARM64
Memory: 90 GB

Data

Server-stats
This is a small snippet of the server’s memory graph at the points when gcdumps were taken. The red line is the application's memory and the blue line is the total available memory. The application’s memory is ~80 GB, which is ~90%-95% of the memory allocated to that server. At that point we start the gcdump collection and memory spikes by ~10 GB, so the application reaches its memory limit and crashes with an OutOfMemoryException. After the application is restarted, it again starts showing signs of memory exhaustion, and another gcdump is taken with the same outcome as the first one.
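To make the arithmetic behind the graph explicit, here is a minimal sketch using only the approximate figures quoted above (~80 GB in use, ~90%-95% of the allocation, ~10 GB spike). The function name and constants are illustrative, not part of any tool.

```python
def headroom_gb(usage_gb, fraction_of_limit):
    """Headroom left under the limit when usage_gb is the given fraction of that limit."""
    limit_gb = usage_gb / fraction_of_limit
    return limit_gb - usage_gb


USAGE_GB = 80.0  # application memory reported above (approximate)
SPIKE_GB = 10.0  # observed increase during collection (approximate)

for fraction in (0.90, 0.95):
    free = headroom_gb(USAGE_GB, fraction)
    print(f"usage at {fraction:.0%} of limit: {free:.1f} GB free, "
          f"spike fits: {SPIKE_GB <= free}")
```

At either end of the reported range the free memory is smaller than the observed spike, which is consistent with the crash following each collection.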

Analysis

Looking over the documentation for gcdump and its normal operation, I could not find a statement of the minimum amount of memory required for gcdump to run safely.

All I have found is this:

To walk the GC heap, this command triggers a generation 2 (full) garbage collection, which can suspend the runtime for a long time, especially when the GC heap is large. Don't use this command in performance-sensitive environments when the GC heap is large.

This leads to the question: is there a suggested amount of free memory needed to run gcdump safely, or does it depend solely on how large the GC heap is at that point?
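Until such guidance exists, one option is to read the container's remaining headroom before starting a collection. The sketch below assumes the standard Linux cgroup files are visible inside the container (cgroup v2 `memory.max` / `memory.current`, or the cgroup v1 equivalents). The function names are hypothetical, and what counts as "enough" headroom is not something this thread establishes.

```python
from pathlib import Path


def _read_int(path):
    """Return the integer stored in a cgroup file, or None if missing or non-numeric."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None
    # cgroup v2 writes the literal string "max" when no limit is set.
    return int(text) if text.isdigit() else None


def container_headroom_bytes(root="/sys/fs/cgroup"):
    """Return limit minus current usage in bytes, or None if no limit could be read."""
    candidates = [
        ("memory.max", "memory.current"),  # cgroup v2
        ("memory/memory.limit_in_bytes", "memory/memory.usage_in_bytes"),  # cgroup v1
    ]
    for limit_file, usage_file in candidates:
        limit = _read_int(Path(root) / limit_file)
        usage = _read_int(Path(root) / usage_file)
        if limit is not None and usage is not None:
            return limit - usage
    return None


if __name__ == "__main__":
    print(container_headroom_bytes())
```

A wrapper script could refuse to start the collection when the returned value is below a threshold chosen by the operator.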

@sywhang
Contributor

sywhang commented Mar 2, 2021

Sorry you ran into issues @iliamosko.

The issue is probably due to the following sequence of events happening:

  1. The GC heap is huge
  2. dotnet-gcdump itself (not the target process being diagnosed) ends up eating a bunch of memory because it will try to form an object graph based on the events emitted from the target process.
  3. Swapping is disabled, so dotnet-gcdump can't swap out the excess memory usage to file either.
  4. Container gets OOM killed.

Couple of things you could try doing:

  1. Could you use dotnet-dump instead of dotnet-gcdump? This will try to write out the dump to a file. There are commands in dotnet-dump that you can use to diagnose memory issues (which I assume you are trying to do).
  2. If you want to keep using dotnet-gcdump, could you enable swapping in your container?
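As a rough illustration of the first suggestion, a script could invoke dotnet-dump like this. It is a sketch that assumes the dotnet-dump tool is installed and on PATH inside the container; the helper names, PID, and output path are placeholders.

```python
import subprocess


def build_dump_command(pid, output_path):
    """Build the argument list for collecting a process dump with dotnet-dump."""
    return ["dotnet-dump", "collect", "-p", str(pid), "-o", str(output_path)]


def collect_dump(pid, output_path):
    """Run dotnet-dump; raises CalledProcessError if the tool exits with a non-zero code."""
    subprocess.run(build_dump_command(pid, output_path), check=True)
```

The resulting file can then be opened with `dotnet-dump analyze` and inspected with commands such as `dumpheap -stat`. Because the dump is written to disk, it can be large for a heap of this size, so free disk space is worth checking first.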

@josalem do you have a rough estimate on how much memory the object graph dotnet-gcdump forms would eat up in a large GC heap?

@josalem
Contributor

josalem commented Mar 2, 2021

@sywhang is most likely correct. If the memory threshold for the container is "just right" for the target application and you collect a gc-dump, you may trip the memory limit.

The actual file that gets output is fairly compact compared to the heap size if I recall. More likely, an OOM would get triggered while the tool collects the incoming events and parses them to build the graph. It stores the data from each event in arrays of values and objects, which, if the heap is sufficiently large, could grow quickly. I haven't done any analysis about what the memory overhead of that process is, though.

@sywhang
Contributor

sywhang commented Mar 2, 2021

We should probably analyze the memory usage of the tool itself and document that as part of dotnet-gcdump docs.
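One possible way to gather that data on Linux is to read the peak resident set size (`VmHWM`) of the dotnet-gcdump process, and of the target process, from `/proc/<pid>/status` while a collection runs. This is only a sketch of a measurement helper with hypothetical function names; it is not how the team analyzed the tool.

```python
import re
from pathlib import Path


def parse_peak_rss_kb(status_text):
    """Return the VmHWM value (peak resident set size, in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmHWM:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def peak_rss_kb(pid):
    """Read the peak resident set size of a running process on Linux."""
    return parse_peak_rss_kb(Path(f"/proc/{pid}/status").read_text())
```

Sampling both processes against heaps of different sizes would show where the extra memory goes and how it scales with heap size.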

@josalem added the documentation label Mar 2, 2021
@tommcdon added this to the 6.0.0 milestone Mar 3, 2021
@tommcdon added the enhancement label Mar 3, 2021
@baal2000

@sywhang
Would you be able to come up with guidance on how much memory the gcdump tool needs on average in proportion to the process heap size?
Thanks for looking into this.

@baal2000

Could we raise the priority of this issue for this very important member of the .NET Core toolbox?

The cost of unpredictable tool behavior is just too high in a production environment, making it effectively useless because of the added risk.

@tommcdon modified the milestones: 6.0.0, 7.0.0 Jun 21, 2021
@mikelle-rogers removed the enhancement label Apr 20, 2022
@ghost locked as resolved and limited conversation to collaborators Jun 27, 2023