From f9e971f01ac0957c9acc185e68d344c7ccc9e9a8 Mon Sep 17 00:00:00 2001
From: Steve MacLean
Date: Tue, 27 Aug 2019 13:14:32 -0400
Subject: [PATCH] Add diagnostic tutorials

Add tutorials to walk through key use cases.
---
 .../app_is_leaking_memory_eventual_crash.md   | 118 ++++++++
 .../diagnostics/app_running_slow_highcpu.md   | 114 ++++++++
 docs/core/diagnostics/diagnostic-scenarios.md |  48 ++++
 docs/core/diagnostics/hung_app.md             | 262 ++++++++++++++++++
 docs/core/diagnostics/index.md                |   4 +
 docs/core/diagnostics/sample-debug-target.md  |  56 ++++
 docs/core/toc.yml                             |  12 +
 7 files changed, 614 insertions(+)
 create mode 100644 docs/core/diagnostics/app_is_leaking_memory_eventual_crash.md
 create mode 100644 docs/core/diagnostics/app_running_slow_highcpu.md
 create mode 100644 docs/core/diagnostics/diagnostic-scenarios.md
 create mode 100644 docs/core/diagnostics/hung_app.md
 create mode 100644 docs/core/diagnostics/sample-debug-target.md

diff --git a/docs/core/diagnostics/app_is_leaking_memory_eventual_crash.md b/docs/core/diagnostics/app_is_leaking_memory_eventual_crash.md
new file mode 100644
index 0000000000000..23b13317afc60
--- /dev/null
+++ b/docs/core/diagnostics/app_is_leaking_memory_eventual_crash.md
@@ -0,0 +1,118 @@
+---
+title: Debugging a memory leak - .NET Core
+description: A tutorial that walks through debugging a memory leak in .NET Core.
+author: sdmaclea
+ms.author: stmaclea
+ms.date: 08/27/2019
+---
+# Debugging a memory leak
+
+In this scenario, the endpoint slowly leaks memory and eventually causes an out-of-memory exception. To diagnose this scenario, we need several key pieces of diagnostics data.
+
+## Memory counters
+
+Before we dig into collecting diagnostics data to help us root-cause this scenario, we need to convince ourselves that what we're seeing is actually a memory leak (memory growth). We can use the [dotnet-counters](dotnet-counters.md) tool to confirm it.
+
+Let's run the [Sample debug target](sample-debug-target.md):
+
+```bash
+dotnet run
+```
+
+Then find the process ID using:
+
+```bash
+dotnet-trace list-processes
+```
+
+Before causing the leak, let's check our managed memory counters:
+
+```bash
+dotnet-counters monitor --refresh-interval 1 -p 4807
+```
+
+4807 is the process ID that was found using `dotnet-trace list-processes`. The refresh-interval is the number of seconds between refreshes.
+
+The output should be similar to:
+
+```
+  Press p to pause, r to resume, q to quit.
+    System.Runtime:
+        CPU Usage (%)                                 4
+        Working Set (MB)                             66
+        GC Heap Size (MB)                             4
+        Gen 0 GC / second                             0
+        Gen 1 GC / second                             0
+        Gen 2 GC / second                             0
+        Number of Exceptions / sec                    0
+```
+
+Here we can see that right after startup, the managed heap memory is 4 MB.
+
+Now, let's hit the URL (http://localhost:5000/api/diagscenario/memleak/200000).
+
+Rerun the dotnet-counters command. We should see an increase in memory usage as shown below:
+
+```
+  Press p to pause, r to resume, q to quit.
+    System.Runtime:
+        CPU Usage (%)                                 4
+        Working Set (MB)                            353
+        GC Heap Size (MB)                           258
+        Gen 0 GC / second                             0
+        Gen 1 GC / second                             0
+        Gen 2 GC / second                             0
+        Number of Exceptions / sec                    0
+```
+
+Memory has now grown to 258 MB.
+
+We can safely say that memory is growing or leaking. The next step is to collect the right data for memory analysis.
+
+### Core dump generation
+
+When analyzing possible memory leaks, we need access to the app's memory heap. By analyzing the memory contents and the relationships between objects, we can build theories on why memory isn't being freed.
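+
+To make that kind of object graph concrete, the following is a minimal sketch of a common leaking pattern (hypothetical code, not the actual sample source): objects added to a static collection stay rooted for the lifetime of the process, so the garbage collector can never reclaim them.
+
+```csharp
+using System.Collections.Generic;
+using Microsoft.AspNetCore.Mvc;
+
+[Route("api/[controller]")]
+public class LeakyController : ControllerBase
+{
+    // A static field is a GC root: everything reachable from it stays alive
+    // for the lifetime of the process.
+    private static readonly List<string> s_cache = new List<string>();
+
+    [HttpGet("leak/{count}")]
+    public ActionResult<string> Leak(int count)
+    {
+        // Each request appends strings that are never removed, so the managed
+        // heap keeps growing until the process runs out of memory.
+        for (int i = 0; i < count; i++)
+        {
+            s_cache.Add(new string('x', 1024));
+        }
+
+        return $"cached {count} more strings";
+    }
+}
+```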
+
+A common diagnostics data source is a memory dump on Windows or the equivalent core dump on Linux. To generate a core dump of a .NET Core application, we can use the [dotnet-dump](dotnet-dump.md) tool.
+
+Using the [Sample debug target](sample-debug-target.md) started above, run the following command to generate a core dump:
+
+```bash
+sudo ./dotnet-dump collect -p 4807
+```
+
+4807 is the process ID that can be found using `dotnet-trace list-processes`. The result is a core dump located in the same folder.
+
+> [!NOTE]
+> To generate core dumps, `dotnet-dump` requires sudo.
+
+### Analyzing the core dump
+
+Now that we have a core dump, use the [dotnet-dump](dotnet-dump.md) tool to analyze it:
+
+```bash
+dotnet-dump analyze core_20190430_185145
+```
+
+Where `core_20190430_185145` is the name of the core dump you want to analyze.
+
+> [!NOTE]
+> If you see an error complaining that libdl.so cannot be found, you may have to install the libc6-dev package.
+
+You'll be presented with a prompt where you can enter SOS commands. Commonly, the first thing we want to look at is the overall state of the managed heap:
+
+```bash
+dumpheap -stat
+```
+
+The (partial) output can be seen below:
+
+![dumpheap -stat output](https://user-images.githubusercontent.com/15442480/57110756-7d32ac80-6cee-11e9-9b80-2ce700e7a2f1.png)
+
+Here we can see that we have quite a few strings lying around, as well as instances of Customer and Customer[]. We can now use the gcroot command on one of the string instances to see how and why the object is rooted:
+
+![gcroot output](https://user-images.githubusercontent.com/15442480/57110770-8face600-6cee-11e9-8eea-608b59442058.png)
+
+The string instance appears to be rooted from a top-level Processor object, which in turn references a cache. We can continue dumping out objects to see how much the cache is holding on to:
+
+![dumping cache objects](https://user-images.githubusercontent.com/15442480/57110703-4b214a80-6cee-11e9-8887-02c25424a0ad.png)
+
+From here, we can go back to the code and track down why the cache is growing in an unbounded fashion.
diff --git a/docs/core/diagnostics/app_running_slow_highcpu.md b/docs/core/diagnostics/app_running_slow_highcpu.md
new file mode 100644
index 0000000000000..4261146fc3110
--- /dev/null
+++ b/docs/core/diagnostics/app_running_slow_highcpu.md
@@ -0,0 +1,114 @@
+---
+title: Debugging high CPU usage - .NET Core
+description: A tutorial that walks through debugging high CPU usage in .NET Core.
+author: sdmaclea
+ms.author: stmaclea
+ms.date: 08/27/2019
+---
+# Debugging high CPU usage
+
+In this scenario, the [sample debug target](sample-debug-target.md) will consume excessive CPU. To diagnose this scenario, we need several key pieces of diagnostics data.
+
+## CPU counters
+
+Before we dig into collecting diagnostics data, we need to convince ourselves that what we're seeing is actually a high CPU condition.
+
+Let's run the [sample debug target](sample-debug-target.md):
+
+```bash
+dotnet run
+```
+
+Then find the process ID using:
+
+```bash
+dotnet-trace list-processes
+```
+
+Before hitting the URL that will trigger the high CPU condition, let's check our CPU counters using the [dotnet-counters](dotnet-counters.md) tool:
+
+```bash
+dotnet-counters monitor --refresh-interval 1 -p 22884
+```
+
+22884 is the process ID that was found using `dotnet-trace list-processes`. The refresh-interval is the number of seconds between refreshes.
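+
+For a mental model of what we're about to measure, the handler behind the high CPU endpoint can be thought of as a tight loop that spins on the CPU for the requested number of milliseconds. The sketch below is a hypothetical illustration, not the actual sample source:
+
+```csharp
+using System.Diagnostics;
+using Microsoft.AspNetCore.Mvc;
+
+[Route("api/[controller]")]
+public class BusyController : ControllerBase
+{
+    [HttpGet("highcpu/{milliseconds}")]
+    public ActionResult<string> HighCpu(int milliseconds)
+    {
+        var watch = Stopwatch.StartNew();
+
+        // Spin (no Sleep or await) until the requested time has elapsed,
+        // keeping one core busy for the whole request.
+        while (watch.ElapsedMilliseconds < milliseconds)
+        {
+        }
+
+        return "done";
+    }
+}
+```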
+
+With no requests in flight yet, the output should be similar to the below:
+
+![dotnet-counters baseline output](https://user-images.githubusercontent.com/15442480/57110746-75730800-6cee-11e9-81a8-1c253aef37ce.jpg)
+
+Here we can see that right after startup, the CPU isn't being consumed at all (0%).
+
+Now, let's hit the URL (http://localhost:5000/api/diagscenario/highcpu/60000).
+
+Rerun the [dotnet-counters](dotnet-counters.md) command. We should see an increase in CPU usage as shown below:
+
+![dotnet-counters high CPU output](https://user-images.githubusercontent.com/15442480/57110736-6be9a000-6cee-11e9-86b6-6e128318a267.jpg)
+
+Throughout the execution of that request, CPU hovers at around 30%. You can also scope the monitoring to just the CPU usage counter:
+
+```bash
+dotnet-counters monitor System.Runtime[cpu-usage] -p 22884 --refresh-interval 1
+```
+
+At this point, we can safely say the CPU is running hotter than we expect.
+
+## Trace generation
+
+When analyzing a slow request, we need a diagnostics tool that can give us insight into what our code is doing. The usual choice is a profiler. There are a few different profiler options.
+
+### Profile with `dotnet-trace` then view with Windows `PerfView`
+
+We can use the [dotnet-trace](dotnet-trace.md) tool. Using the [sample debug target](sample-debug-target.md) started above, hit the URL (http://localhost:5000/api/diagscenario/highcpu/60000) again, and while it's running within the 1-minute request, run:
+
+```bash
+dotnet-trace collect -p 22884 --providers Microsoft-DotNETCore-SampleProfiler
+```
+
+22884 is the process ID that was found using `dotnet-trace list-processes`.
+
+Let [dotnet-trace](dotnet-trace.md) run for about 20-30 seconds and then hit enter to exit the collection. The result is a `nettrace` file located in the same folder. `nettrace` files are a great way to use existing analysis tools on Windows.
+
+Open the `nettrace` file with `PerfView` as shown below.
+
+![PerfView showing the nettrace file](https://user-images.githubusercontent.com/15442480/57110777-976c8a80-6cee-11e9-9cf7-407a01a08b1d.jpg)
+
+### Profile and view with Linux `perf`
+
+Here, we'll use the Linux `perf` tool to generate a .NET Core app profile.
+
+Exit the previous instance of the [sample debug target](sample-debug-target.md).
+
+Set the `COMPlus_PerfMapEnabled` environment variable to cause the .NET Core app to create a `map` file in the `/tmp` directory. This `map` file is used by `perf` to map CPU addresses to JIT-generated functions by name.
+
+Run the [sample debug target](sample-debug-target.md) in the same terminal session:
+
+```bash
+export COMPlus_PerfMapEnabled=1
+dotnet run
+```
+
+Hit the URL (http://localhost:5000/api/diagscenario/highcpu/60000) again, and while it's running within the 1-minute request, run:
+
+```bash
+sudo perf record -p 2266 -g
+```
+
+2266 is the process ID of the restarted target; find it again with `dotnet-trace list-processes`. This command starts the perf collection process. Let it run for about 20-30 seconds and then hit CTRL-C to exit the collection process.
+
+You can use the `perf` command to see the output of the trace:
+
+```bash
+sudo perf report -f
+```
+
+You can also generate a flame graph by using the following commands:
+
+```bash
+git clone --depth=1 https://github.com/BrendanGregg/FlameGraph
+sudo perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flamegraph.svg
+```
+
+These commands generate a `flamegraph.svg` that you can view in the browser to investigate the performance problem:
+
+![Flame graph of the high CPU request](https://user-images.githubusercontent.com/15442480/57110767-87ed4180-6cee-11e9-98d9-9f1c908acfd5.jpg)
diff --git a/docs/core/diagnostics/diagnostic-scenarios.md b/docs/core/diagnostics/diagnostic-scenarios.md
new file mode 100644
index 0000000000000..56647618b0d23
--- /dev/null
+++ b/docs/core/diagnostics/diagnostic-scenarios.md
@@ -0,0 +1,48 @@
+---
+title: Diagnostics walk-throughs - .NET Core
+description: A set of walk-throughs that show how to diagnose common production problems in .NET Core.
+author: sdmaclea
+ms.author: stmaclea
+ms.date: 08/27/2019
+ms.topic: overview
+#Customer intent: As a .NET Core developer I want to find the best tutorial to help me debug my scenario.
+---
+# .NET Core diagnostics walk-throughs
+
+.NET Core has a wide range of diagnostics capabilities. To learn more about production diagnostics in .NET Core 3, we'll run through a set of diagnostics scenarios using the `dotnet` tools.
+
+> [!NOTE]
+> The walk-throughs were all run on Ubuntu 16.04 using the .NET Core 3 Preview 5 bits.
+>
+> The tools and APIs that are used are based on Preview 5 and are subject to change. The tutorials will be updated to account for later previews and the final release.
+>
+> You must use at least Preview 5 for most of the capabilities to work.
+
+## Production diagnostics
+
+Before we jump in, let's take a look at production diagnostics methodology. When an outage occurs, the goal is to get services fully restored as quickly as possible. That goal often means restarting the app or node(s).
+
+To improve service reliability, root-cause analysis of failures is key. To get to the root cause, we need to collect as much diagnostics data as we can **before** restarting.
+
+The diagnostics data collected can then be analyzed postmortem to determine the root cause and possible fixes.
+
+## Preparing
+
+Each of the scenarios:
+
+- Uses the [Sample debug target](sample-debug-target.md) to trigger the scenario.
+- Uses the [.NET Core dotnet diagnostic global command-line tools](index.md#net-core-dotnet-diagnostic-global-tools).
+
+## The scenarios
+
+### Debugging a memory leak
+
+[Debugging a memory leak](app_is_leaking_memory_eventual_crash.md) walks through finding a memory leak. The [dotnet-counters](dotnet-counters.md) tool is used to confirm the leak, and the [dotnet-dump](dotnet-dump.md) tool is used to diagnose it.
+
+### Debugging a slow-running application
+
+[Debugging high CPU usage](app_running_slow_highcpu.md) walks through investigating high CPU usage. It uses the [dotnet-counters](dotnet-counters.md) tool to confirm the high CPU usage and then walks through using the [dotnet-trace](dotnet-trace.md) tool or Linux `perf` to collect and view a CPU usage profile.
+
+### Debugging deadlock
+
+The [debugging deadlock](hung_app.md) tutorial explores using the [dotnet-dump](dotnet-dump.md) tool to investigate threads and locks.
diff --git a/docs/core/diagnostics/hung_app.md b/docs/core/diagnostics/hung_app.md
new file mode 100644
index 0000000000000..bb44f3bec39cb
--- /dev/null
+++ b/docs/core/diagnostics/hung_app.md
@@ -0,0 +1,262 @@
+---
+title: Debugging deadlock - .NET Core
+description: A tutorial that walks through debugging locking issues in .NET Core.
+author: sdmaclea
+ms.author: stmaclea
+ms.date: 08/27/2019
+---
+# Debugging deadlock
+
+In this scenario, the endpoint will hang and accumulate threads. We'll show how you can use the existing tools to analyze the problem.
+
+## Core dump generation
+
+To investigate a hung application, a memory dump allows us to inspect the state of its threads and any locks that may have contention issues.
+
+Let's run the [Sample debug target](sample-debug-target.md):
+
+```bash
+dotnet run
+```
+
+Then find the process ID using:
+
+```bash
+dotnet-trace list-processes
+```
+
+Before triggering the deadlock, let's check our managed memory counters:
+
+```bash
+dotnet-counters monitor --refresh-interval 1 -p 4807
+```
+
+4807 is the process ID that was found using `dotnet-trace list-processes`.
+
+Navigate to the following URL:
+
+http://localhost:5000/api/diagscenario/deadlock
+
+Let the request run for about 10-15 seconds, then create the dump:
+
+```bash
+sudo ./dotnet-dump collect -p 4807
+```
+
+### Analyzing the core dump
+
+To start our investigation, let's open the core dump using `dotnet-dump analyze`:
+
+```bash
+./dotnet-dump analyze ~/.dotnet/tools/core_20190513_143916
+```
+
+Since we're looking at a potential hang, we want an overall feel for the thread activity in the process. We can use the `threads` command as shown below:
+
+```
+threads
+*0 0x1DBFF (121855)
+ 1 0x1DC01 (121857)
+ 2 0x1DC02 (121858)
+ 3 0x1DC03 (121859)
+ 4 0x1DC04 (121860)
+ 5 0x1DC05 (121861)
+ 6 0x1DC06 (121862)
+ 7 0x1DC07 (121863)
+ 8 0x1DC08 (121864)
+ 9 0x1DC09 (121865)
+ 10 0x1DC0A (121866)
+ 11 0x1DC0D (121869)
+ 12 0x1DC0E (121870)
+ 13 0x1DC10 (121872)
+ 14 0x1DC11 (121873)
+ 15 0x1DC12 (121874)
+ 16 0x1DC13 (121875)
+ 17 0x1DC14 (121876)
+ 18 0x1DC15 (121877)
+ 19 0x1DC1C (121884)
+ 20 0x1DC1D (121885)
+ 21 0x1DC1E (121886)
+ 22 0x1DC21 (121889)
+ 23 0x1DC22 (121890)
+ 24 0x1DC23 (121891)
+ 25 0x1DC24 (121892)
+ 26 0x1DC25 (121893)
+
+...
+
+...
+
+ 317 0x1DD48 (122184)
+ 318 0x1DD49 (122185)
+ 319 0x1DD4A (122186)
+ 320 0x1DD4B (122187)
+ 321 0x1DD4C (122188)
+```
+
+The output shows all the threads currently running in the process with their associated debugger thread ID and operating system thread ID.
+
+Based on the output above, it looks like we have a little over 300 threads.
+
+The next step is to get a better understanding of what the threads are currently doing by getting their callstacks. The `clrstack` command can be used to output callstacks. It can output either a single callstack or all the callstacks.
+
+Run:
+
+```bash
+clrstack -all
+```
+
+A representative portion of the output looks like:
+
+```
+ ...
+ ...
+ ...
+ Child SP IP Call Site +00007F2AE37B5680 00007f305abc6360 [GCFrame: 00007f2ae37b5680] +00007F2AE37B5770 00007f305abc6360 [GCFrame: 00007f2ae37b5770] +00007F2AE37B57D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2ae37b57d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) +00007F2AE37B5920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36] +00007F2AE37B5950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] +00007F2AE37B5CA0 00007f30593044af [GCFrame: 00007f2ae37b5ca0] +00007F2AE37B5D70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2ae37b5d70] +OS Thread Id: 0x1dc82 + Child SP IP Call Site +00007F2AE2FB4680 00007f305abc6360 [GCFrame: 00007f2ae2fb4680] +00007F2AE2FB4770 00007f305abc6360 [GCFrame: 00007f2ae2fb4770] +00007F2AE2FB47D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2ae2fb47d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) +00007F2AE2FB4920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36] +00007F2AE2FB4950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] +00007F2AE2FB4CA0 00007f30593044af [GCFrame: 00007f2ae2fb4ca0] +00007F2AE2FB4D70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2ae2fb4d70] +OS Thread Id: 0x1dc83 + Child SP IP Call Site +00007F2AE27B3680 00007f305abc6360 [GCFrame: 00007f2ae27b3680] +00007F2AE27B3770 00007f305abc6360 [GCFrame: 00007f2ae27b3770] +00007F2AE27B37D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2ae27b37d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) +00007F2AE27B3920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36] +00007F2AE27B3950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] +00007F2AE27B3CA0 00007f30593044af [GCFrame: 00007f2ae27b3ca0] +00007F2AE27B3D70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2ae27b3d70] +OS Thread Id: 0x1dc84 + Child SP IP Call Site +00007F2AE1FB2680 00007f305abc6360 [GCFrame: 00007f2ae1fb2680] +00007F2AE1FB2770 00007f305abc6360 [GCFrame: 00007f2ae1fb2770] +00007F2AE1FB27D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2ae1fb27d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) +00007F2AE1FB2920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36] +00007F2AE1FB2950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] +00007F2AE1FB2CA0 00007f30593044af [GCFrame: 00007f2ae1fb2ca0] +00007F2AE1FB2D70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2ae1fb2d70] +OS Thread Id: 0x1dc85 + Child SP IP Call Site +00007F2AE17B1680 00007f305abc6360 [GCFrame: 00007f2ae17b1680] 
+00007F2AE17B1770 00007f305abc6360 [GCFrame: 00007f2ae17b1770] +00007F2AE17B17D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2ae17b17d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) +00007F2AE17B1920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36] +00007F2AE17B1950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] +00007F2AE17B1CA0 00007f30593044af [GCFrame: 00007f2ae17b1ca0] +00007F2AE17B1D70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2ae17b1d70] +OS Thread Id: 0x1dc86 + Child SP IP Call Site +00007F2AE0FB0680 00007f305abc6360 [GCFrame: 00007f2ae0fb0680] +00007F2AE0FB0770 00007f305abc6360 [GCFrame: 00007f2ae0fb0770] +00007F2AE0FB07D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2ae0fb07d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) +00007F2AE0FB0920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36] +00007F2AE0FB0950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] +00007F2AE0FB0CA0 00007f30593044af [GCFrame: 00007f2ae0fb0ca0] +00007F2AE0FB0D70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2ae0fb0d70] +OS Thread Id: 0x1dc87 + Child SP IP Call Site +00007F2AE07AF680 00007f305abc6360 [GCFrame: 00007f2ae07af680] +00007F2AE07AF770 00007f305abc6360 [GCFrame: 00007f2ae07af770] +00007F2AE07AF7D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2ae07af7d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) +00007F2AE07AF920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36] +00007F2AE07AF950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] +00007F2AE07AFCA0 00007f30593044af [GCFrame: 00007f2ae07afca0] +00007F2AE07AFD70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2ae07afd70] +OS Thread Id: 0x1dc88 + Child SP IP Call Site +00007F2ADFFAE680 00007f305abc6360 [GCFrame: 00007f2adffae680] +00007F2ADFFAE770 00007f305abc6360 [GCFrame: 00007f2adffae770] +00007F2ADFFAE7D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2adffae7d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef) +00007F2ADFFAE920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36] +00007F2ADFFAE950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201] +00007F2ADFFAECA0 00007f30593044af [GCFrame: 00007f2adffaeca0] +00007F2ADFFAED70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2adffaed70] +... +... +... 
+```
+
+Eyeballing the callstacks for all 300+ threads shows a pattern where a majority of the threads share a common callstack:
+
+```
+OS Thread Id: 0x1dc88
+        Child SP               IP Call Site
+00007F2ADFFAE680 00007f305abc6360 [GCFrame: 00007f2adffae680]
+00007F2ADFFAE770 00007f305abc6360 [GCFrame: 00007f2adffae770]
+00007F2ADFFAE7D0 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2adffae7d0] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
+00007F2ADFFAE920 00007F2FE392B31F testwebapi.Controllers.DiagScenarioController.b__3_1() [/home/marioh/webapi/Controllers/diagscenario.cs @ 36]
+00007F2ADFFAE950 00007F2FE392B46D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
+00007F2ADFFAECA0 00007f30593044af [GCFrame: 00007f2adffaeca0]
+00007F2ADFFAED70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2adffaed70]
+```
+
+The callstack shows that the request arrived in our deadlock method, which in turn made a call to `Monitor.ReliableEnter`. This method indicates that the threads are trying to enter a monitor lock. They're waiting on the availability of the lock, which is most likely held by a different thread.
+
+The next step is to find out which thread is actually holding the monitor lock. Since monitors typically store lock information in the sync block table, we can use the `syncblk` command to get more information:
+
+```
+> syncblk
+Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
+   41 000000000143D038          603         1 00007F2B542D28C0 1dc1d  20         00007f2e90080fb8 System.Object
+   42 000000000143D080            3         1 00007F2B400016A0 1dc1e  21         00007f2e90080fd0 System.Object
+-----------------------------
+Total           264
+Free            0
+```
+
+The two interesting columns are the `MonitorHeld` and the `Owning Thread Info` columns. The `MonitorHeld` column shows whether a monitor lock is acquired by a thread and the number of waiting threads. The `Owning Thread Info` column shows which thread currently owns the monitor lock. The thread info has three different subcolumns. The second subcolumn shows the operating system thread ID.
+
+At this point, we know two different threads (0x1dc1d and 0x1dc1e) hold a monitor lock. The next step is to take a look at what those threads are doing. We need to check whether they're stuck indefinitely holding the lock.
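+
+Before switching to those threads, it helps to picture the kind of code that produces this state. A classic way to get here is two threads taking two locks in opposite order; the sketch below is a hypothetical illustration of that pattern, not the actual sample source:
+
+```csharp
+using System;
+using System.Threading;
+
+class DeadlockSketch
+{
+    private static readonly object s_lockA = new object();
+    private static readonly object s_lockB = new object();
+
+    static void Main()
+    {
+        // Thread 1 takes lock A, then tries to take lock B.
+        var t1 = new Thread(() => { lock (s_lockA) { Thread.Sleep(100); lock (s_lockB) { } } });
+        // Thread 2 takes lock B, then tries to take lock A.
+        var t2 = new Thread(() => { lock (s_lockB) { Thread.Sleep(100); lock (s_lockA) { } } });
+
+        t1.Start();
+        t2.Start();
+
+        // Each thread now blocks in Monitor.Enter waiting for the lock the
+        // other thread holds, so neither Join call ever returns.
+        t1.Join();
+        t2.Join();
+
+        Console.WriteLine("Never reached");
+    }
+}
+```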
+
+Let's use the `setthread` and `clrstack` commands to switch to each of those threads and display their callstacks:
+
+```
+> setthread 0x1dc1d
+> clrstack
+OS Thread Id: 0x1dc1d (20)
+        Child SP               IP Call Site
+00007F2B862B9610 00007f305abc6360 [GCFrame: 00007f2b862b9610]
+00007F2B862B9700 00007f305abc6360 [GCFrame: 00007f2b862b9700]
+00007F2B862B9760 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2b862b9760] System.Threading.Monitor.Enter(System.Object)
+00007F2B862B98B0 00007F2FE392A9E5 testwebapi.Controllers.DiagScenarioController.DeadlockFunc() [/home/marioh/webapi/Controllers/diagscenario.cs @ 57]
+00007F2B862B98F0 00007F2FE392A8FC testwebapi.Controllers.DiagScenarioController.b__3_0() [/home/marioh/webapi/Controllers/diagscenario.cs @ 27]
+00007F2B862B9910 00007F2FE02B7BA2 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/__w/3/s/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 51]
+00007F2B862B9930 00007F2FE02C1021 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 172]
+00007F2B862B9980 00007F2FE02B7CBE System.Threading.ThreadHelper.ThreadStart() [/__w/3/s/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 101]
+00007F2B862B9CA0 00007f30593044af [GCFrame: 00007f2b862b9ca0]
+00007F2B862B9D70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2b862b9d70]
+```
+
+Let's look at the first thread. It's blocked in `Monitor.Enter` inside `DeadlockFunc`: it already holds one of the two locks and is waiting to acquire the other one.
+
+```
+> setthread 0x1dc1e
+> clrstack
+OS Thread Id: 0x1dc1e (21)
+        Child SP               IP Call Site
+00007F2B85AB8640 00007f305abc6360 [GCFrame: 00007f2b85ab8640]
+00007F2B85AB8730 00007f305abc6360 [GCFrame: 00007f2b85ab8730]
+00007F2B85AB8790 00007f305abc6360 [HelperMethodFrame_1OBJ: 00007f2b85ab8790] System.Threading.Monitor.Enter(System.Object)
+00007F2B85AB88E0 00007F2FE392AAAE testwebapi.Controllers.DiagScenarioController.b__4_0() [/home/marioh/webapi/Controllers/diagscenario.cs @ 53]
+00007F2B85AB8910 00007F2FE02B7BA2 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/__w/3/s/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 51]
+00007F2B85AB8930 00007F2FE02C1021 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/__w/3/s/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 172]
+00007F2B85AB8980 00007F2FE02B7CBE System.Threading.ThreadHelper.ThreadStart() [/__w/3/s/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 101]
+00007F2B85AB8CA0 00007f30593044af [GCFrame: 00007f2b85ab8ca0]
+00007F2B85AB8D70 00007f30593044af [DebuggerU2MCatchHandlerFrame: 00007f2b85ab8d70]
+```
+
+The second thread is the mirror image: it's also blocked in `Monitor.Enter`, holding the second lock while waiting for the one the first thread owns. The two threads are waiting on each other and can never make progress, which is a deadlock.
+
+The remaining 300+ waiting threads are most likely also blocked on one of the locks involved in the deadlock.
diff --git a/docs/core/diagnostics/index.md b/docs/core/diagnostics/index.md
index 255bbcd0a076d..774385bbbb05f 100644
--- a/docs/core/diagnostics/index.md
+++ b/docs/core/diagnostics/index.md
@@ -38,3 +38,7 @@ The [dotnet-dump](dotnet-dump.md) tool is a way to collect and analyze Windows a
 ### dotnet-trace
 
 .NET Core includes what is called the `EventPipe` through which diagnostics data is exposed. The [dotnet-trace](dotnet-trace.md) tool allows you to consume interesting profiling data from your app that can help in scenarios where you need to root cause apps running slow.
+
+## .NET Core diagnostics walk-throughs
+
+The [.NET Core diagnostics walk-throughs](diagnostic-scenarios.md) are designed to highlight key use cases.
diff --git a/docs/core/diagnostics/sample-debug-target.md b/docs/core/diagnostics/sample-debug-target.md
new file mode 100644
index 0000000000000..a3d6b17ff8ff8
--- /dev/null
+++ b/docs/core/diagnostics/sample-debug-target.md
@@ -0,0 +1,56 @@
+---
+title: Sample debug target
+description: Introduction to the sample debug target used in the scenarios.
+author: sdmaclea
+ms.author: stmaclea
+ms.date: 08/27/2019
+---
+# Sample debug target
+
+The scenarios are implemented using a `webapi` target with methods that trigger undesirable behaviors for us to diagnose.
+
+## Sample source
+
+The sample code is in [DiagnosticScenarios](https://github.com/dotnet/samples/blob/master/core/diagnostics/DiagnosticScenarios). The sample is a basic `webapi` with a single added file [DiagnosticScenarios.cs](https://github.com/dotnet/samples/blob/master/core/diagnostics/DiagnosticScenarios/Controllers/DiagnosticScenarios.cs).
+
+The simplest way to acquire the source is to clone the [samples repo](https://github.com/dotnet/samples):
+
+```bash
+git clone git@github.com:dotnet/samples.git
+```
+
+## Building and running the target
+
+After cloning the source, you can build and run the webapi using:
+
+```bash
+cd samples/core/diagnostics/DiagnosticScenarios
+dotnet build
+dotnet run
+```
+
+## Target methods
+
+### Deadlock
+
+http://localhost:5000/api/diagscenario/deadlock
+
+This method will cause the target to hang and accumulate many threads.
+
+### High CPU usage
+
+http://localhost:5000/api/diagscenario/highcpu/{milliseconds}
+
+This method will cause the target to heavily use the CPU for the duration specified by {milliseconds}.
+
+### Memory leak
+
+http://localhost:5000/api/diagscenario/memleak/{kb}
+
+This method will cause the target to leak memory (the amount is specified by {kb}).
+
+### Memory usage spike
+
+http://localhost:5000/api/diagscenario/memspike/{seconds}
+
+This method will cause intermittent memory spikes over the specified number of seconds. Memory will go from baseline to spike and back to baseline several times.
diff --git a/docs/core/toc.yml b/docs/core/toc.yml
index 02cce242e49df..56538ffb6f2ca 100644
--- a/docs/core/toc.yml
+++ b/docs/core/toc.yml
@@ -130,6 +130,18 @@
       href: diagnostics/dotnet-dump.md
     - name: dotnet-trace
      href: diagnostics/dotnet-trace.md
+    - name: .NET Core diagnostics walk-throughs
+      items:
+        - name: Overview
+          href: diagnostics/diagnostic-scenarios.md
+        - name: DiagnosticScenarios sample debug target
+          href: diagnostics/sample-debug-target.md
+        - name: Debugging a memory leak
+          href: diagnostics/app_is_leaking_memory_eventual_crash.md
+        - name: Debugging high CPU usage
+          href: diagnostics/app_running_slow_highcpu.md
+        - name: Debugging deadlock
+          href: diagnostics/hung_app.md
 - name: Unit Testing
   href: testing/index.md
   items: