Unknown error when running createdump #4

Closed
andtii opened this issue Jun 13, 2019 · 14 comments

@andtii

andtii commented Jun 13, 2019

Hey, first of all thank you for the awesome work you have done here to provide all these guidelines and examples. I followed your example to create a minidump here:
https://github.com/joe-elliott/netcore-kubernetes-profiling/blob/master/coredumps/generating.md

First I got a permission denied error when running createdump, and I saw that
securityContext:
  privileged: true
was missing from the sidecar YAML. After I added it, createdump runs without the access-denied error.
But now I run into this unknown error instead. Do you have any clue what could cause this?
https://github.com/dotnet/coreclr/issues/25152

@joe-elliott
Owner

joe-elliott commented Jun 13, 2019

That's interesting. So the documentation suggests the container needs elevated privileges:

https://github.com/dotnet/coreclr/blob/master/Documentation/botr/xplat-minidump-generation.md#configurationpolicy

But I was able to run the coredump example without it. I just assumed it was b/c I was root in the container and I was inspecting processes in the same process namespace. At one point I had allowPrivilegeEscalation: true but removed it b/c I thought it was unnecessary.

Can you post the original permission error you received?

@joe-elliott
Owner

joe-elliott commented Jun 13, 2019

Follow up thoughts:

The error code reported in https://github.com/dotnet/coreclr/issues/25152 appears to be CORDBG_E_UNSUPPORTED:
https://github.com/dotnet/coreclr/blob/master/src/inc/corerror.xml#L2162

If you're using this against your own application container, can you post your YAML, or the Dockerfile that the application was built with?

I'm able to reproduce that error by running the debugging sidecar container against an alpine application image. I'd definitely like to see how the application container was built that you are running the debugging sidecar against.

# /usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5/createdump 6
Writing minidump with heap to file /tmp/coredump.6
CLRDataCreateInstance(ICLRDataEnumMemoryRegions) FAILED 80131c4e
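
As a rough aid for reading the failure code above, here is a small sketch that splits a 32-bit HRESULT into its standard severity, facility, and code fields. The function name `decode_hresult` is made up for this example. Facility 0x13 is the one used by the CLR, which fits the CORDBG_E_UNSUPPORTED reading.

```shell
# Hypothetical helper: split a 32-bit HRESULT (hex digits, no 0x prefix)
# into its standard fields.
decode_hresult() {
    hr=$((0x$1))
    severity=$(( (hr >> 31) & 0x1 ))    # 1 means failure
    facility=$(( (hr >> 16) & 0x7ff ))  # 11-bit facility field
    code=$(( hr & 0xffff ))             # low 16 bits
    printf 'severity=%d facility=0x%x code=0x%x\n' "$severity" "$facility" "$code"
}

decode_hresult 80131c4e
# prints: severity=1 facility=0x13 code=0x1c4e
```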

@andtii
Author

andtii commented Jun 13, 2019

@joe-elliott I am running against an Alpine container, so you are seeing the same behavior as me then.

@joe-elliott
Owner

@andtii So there is an alpine version of the container: https://hub.docker.com/r/joeelliott/netcore-debugging-tools/tags

joeelliott/netcore-debugging-tools:v0.0.11-2.2.5-alpine

I've not messed with Alpine as much, but I have gotten some of the basic demos working. If you have the time, please try it and let me know.

@joe-elliott
Owner

All right, so using this pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: sample-netcore-app
  labels:
    app: sample-netcore-app
spec:
  shareProcessNamespace: true
  containers:
  - name: sample-netcore-app
    image: joeelliott/sample-netcore-app:v1.0.0-2.2.5-alpine
    imagePullPolicy: IfNotPresent
    env:
    - name: COMPlus_DbgEnableMiniDump
      value: "1"
    - name: COMPlus_DbgMiniDumpName
      value: "/tmp/coredump.%d"
    - name: ASPNETCORE_URLS
      value: http://*:8080
    volumeMounts:
    - mountPath: /tmp
      name: tmp
  - name: profile-sidecar
    image: joeelliott/netcore-debugging-tools:v0.0.11-2.2.5-alpine
    imagePullPolicy: IfNotPresent
    args:
    - sleep
    - "3600"
    volumeMounts:
    - mountPath: /tmp
      name: tmp
  volumes:
  - name: tmp
    emptyDir: {}

I was able to generate a coredump from an alpine sidecar:

~ # ps aux
PID   USER     TIME  COMMAND
    1 root      0:00 /pause
   10 root      0:00 dotnet /app/sample-netcore-app.dll
   17 root      0:00 sleep 3600
   45 root      0:00 ash
  130 root      0:00 ps aux
~ # /usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5/createdump 10
Writing minidump with heap to file /tmp/coredump.10
Written 42733568 bytes (10433 pages) to core file
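
For anyone who would rather drive this from outside the pod, the session above could be wrapped roughly like this. It is only a sketch: the function names are made up, the container name and paths are taken from the pod spec above, and the pod name and PID will differ in your cluster. `kubectl cp` also needs `tar` inside the container.

```shell
# Hypothetical helpers; container name and paths come from the pod spec above.
CREATEDUMP=/usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5/createdump

# Run createdump in the sidecar against a PID in the shared process namespace.
dump_in_sidecar() {
    pod="$1"; pid="$2"
    kubectl exec "$pod" -c profile-sidecar -- "$CREATEDUMP" "$pid"
}

# Copy the dump from the shared /tmp volume to the current directory.
fetch_dump() {
    pod="$1"; pid="$2"
    kubectl cp "$pod:/tmp/coredump.$pid" "./coredump.$pid" -c profile-sidecar
}
```

Usage would be something like `dump_in_sidecar sample-netcore-app 10 && fetch_dump sample-netcore-app 10`, where 10 is the PID of the dotnet process as seen from the sidecar.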

I am having issues with lldb, but it's a start.

@andtii
Author

andtii commented Jun 14, 2019

I'm trying to debug production containers, so at the moment I can't just change the base image. A sidecar that contains everything you need is the optimal solution for us.

@joe-elliott
Owner

So it appears that taking a coredump from a netcore application is easy (see above), but getting lldb to work is not. Currently lldb in the official repo is failing with:

~ # lldb
Error relocating /usr/bin/../lib/liblldb.so.5.0: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2ERKS4_mRKS3_: symbol not found

I'll try to find some time to fool around with it this weekend. Perhaps building from source would work.

@joe-elliott
Owner

I've been able to get lldb partially working by using an alpine 3.9 sidecar. If I build the sidecar with this base image:

FROM mcr.microsoft.com/dotnet/core/aspnet:2.2.5-alpine3.9

and install lldb with:

apk add lldb

Then I can generate dumps with createdump or by forcing an unexpected exception. Some analysis works, but sadly DumpHeap does not.

(lldb) clrthreads
ThreadCount:      9
UnstartedThread:  0
BackgroundThread: 8
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                        Lock  
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   1    1   7c 000055C5C2F61C80  2020020 Preemptive  00007F1410D9C878:00007F1410D9DFD0 000055C5C2F44360 0     Ukn 
   9    2   88 000055C5C2F7BBA0    21220 Preemptive  0000000000000000:0000000000000000 000055C5C2F44360 0     Ukn (Finalizer) 
  10    3   89 000055C5C30EA460  1020220 Preemptive  0000000000000000:0000000000000000 000055C5C2F44360 0     Ukn (Threadpool Worker) 
  11    4   8a 000055C5C315A000    21220 Preemptive  00007F1410BF91D8:00007F1410BF9FD0 000055C5C2F44360 0     Ukn 
  12    5   8b 000055C5C31634C0  1021220 Preemptive  00007F1310F42E70:00007F1310F43FD0 000055C5C2F44360 0     Ukn (Threadpool Worker) 
  13    6   8d 000055C5C3166400  1021220 Preemptive  00007F1410C0FE88:00007F1410C0FFD0 000055C5C2F44360 0     Ukn (Threadpool Worker) 
  14    7   8e 000055C5C32148E0  2021220 Preemptive  00007F1310F3CD40:00007F1310F3DFD0 000055C5C2F44360 0     Ukn 
  16    8   90 000055C5C3308900  1021220 Preemptive  00007F1310F41560:00007F1310F41FD0 000055C5C2F44360 0     Ukn (Threadpool Worker) 
  17    9   91 000055C5C3307880    21220 Preemptive  00007F1410D9E0D0:00007F1410D9FFD0 000055C5C2F44360 0     Ukn 
(lldb) sos ClrStack
OS Thread Id: 0x7c (1)
        Child SP               IP Call Site
00007FFC45D167C0 00007f15ab9533ad [GCFrame: 00007ffc45d167c0] 
00007FFC45D168A0 00007f15ab9533ad [HelperMethodFrame_1OBJ: 00007ffc45d168a0] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
00007FFC45D169D0 00007F15313DD4A2 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
00007FFC45D16A60 00007F15313A89E9 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
00007FFC45D16AC0 00007F15313A8879 System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken)
00007FFC45D16B20 00007F15313C96B6 System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
00007FFC45D16B40 00007F1531A75527 /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.2.5/Microsoft.AspNetCore.Hosting.dll!Unknown
00007FFC45D16B60 00007F15318C1B2E sample_netcore_app.Program.Main(System.String[])
00007FFC45D16E48 00007f15aaa69fcf [GCFrame: 00007ffc45d16e48] 
00007FFC45D17310 00007f15aaa69fcf [GCFrame: 00007ffc45d17310] 
(lldb) sos DumpHeap
Error requesting heap segment 00007F1310BF8000
Failed to retrieve segments for gc heap
Unable to build snapshot of the garbage collector state
DumpHeap  failed
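
For reference, a session like the one above is typically started by opening the core file in lldb and loading the SOS plugin. The sketch below only prints such a command line rather than running it. The helper name is made up, and all three paths are assumptions, in particular that libsosplugin.so sits next to createdump in the runtime directory.

```shell
# Hypothetical helper: print (not run) an lldb command line that opens a
# core file and loads SOS. All three paths are assumptions for your image.
lldb_core_cmd() {
    host="$1"    # path to the dotnet host binary
    core="$2"    # path to the core dump
    plugin="$3"  # path to libsosplugin.so
    printf 'lldb %s --core %s -o "plugin load %s"\n' "$host" "$core" "$plugin"
}

lldb_core_cmd /usr/bin/dotnet /tmp/coredump.10 \
    /usr/share/dotnet/shared/Microsoft.NETCore.App/2.2.5/libsosplugin.so
```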

@andtii
Author

andtii commented Jun 15, 2019 via email

@joe-elliott
Owner

I spent some time attempting to pull source and compile lldb 3.9.1 in the alpine container, but I've not had any luck. Some of the documentation I've read indicates libsosplugin.so was meant to be compatible with this version.

I have left a message in https://github.com/dotnet/coreclr/issues/25152 asking for help with lldb in an alpine container. Apologies for not coming to a more satisfying conclusion on this issue.

@joe-elliott
Owner

So this is very promising: https://github.com/dotnet/diagnostics/blob/master/documentation/dotnet-dump-instructions.md and ultimately I think it is the solution for collecting and analyzing core dumps on alpine.

However, I'm unable to get it to work. Filed a ticket here: dotnet/diagnostics#341

@joe-elliott
Owner

So it looks like createdump is broken on Alpine:

dotnet/diagnostics#341

After the above issue gets worked out, you should be able to create and analyze a dump on Alpine. Additionally, according to that thread, choosing a full dump should already work:

COMPlus_DbgMiniDumpType=4
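
In a pod this setting would go in the application container's `env:` list next to the other COMPlus_ variables. Outside Kubernetes, or in an entrypoint script, a minimal sketch looks like this. It assumes the same dump path as the pod spec earlier in this thread. As far as I understand, these variables only control the dump the runtime writes on a crash, not a manual createdump run.

```shell
# Sketch: ask the runtime for a full dump on crash (type 4, per the thread above).
# The dump path is an assumption carried over from the pod spec earlier.
export COMPlus_DbgEnableMiniDump=1
export COMPlus_DbgMiniDumpType=4
export COMPlus_DbgMiniDumpName=/tmp/coredump.%d

# The application would then be started from this same shell, for example:
#   dotnet /app/sample-netcore-app.dll
```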

I'm going to leave this issue open until I clarify some of these details in Alpine specific guides.

@joe-elliott
Owner

Hey @andtii

I was able to get coredumps working in alpine containers. See the additional notes here:

https://github.com/joe-elliott/netcore-kubernetes-profiling/blob/master/coredumps/alpine.md

and the yaml here

https://github.com/joe-elliott/netcore-kubernetes-profiling/blob/master/coredumps/coredumps.alpine.yaml

Unfortunately, you have to take a full core dump for this to work. The latest version of the Alpine debugging container is all set up to perform this analysis. Good luck!

@andtii
Author

andtii commented Jun 24, 2019

@joe-elliott Thanks for finding a working solution, I will try it out!
