-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profiler use after release on process exit #11885
Comments
Thanks for the report @iskiselev! You mentioned analyzing a coredump, do you have a stack trace where the failure occurred? I think we could set something up given the repro instructions you provided but we'll need to make a little time to look into it. |
Hi again @iskiselev, I set up a test that seemed similar to what you described but unfortunately I haven't been able to repro the problem yet. If you have any more details to share that would be helpful. I can believe that enabling the attach/detach logic you saw there would remedy the AV, but doing it that way suggests we might be leaving other symptoms of the problem behind. For example the profiler might now miss events at the end of the process that it should have been receiving. I don't mind using the attach/detach mechanism as an extra line of defence and leaving the rest unresolved if we have to, but given that you had a repro in hand it would great to understand the root cause if possible. Thanks! |
I've prepared test sample here: https://github.com/iskiselev/clr_profiler_segfault The key thing to increase probability of crash was adding console output to profiler destructor: https://github.com/iskiselev/clr_profiler_segfault/blob/38ec09bf84d2554bd6757ae73ed3d7bcd0895c9c/profiler/CorProfiler.cpp#L16 |
Hi @noahfalk, if you still need it - I can provide crash dump additionally, but hope you'll be able to reproduce it yourself. |
Thanks @iskiselev, its been a long time since anyone gave me such an easy to use repro! I really appreciate it. After some false starts I did manage to observe the crash, capture a core dump, and get it under a debugger. A few things have to line up timing-wise to get the failure, but when they do we get one thread doing shutdown:
And another managed thread finishing a JIT compilation and issuing a callback:
SOS DumpMD shows we were compiling token 6000057 in Microsoft.TestPlatform.CommunicationUtilities.dll but failed to resolve it to an exact method name. We can work it out later if it matters, but its probably just luck of the draw which method gets hit in the race window. I think we can put together a fix based on this and reviewing the code it appears the runtime is shutting down as expected overall, just the profiler was left exposed to the unexpected callbacks. Should I assume you are interested in having this fix ported to servicing branches for already released runtime versions? Unfortunately I don't see much that could be done to workaround this prior to getting a runtime fix. Even if you leaked your implementation of ICorProfilerCallback, the runtime is going to delete its EEToProfilerInterface wrapper object and NULL out the pointer to it so an AV will still occur if a callback happens within the race window. You could mitigate the issue somewhat by minimizing the time in that race window which starts here: Thanks for reporting this and let me know about the servicing interest and any other questions. I'll look into a fix and let you know when I've got more info. |
Thank you for information, @noahfalk. Based on your findings I suspect that windows .Net Core and .Net Framework are also affected by the issue. Is it correct? (I've not done tests to confirm it). We are very interested this issue to be fixed both in .Net Core and .Net Framework future releases. |
Fixes issue #22176. Use the profiler evacuation counters to ensure that we don't callback into the profiler when it has already been released. Previously we only did this as part of the attach/detach feature, but this is required for correctness during standard shutdown given that managed threads are still running concurrently. Thanks to @iskiselev for a very nice repro as well some accurate analysis of the runtime that made finding and fixing this much easier!
The opposite I think. .Net Framework and .Net Core on Windows both appear unaffected by this issue because enabling FEATURE_PROFAPI_ATTACH_DETACH caused the evacuations counters to be used. Doing those checks prevents the profiler from being released if a callback is in progress, as well as preventing future callbacks after the profiler has been released.
I've just opened PR dotnet/coreclr#22712 that should resolve the issue for .Net Core. I can't repro the failure any longer but of course please let me know if you see any sign that the issue wasn't fixed. |
@noahfalk, thank you for confirmation that, as I originally supposed, windows builds should not be affected by issue. I misunderstand your statement that pointed on race condition lines that have not included IsEvacuated check. |
Fixes issue #22176. Use the profiler evacuation counters to ensure that we don't callback into the profiler when it has already been released. Previously we only did this as part of the attach/detach feature, but this is required for correctness during standard shutdown given that managed threads are still running concurrently.
@davmason - The fix for this was merged right? Good to close? |
Yes, thanks |
Hi! I faced with the same problem on .NET Core 2.2.7. We run CI tests and periodically they fail with exit code 139. SIGSEGV happens in
disasm:
@noahfalk @davmason Is it still possible? Support for .NET Core 2.2 ends in December. I'm very interested in stable tests =)) |
@davmason Friendly ping. |
CoreCLR version: 2.1, 2.2 (probably all of them)
OS: Ubuntu/CentOs/MacOs (probably any non-Windows)
On application exit CLR sometimes crashes with SEGFAULT if profiler has subscribed to COR_PRF_MONITOR_JIT_COMPILATION.
Based on coredump analysis crash happened in
EEToProfInterfaceImpl::JITCompilationStarted(unsigned long, int) ()
.Looks like it is possible to crash on other profiler callback too - but was not able to stable reproduce with other, but colleague reported that it was observed once on
ModuleUnloadFinished
.We've started investigation on it, as have seen exit code 1 sometimes from
dotnet test
run with XUnit test, after test reported that all test passed successfully on Linux CI environment. We've never seen it on Windows CI environment.After further investigation, looks like it is enough that profiler will do nothing except:
SetEventMask(COR_PRF_MONITOR_JIT_COMPILATION | COR_PRF_DISABLE_ALL_NGEN_IMAGES | COR_PRF_DISABLE_OPTIMIZATIONS | COR_PRF_DISABLE_INLINING)
insideInitialize
AddRef
/Release
with deleting itself if ref count = 0.The issue can be stable reproduced with XUnit tests (one test that will do nothing is enough).
After first execution of dotnet test, I've repeated execution of
dotnet exec /usr/share/dotnet/sdk/<version>/vstest.console.dll --framework:.NETCoreApp,Version=v2.0 --logger:trx --Diag:TestResults/testDiagnostics.txt <path_to_test>/Release/netcoreapp2.0/<testname>.dll
It fails after ~20 executions with segfault and exit code 139.
I can provide sample application/profiler, but it will require some additional work on my side.
When I've looked on coreclr sources, I've found very suspicious place here: https://github.com/dotnet/coreclr/blob/a28b25aacdcd2adb0fdfa70bd869f53ba6565976/src/vm/profilinghelper.cpp#L1304
IsProfilerEvacuated
check is done only whenFEATURE_PROFAPI_ATTACH_DETACH
enabled, which may be a hint why have we seen problem only on non-Windows OS.CC: @noahfalk
The text was updated successfully, but these errors were encountered: