Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msquic crashes when stream closed from GC finalizer #1565

Closed
ManickaP opened this issue May 10, 2021 · 6 comments
Closed

msquic crashes when stream closed from GC finalizer #1565

ManickaP opened this issue May 10, 2021 · 6 comments
Labels
question Further information is requested
Milestone

Comments

@ManickaP
Copy link
Member

In one our tests (I'm not able to determine which), sometime (in my case very rarely), the stream is being closed from GC finalizer thread and it leads to a SIGABRT.

(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGABRT
    frame #0: 0x00007fd939b8bef5 libc.so.6`raise + 325
    frame #1: 0x00007fd939b75912 libc.so.6`abort + 454
    frame #2: 0x00007fd92963137a libmsquic.so`quic_bugcheck at platform_posix.c:59:5
    frame #3: 0x00007fd92961c4fe libmsquic.so`QuicOperationEnqueue(OperQ=0x00007fd7b4002090, Oper=0x00007fd7f80071e0) at operation.c:138:5
  * frame #4: 0x00007fd9295f485f libmsquic.so`QuicConnQueueOper(Connection=0x00007fd7b4001cc0, Oper=<unavailable>) at connection.c:720:9
    frame #5: 0x00007fd9295eb66b libmsquic.so`MsQuicStreamClose(Handle=0x00007fd7f8007590) at api.c:717:17
    frame #6: 0x00007fd8c0292842
    frame #7: 0x00007fd8c065a62a
    frame #8: 0x00007fd8bf939746
    frame #9: 0x00007fd8c064751d
    frame #10: 0x00007fd8c065a4e6
    frame #11: 0x00007fd8c06e7693
    frame #12: 0x00007fd9396e42f2 libcoreclr.so`FastCallFinalizeWorker at calldescrworkeramd64.S:30
    frame #13: 0x00007fd9394b6e88 libcoreclr.so`MethodTable::CallFinalizer(Object*) at methodtable.cpp:3600:5
    frame #14: 0x00007fd9394b6e3c libcoreclr.so`MethodTable::CallFinalizer(obj=0x0000000000000000) at methodtable.cpp:3725
    frame #15: 0x00007fd93957d572 libcoreclr.so`FinalizerThread::FinalizeAllObjects() [inlined] CallFinalizer(obj=0x00007fd8882a8420) at finalizerthread.cpp:75:9
    frame #16: 0x00007fd93957d51c libcoreclr.so`FinalizerThread::FinalizeAllObjects() at finalizerthread.cpp:104
    frame #17: 0x00007fd93957d795 libcoreclr.so`FinalizerThread::FinalizerThreadWorker(args=<unavailable>) at finalizerthread.cpp:325:9
    frame #18: 0x00007fd9394fe40a libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchInner(ManagedThreadCallState*) at threads.cpp:7300:5
    frame #19: 0x00007fd9394fe408 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7344
    frame #20: 0x00007fd9394fe3c2 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchOuter(this=<unavailable>)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(Param*)::operator()(Param*) const at threads.cpp:7502
    frame #21: 0x00007fd9394fe3c2 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7504
    frame #22: 0x00007fd9394fe353 libcoreclr.so`ManagedThreadBase_DispatchOuter(pCallState=0x00007fd931b59d50) at threads.cpp:7528
    frame #23: 0x00007fd9394fea9d libcoreclr.so`ManagedThreadBase::FinalizerBase(void (*)(void*)) [inlined] ManagedThreadBase_NoADTransition(pTarget=<unavailable>, filterType=FinalizerThread)(void*), UnhandledExceptionLocation) at threads.cpp:7572:5
    frame #24: 0x00007fd9394fea85 libcoreclr.so`ManagedThreadBase::FinalizerBase(pTarget=<unavailable>)(void*)) at threads.cpp:7598
    frame #25: 0x00007fd93957da38 libcoreclr.so`FinalizerThread::FinalizerThreadStart(args=<unavailable>) at finalizerthread.cpp:377:17
    frame #26: 0x00007fd939870539 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0000564627221dc0) at thread.cpp:1862:16
    frame #27: 0x00007fd93a068299 libpthread.so.0`start_thread + 233
    frame #28: 0x00007fd939c4e053 libc.so.6`__clone + 67

I have the core dump which I can provide. I don't have logs though.

This might be an error in our code (.NET Core, System.Net.Quic). If you think so, could you please help me with locating the root cause?

cc: @CarnaViire @scalablecory @wfurt @geoffkizer

@nibanks
Copy link
Member

nibanks commented May 10, 2021

Are you running with a debug build of MsQuic? If not, I'd highly recommend all tests use it, as we have a ton of debug asserts that get turned on that will help you debug this. For instance:

    CXPLAT_TEL_ASSERT(!Stream->Flags.HandleClosed);
    CXPLAT_TEL_ASSERT(!Stream->Flags.Freed);

Those are checked in that code path to see if you're deleting again (best effort). Also, Windows dumps will be much easier to debug because we have debugger extensions we built to help in windbg.

@ThadHouse
Copy link
Contributor

It looks like this is failing to lock a mutex. That mutex is on the connection. Is anything in .NET with the finalizer queue guaranteeing that all streams will be GC'd before the connection for those streams will be? This definitely seems like a connection getting closed before the stream.

@ManickaP
Copy link
Member Author

Are you running with a debug build of MsQuic?

AFAIK I am.

Also, Windows dumps will be much easier to debug

I predominantly run Linux so catching this on Windows is not likely for me personally, but I think @CarnaViire might have encountered the same/similar problem on Win.

Is anything in .NET with the finalizer queue guaranteeing that all streams will be GC'd before the connection for those streams will be? This definitely seems like a connection getting closed before the stream.

Nope, nothing guarantees the order of finalized objects in GC. We dispose in proper order, but if we fail in ctor, the object gets added to the finalization queue over which we don't have any control. This might be the issue here. This is a great point! I'll look into it and let you know what I've found. Thanks for now!

@nibanks nibanks added the question Further information is requested label May 11, 2021
@nibanks nibanks added this to the Future milestone May 11, 2021
@ManickaP
Copy link
Member Author

Looking more into the code, this is 100% our fault. We should catch all exceptions from ctors, properly Dispose the handles in exceptional cases and suppress the finalization.
Thanks @ThadHouse for pointing me in the right direction!

Closing, not a bug in msquic.

@geoffkizer
Copy link

@ManickaP is there a corresponding issue in dotnet/runtime for this?

@wfurt
Copy link
Member

wfurt commented May 12, 2021

Probably dotnet/runtime#52048 @geoffkizer. I'm planning take a look unless somebody else jumps in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants