Crash occurs in closePeerConnection #995

liutuzhao · 2020-12-08T08:18:34Z

Logging
crash-in-closePeerConnection.zip

Describe the bug
We are running two live-streaming sessions at the same time. When one session is detected as broken in the "onConnectionStateChange" callback, its "terminateFlag" is set to true. Another thread checks the status of each session and frees the broken one, and the crash occurs in the SDK function "closePeerConnection". The backtrace is as follows:

(gdb) where
#0 0x00504fa4 in pthread_mutex_lock ()
#1 0x00319500 in socketConnectionClosed ()
#2 0x0030f92c in connectionListenerRemoveAllConnection ()
#3 0x003108a0 in iceAgentShutdown ()
#4 0x002d163c in closePeerConnection ()
#5 0x0005a1ac in freeSampleStreamingSession ()
#6 0x00048a10 in CWebRTCClientMaster::SessionCleanupCheck(CQVMessageT*, unsigned int, unsigned int&) ()
#7 0x002ab82c in CQVThreadWorker::OnPolling(unsigned int&) ()
#8 0x002ac954 in CQVThreadWorker::OnThread() ()
#9 0x002aba58 in CQVThread::ThreadProc(void*) ()
#10 0x00503904 in start_thread ()
#11 0x0051cd20 in clone ()
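The cleanup pattern described above (the callback only raises a flag, and a separate thread frees the session) can be sketched as follows. This is a minimal, hypothetical illustration: `Session`, `SessionList`, `sessionListInit`, `addSession`, `markSessionBroken` and `cleanupBrokenSessions` are invented names, not the sample application's `SampleStreamingSession` or `freeSampleStreamingSession`. It only shows how the application side can avoid racing on its own session list. It assumes the SDK's internal threads are already finished with a session when it is torn down, which is exactly what the backtrace above calls into question.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical session type; NOT the SDK's SampleStreamingSession. */
typedef struct Session {
    atomic_bool terminateFlag;   /* set by the connection-state callback */
    struct Session *next;
} Session;

/* Hypothetical list of live sessions, guarded by one mutex. */
typedef struct {
    pthread_mutex_t lock;
    Session *head;
} SessionList;

void sessionListInit(SessionList *list) {
    pthread_mutex_init(&list->lock, NULL);
    list->head = NULL;
}

Session *addSession(SessionList *list) {
    Session *s = calloc(1, sizeof(*s));
    if (s == NULL) {
        return NULL;
    }
    atomic_init(&s->terminateFlag, false);
    pthread_mutex_lock(&list->lock);
    s->next = list->head;
    list->head = s;
    pthread_mutex_unlock(&list->lock);
    return s;
}

/* Callback thread: only flags the session, never frees it. */
void markSessionBroken(Session *s) {
    atomic_store(&s->terminateFlag, true);
}

/* Cleanup thread: unlink flagged sessions under the list lock, then free
 * them outside the lock. Returns the number of sessions freed. */
int cleanupBrokenSessions(SessionList *list) {
    Session *doomed = NULL;

    pthread_mutex_lock(&list->lock);
    Session **pp = &list->head;
    while (*pp != NULL) {
        Session *s = *pp;
        if (atomic_load(&s->terminateFlag)) {
            *pp = s->next;       /* unlink from the live list */
            s->next = doomed;    /* push onto the local free list */
            doomed = s;
        } else {
            pp = &s->next;
        }
    }
    pthread_mutex_unlock(&list->lock);

    int freed = 0;
    while (doomed != NULL) {
        Session *s = doomed;
        doomed = s->next;
        /* In a real application the SDK teardown would happen here, and it
         * must not race with SDK threads that still reference the session. */
        free(s);
        freed++;
    }
    return freed;
}
```

Freeing outside the list lock means a slow teardown does not block other threads that only need to look up sessions.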

SDK version number
V1.4.0

Open source building
default config in SDK

MushMal · 2020-12-08T09:00:30Z

@liutuzhao this is not the stock application and the issue is not actionable without debug symbols and local variables. We will only look at crashes with stock samples.

Please debug this further on your own. Please pull us in if you can pinpoint the actual crash in the SDK or the stock sample applications.

As the stack trace does not correspond to the sample application, I am not sure what's causing the crash.

Removing the "bug" tag.

MushMal · 2020-12-09T22:14:43Z

Any updates? Have you been able to reproduce this with stock samples?

liutuzhao · 2020-12-10T01:50:58Z

> Any updates? Have you been able to reproduce this with stock samples?

We're trying Alexa's pull request #996 and your pull request #1001 on our camera.
We have not seen the crash so far. We will keep testing for several days, and if no related crash occurs again, we can close this issue.

MushMal · 2020-12-10T01:57:16Z

Sounds good. I am not sure if any of this will fix the crash. Try to get the stock applications running in parallel on your platform to get wider coverage. Try running under gdb and have the symbols ready to be loaded if a crash happens.

liutuzhao · 2020-12-12T04:02:19Z

We encountered another similar crash. @MushMal @codingspirit

(gdb) thread apply 1 bt

Thread 1 (LWP 6105):
#0 0x00505284 in pthread_mutex_lock ()
#1 0x00322f70 in lwsCompleteSync ()
#2 0x00323578 in getIceConfigLws ()
#3 0x002dd128 in getIceConfig ()
#4 0x002de2a0 in executeGetIceConfigSignalingState ()
#5 0x002f3fc0 in stepStateMachine ()
#6 0x002ddfe0 in stepSignalingStateMachine ()
#7 0x00320564 in reconnectHandler ()
#8 0x00503be4 in start_thread ()
#9 0x0051d000 in clone ()
#10 0x0051d000 in clone ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

MushMal · 2020-12-12T04:05:51Z

Do you have the symbols?

liutuzhao · 2020-12-14T01:40:17Z

I am uploading gdb, the core dump and the symbol files, together with the executable before stripping.
core-xxx is the core dump file. Sofia1 is the executable before stripping, Sofia is the executable after stripping, and .symbols is the stripped symbols file. gdb1 is an x86 build of gdb that targets our camera platform. The attachment is larger than 10 MB, so I compressed it into 4 parts. Because GitHub only accepts .zip files, I changed the file extensions to .zip. Please download the parts and rename them to bak8-crash-lwsCompleteSync.zip.001, bak8-crash-lwsCompleteSync.zip.002, bak8-crash-lwsCompleteSync.zip.003 and bak8-crash-lwsCompleteSync.zip.004 before decompressing them.
bak8-crash-lwsCompleteSync.004.zip

bak8-crash-lwsCompleteSync.003.zip

bak8-crash-lwsCompleteSync.002.zip

bak8-crash-lwsCompleteSync.001.zip

codingspirit · 2020-12-17T06:56:46Z

amazon-kinesis-video-streams-webrtc-sdk-c/src/source/Signaling/LwsApiCalls.c

Line 549 in b9e41d1

MUTEX_LOCK(pCallInfo->pSignalingClient->lwsSerializerLock);

I couldn't find any scenario in which pCallInfo->pSignalingClient->lwsSerializerLock is NULL while pCallInfo->pSignalingClient is not. Hi @MushMal, any clue from your side?
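Pull request #996 linked from this issue is titled "socketConnectionClosed: Check mutex before lock/unlock". A defensive check of that kind, applied to a lock like lwsSerializerLock, might look like the sketch below. The types `ClientSketch` and `CallInfoSketch` and the functions `lockSerializer` and `unlockSerializer` are hypothetical stand-ins built on plain pthreads; they are not the SDK's `MUTEX_LOCK` macro or its real structures. Such a check only avoids dereferencing a NULL lock; it cannot detect a client object that has already been freed.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; the SDK's real signaling client and MUTEX types differ. */
typedef struct {
    pthread_mutex_t *serializerLock;  /* may be NULL if never created */
} ClientSketch;

typedef struct {
    ClientSketch *client;
} CallInfoSketch;

/* Returns 0 when the lock was taken, -1 when the client or its lock is
 * missing or locking failed. Does NOT protect against use-after-free. */
int lockSerializer(const CallInfoSketch *callInfo) {
    if (callInfo == NULL || callInfo->client == NULL ||
        callInfo->client->serializerLock == NULL) {
        return -1;
    }
    return pthread_mutex_lock(callInfo->client->serializerLock) == 0 ? 0 : -1;
}

/* Mirrors lockSerializer: silently skips the unlock if there is no lock. */
void unlockSerializer(const CallInfoSketch *callInfo) {
    if (callInfo != NULL && callInfo->client != NULL &&
        callInfo->client->serializerLock != NULL) {
        pthread_mutex_unlock(callInfo->client->serializerLock);
    }
}
```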

MushMal · 2020-12-17T07:29:38Z

> I couldn't

None that I can think of. If execution has reached LwsApiCalls.c, the entire signaling client object should already have been created successfully.

Perhaps a stale public header file is being used with the latest codebase, which could have shifted the offsets of the internal structure fields?
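To illustrate the stale-header hypothesis: code compiled against an older header computes field offsets from the old layout, so it can read a NULL where the library stored a valid lock pointer, even though the client pointer itself is valid. The two struct versions and the helper functions below are hypothetical and do not reflect the SDK's real signaling client layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical old layout, as seen by code built against a stale header. */
struct ClientV1 {
    int state;
    void *serializerLock;
};

/* Hypothetical new layout, as used by the library itself. */
struct ClientV2 {
    int state;
    void *addedField;       /* field added in a newer version */
    void *serializerLock;
};

size_t lockOffsetOld(void) { return offsetof(struct ClientV1, serializerLock); }
size_t lockOffsetNew(void) { return offsetof(struct ClientV2, serializerLock); }

/* Reads the bytes where the OLD header says serializerLock lives, from an
 * object that was actually laid out with the NEW header. */
void *readLockWithOldLayout(const struct ClientV2 *client) {
    void *lock;
    memcpy(&lock, (const char *) client + lockOffsetOld(), sizeof lock);
    return lock;
}
```

With `addedField` left as NULL, `readLockWithOldLayout` returns NULL while the object's real `serializerLock` is valid. Rebuilding the application and the SDK from the same checkout, so that every translation unit sees the same headers, is one way to rule this out.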

Sorry, I haven't had any time to look at the attached log files.

Nomidia · 2020-12-29T03:38:25Z

Similar issue in the same position:

#0 0x003318e4 in lws_callback_on_writable ()
#1 0x00323810 in wakeLwsServiceEventLoop ()
#2 0x00323d18 in lwsCompleteSync ()
#3 0x0032434c in getIceConfigLws ()
#4 0x002dceb4 in getIceConfig ()
#5 0x002de02c in executeGetIceConfigSignalingState ()
#6 0x002f4bd8 in stepStateMachine ()
#7 0x002ddd6c in stepSignalingStateMachine ()
#8 0x002daac4 in refreshIceConfigurationCallback ()
#9 0x002fab04 in timerQueueExecutor ()
#10 0x005049b4 in start_thread ()
#11 0x0051ddd0 in clone ()
#12 0x0051ddd0 in clone ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

hassanctech · 2021-01-13T20:12:49Z

Can you please try the latest commit on master and see if it resolves your issue?
If there is still a crash, please include symbols with the crash stack so we can better help.

suggestedfixes · 2021-01-14T14:12:09Z

@hassanctech This is still reproducible on Windows. It would be nice if someone on the AWS side could replicate the Windows scenario.

MushMal · 2021-01-14T19:18:45Z

@suggestedfixes this thread is getting stale very quickly. I have requested a dump with symbols, plus information on whether you've made any changes. We do have Windows runs in Travis CI which don't crash. It's hard for us to reproduce something we have no understanding of.

Please include a detailed description of the assets in use, and whether there have been any modifications to the samples that are being run.
Include a detailed description of how the crash happens.
Include details on the platform, both hardware and software, with their versions.
Provide symbols for ALL of the threads in the crash dump.

MushMal · 2021-01-19T05:12:05Z

Updates please?

MushMal · 2021-01-20T22:59:58Z

I am resolving this as we have no symbolic info and there is nothing actionable here.

For the crash stack involving the ICE refresh in signaling, please use the latest commit, which removes the automatic ICE refresh. There is very little to work with on the other crash stack, the one related to connection removal.

liutuzhao added the bug Something isn't working label Dec 8, 2020

MushMal added question Further information is requested and removed bug Something isn't working labels Dec 8, 2020

MushMal changed the title ~~[BUG] Crash occurs in closePeerConnection~~ Crash occurs in closePeerConnection Dec 8, 2020

MushMal added the awaiting response label Dec 8, 2020

codingspirit mentioned this issue Dec 8, 2020

socketConnectionClosed: Check mutex before lock/unlock #996

Closed

shiv50084 mentioned this issue Jan 10, 2021

SDK Crash in SCTP when two user trying to connect at same time #1019

Closed

MushMal closed this as completed Jan 20, 2021

MushMal mentioned this issue Jan 21, 2021

Fixing race condition in ConnectionListener and fixing unaligned acce… #1053

Merged
