datachannel hangs with packet loss #1270
Comments
I managed to reproduce this with a test case. It appears related to datachannel options used in browser:
Those options along with 20% packet loss help to reproduce. Code attached: pionbug.zip for server and both go and javascript client. Logs for go client and server follow. server
client
|
Using pion/webrtc (v2.2.17) on both ends with 20% packet loss (induced with tylertreat/comcast), I noticed this trace message on one end (client side) during the association handshake:
On the server end, I see:
These logs occurred after COOKIE-ACK was received on the client end and the client began the DCEP handshake. The problem is that the client sends FWD-TSN (because maxRetransmits is set to 0) before the DCEP handshake completes. The server side sees the unexpected FWD-TSN and responds with an Error Chunk. |
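To illustrate the idea (this is only a sketch, not the actual pion/sctp or pion/datachannel code): a sender with a `maxRetransmits` limit can exempt DCEP messages from being abandoned, so no FWD-TSN is generated for them during the handshake. The `outboundChunk` type and the `mayAbandon` helper are hypothetical names made up for this example; the PPID values are the ones registered for WebRTC data channels.

```go
package main

import "fmt"

// Payload protocol identifiers used by WebRTC data channels.
const (
	ppidDCEP   uint32 = 50 // DATA_CHANNEL_OPEN / DATA_CHANNEL_ACK
	ppidString uint32 = 51
	ppidBinary uint32 = 53
)

// outboundChunk is a hypothetical, simplified stand-in for a queued DATA chunk.
type outboundChunk struct {
	ppid  uint32
	nSent int // number of times the chunk has been transmitted so far
}

// mayAbandon reports whether the sender may give up on a chunk (and therefore
// advertise it to the peer via FWD-TSN) under a maxRetransmits limit.
// DCEP messages are never abandoned, which keeps the handshake reliable even
// when the channel itself is configured with maxRetransmits = 0.
func mayAbandon(c outboundChunk, maxRetransmits int) bool {
	if c.ppid == ppidDCEP {
		return false
	}
	return c.nSent > maxRetransmits
}

func main() {
	open := outboundChunk{ppid: ppidDCEP, nSent: 3}
	data := outboundChunk{ppid: ppidBinary, nSent: 1}
	fmt.Println("abandon DCEP open:", mayAbandon(open, 0))
	fmt.Println("abandon user data:", mayAbandon(data, 0))
}
```

With this rule, user data on an unreliable channel can still be dropped after its first transmission, while a lost DATA_CHANNEL_OPEN keeps being retransmitted until it is acknowledged.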
I fixed the above locally, and the DCEP handshake now completes reliably despite the heavy packet loss. However, I am now seeing a stall. When that happens, I see this error from ICE:
@Sean-Der I am using Comcast to cause packet loss, currently at a 20% loss ratio. Do you think 20% loss is too much for ICE to work correctly? |
I have found another bug, this time in pion/sctp. During the association handshake, if INIT-ACK was not received due to packet loss, the node that did not receive INIT-ACK wouldn't have [...]. By fixing this in my local environment, I no longer see the situation. I will create two PRs shortly, one for pion/datachannel and the other for pion/sctp. |
@tuexen Sorry to bother you but I could not find an official document that describes expected simultaneous-open in SCTP. The state transition chart below was derived from a compilation of what I saw in my search a while ago, but I have not come across the official spec and I wonder if you could refer me to an official document you know of? |
@tuexen never mind. I realized the text I was looking for was actually in RFC 4960 Sec 5.2. (I thought "simultaneous open" was an extension.) |
Relates to pion/webrtc#1270
Glad you found it. |
@chrbsg told me that the two fixes (pion/datachannel#64 and pion/sctp#127) made the situation much better, but he was still seeing the hang. As I dug further, I found another bug:
The peer (association ptr 0xc000314000) keeps retransmitting COOKIE-ECHO until T1-cookie finally times out. The problem is that the other end (association 0xc0003141a0) was receiving the COOKIE-ECHO but never sent back a COOKIE-ACK once it had entered the Established state. (Bug) |
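A toy model of the expected behaviour, as I understand RFC 4960 Sec 5.2.4 (a sketch only; the `association` type and `onCookieEcho` method here are hypothetical and are not the pion/sctp implementation). It assumes the cookie and verification tags have already been validated. A COOKIE-ECHO that arrives in the Established state is most likely a retransmission caused by a lost COOKIE-ACK, so the receiver answers with COOKIE-ACK again instead of staying silent:

```go
package main

import "fmt"

type state int

const (
	closed state = iota
	cookieWait
	cookieEchoed
	established
)

type chunkType string

const cookieAck chunkType = "COOKIE-ACK"

// association is a hypothetical, heavily simplified SCTP association.
type association struct {
	state state
}

// onCookieEcho handles a COOKIE-ECHO whose cookie and tags are assumed valid.
// It always moves to (or stays in) Established and always replies with
// COOKIE-ACK, including when the association is already Established.
func (a *association) onCookieEcho() chunkType {
	a.state = established
	return cookieAck
}

func main() {
	a := &association{state: cookieWait}
	fmt.Println("first COOKIE-ECHO  ->", a.onCookieEcho())
	// The peer did not see our COOKIE-ACK and retransmits COOKIE-ECHO.
	fmt.Println("second COOKIE-ECHO ->", a.onCookieEcho())
}
```

Without the reply on the second call, the peer would keep retransmitting COOKIE-ECHO until its T1-cookie timer gives up, which matches the symptom described above.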
This is related to #62 in which we removed the use of immediate ack (I) bit. This test became unstable as we no longer use immediate ack. Relates to pion/webrtc#1270
Can you provide a .pcapng file and the debug output of usrsctp assuming that it is your peer? |
Thanks @tuexen - but this is happening between pion nodes. :) (haven't thoroughly tested with browsers yet but will do) |
Sorry, I didn't realise that this is a bug report against pion; I assumed it was a bug report against usrsctp. Sorry for the noise... |
@Sean-Der @at-wat With the current pion/sctp and pion/datachannel master heads, data channel transmission over a lossy connection is very stable for me, and I no longer see any hang even with 20% (extremely bad) packet loss. So far I have tested the following combinations: 1:1 connection
1:2 connections
As for reliability options, I have tested both of these settings:
With the fixes (pion/sctp#127, pion/datachannel#64, pion/sctp#130), which have all landed on master, the data channel has become very robust and stable. Please let me tag the latest pion/sctp and pion/datachannel. |
Tagged:
|
That is amazing @enobufs! Sorry I haven't been more involved; there is so much (great!) stuff going on with Pion that I am losing track :( |
@enobufs pion/[email protected] still refers pion/[email protected]. |
Oops... I forgot that pion/datachannel still depends on pion/sctp for testing... Thanks @at-wat so much for your help! I guess we could remove datachannel's dependency on pion/sctp by using an interface... well, someday! |
@at-wat I updated sctp and datachannel versions to the latest on v2 branch. FYI. |
Now all the fixes on this issue have been landed on pion/webrtc/[email protected]. (many thanks to @at-wat) |
Your environment.
What did you do?
This is a pion program that receives microphone RTP audio from a web browser and sends back packets on a datachannel. (I tried to reproduce this with a simple example but it does not occur; something is different. Code attached below.)
I'm trying to simulate a datachannel issue seen on poor wifi. On Linux, simulate data loss with:
(alternatively, same happens with iPad on wifi)
At some point, usually in the first 20 seconds, the web client stops calling the data channel packet receive handler. Pion log shows:
There is no indication that the datachannel is hung - the ICE state is connected, the RTP microphone audio is still being transmitted by the browser and received by pion, and the datachannel state has not changed.
What did you expect?
Datachannel should be stable on poor wifi
What happened?
Datachannel hangs and never recovers
Full log PION_LOG_DEBUG=all
Another log