Fix video freezing with sync required when transport is connected #787
Documentation recommends creation of consumer with |
I completely agree with @nazar-pc here. It's documented. The change you have made in this PR probably breaks video SingleConsumer, since the Consumer may indeed already be created at the time the transport is connected, and hence a keyframe is required. Waiting for Gustavo's response, but this is not an issue if you follow those recommendations in the docs.
@nazar-pc @ibc Thx for the quick response. We totally follow that recommendation; there is no other way to make it reliable with WebRTC. Consumers are always created paused and then resumed from our application to avoid receiving video from unsignaled SSRCs on the browser side. But the problem is that in some cases consumers are resumed before the transport is connected. Connecting a transport takes 4 roundtrips (1 ICE, 2 DTLS, 1 signaling) and resuming a transport takes 1 roundtrip, so it is easy to have it resumed sooner than the transport is connected, right? In our testing scenario, where we were able to reproduce the issue reported by customers (simulating an RTT of ~300 ms), the consumer is resumed in about 0.5 s and the consumer's transport is connected after roughly 1 s. Does it make sense? Why does a call to |
Here I assume you meant "and resuming a consumer takes 1 roundtrip".
Here I assume you mean
NOTE: Perhaps there is an issue here (spoiler: I don't see it, keep reading) since we may be calling

void SimulcastConsumer::UserOnTransportConnected()
{
    MS_TRACE();
    this->syncRequired = true;
    this->keyFrameForTsOffsetRequested = false;
    if (IsActive())
        MayChangeLayers();
}

Here we are setting When the

case Channel::ChannelRequest::MethodId::CONSUMER_RESUME:
{
    if (!this->paused)
    {
        request->Accept();
        return;
    }
    this->paused = false;
    MS_DEBUG_DEV("Consumer resumed [consumerId:%s]", this->id.c_str());
    if (IsActive())
        UserOnResumed();
    request->Accept();
    break;
}

Note that

bool IsActive() const override
{
    // clang-format off
    return (
        RTC::Consumer::IsActive() &&
        std::any_of(
            this->producerRtpStreams.begin(),
            this->producerRtpStreams.end(),
            [](const RTC::RtpStream* rtpStream)
            {
                return (rtpStream != nullptr && (rtpStream->GetScore() > 0u || rtpStream->HasDtx()));
            }
        )
    );
    // clang-format on
}

and it also calls parent's

virtual bool IsActive() const
{
    // The parent Consumer just checks whether Consumer and Producer are
    // not paused and the transport connected.
    // clang-format off
    return (
        this->transportConnected &&
        !this->paused &&
        !this->producerPaused &&
        !this->producerClosed
    );
    // clang-format on
}

This is: a So here two things can happen:
In the first case this is obviously fine since this is the perfect time for requesting a keyframe. In the second case, we are not requesting since the transport is connected, and once the transport is connected, mediasoup will call |
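The gating discussed in this comment can be illustrated with a small self-contained model. This is only a sketch: `ModelConsumer`, `ResumeBeforeConnect` and `ResumeAfterConnect` are hypothetical names, not mediasoup classes or functions. It assumes nothing beyond what the quoted code shows, namely that the resume handler runs the resume hook only when `IsActive()` is true, and that `IsActive()` requires a connected transport.

```cpp
#include <cassert>

// Hypothetical stand-in for a consumer; NOT the mediasoup class.
struct ModelConsumer
{
    bool transportConnected{ false };
    bool paused{ true }; // Consumers are created paused, as recommended.
    bool producerPaused{ false };
    bool producerClosed{ false };
    int userOnResumedCalls{ 0 }; // Counts how often the resume hook ran.

    // Mirrors the parent IsActive() check quoted above.
    bool IsActive() const
    {
        return transportConnected && !paused && !producerPaused && !producerClosed;
    }

    // Mirrors the CONSUMER_RESUME handler: a no-op if not paused, and the
    // resume hook runs only when the consumer is active.
    void Resume()
    {
        if (!paused)
            return;
        paused = false;
        if (IsActive())
            ++userOnResumedCalls;
    }
};

// Resume while the transport is still connecting.
inline int ResumeBeforeConnect()
{
    ModelConsumer consumer;
    consumer.Resume();
    return consumer.userOnResumedCalls;
}

// Resume once the transport is already connected.
inline int ResumeAfterConnect()
{
    ModelConsumer consumer;
    consumer.transportConnected = true;
    consumer.Resume();
    return consumer.userOnResumedCalls;
}
```

In this model, resuming before the transport is connected leaves the hook count at zero, so any keyframe handling has to happen later, when the transport connects.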
Thx for the detailed review.
Yes, that's what is happening in this case and why I suggest that an alternative solution (or workaround) could be to avoid the second
The issue we have is only in this case; we can ignore the rest of the scenarios.
Can you clarify where exactly |
To be clear, what I meant is that I still don't see how this can be a problem.
The thing is that both scenarios (when a consumer is resumed and when its transport is connected) trigger exactly the same code:

void SimulcastConsumer::UserOnResumed()
{
    MS_TRACE();
    this->syncRequired = true;
    this->keyFrameForTsOffsetRequested = false;
    if (IsActive())
        MayChangeLayers();
}

void SimulcastConsumer::UserOnTransportConnected()
{
    MS_TRACE();
    this->syncRequired = true;
    this->keyFrameForTsOffsetRequested = false;
    if (IsActive())
        MayChangeLayers();
}

In your case, the consumer is created and resumed before the transport is connected, so

void SimulcastConsumer::MayChangeLayers(bool force)
{
    MS_TRACE();
    int16_t newTargetSpatialLayer;
    int16_t newTargetTemporalLayer;
    if (RecalculateTargetLayers(newTargetSpatialLayer, newTargetTemporalLayer))
    {
        // If bitrate externally managed, don't bother the transport unless
        // the newTargetSpatialLayer has changed (or force is true).
        // This is because, if bitrate is externally managed, the target temporal
        // layer is managed by the available given bitrate so the transport
        // will let us change it when it considers.
        if (this->externallyManagedBitrate)
        {
            if (newTargetSpatialLayer != this->targetSpatialLayer || force)
                this->listener->OnConsumerNeedBitrateChange(this);
        }
        else
        {
            UpdateTargetLayers(newTargetSpatialLayer, newTargetTemporalLayer);
        }
    }
}

Here So if we follow the flow above, Yes, this flow won't trigger any PLI. It never does it here. This doesn't happen here. That happens in |
All the above is correct and I agree with it. The problem is what happens with the next call received to The first call happens when the DTLS connection is established ( That second call to |
I'm reviewing your comment but for now I'm lost here:
Definitely we don't call |
Wait, you are right. In |
So the thing is that we call this method twice due to ICE and DTLS events (in

void Consumer::TransportConnected()
{
    MS_TRACE();
    this->transportConnected = true;
    MS_DEBUG_DEV("Transport connected [consumerId:%s]", this->id.c_str());
    UserOnTransportConnected();
}

And indeed

// If we need to sync and this is not a key frame, ignore the packet.
if (this->syncRequired && !packet->IsKeyFrame())
    return;

Ok, @ggarber please try this. Unless I miss something this should be the proper way to go:

diff --git a/worker/src/RTC/Consumer.cpp b/worker/src/RTC/Consumer.cpp
index 52b3f2257..c12299cd1 100644
--- a/worker/src/RTC/Consumer.cpp
+++ b/worker/src/RTC/Consumer.cpp
@@ -373,6 +373,9 @@ namespace RTC
     {
         MS_TRACE();
+        if (this->transportConnected)
+            return;
+
         this->transportConnected = true;
         MS_DEBUG_DEV("Transport connected [consumerId:%s]", this->id.c_str());
@@ -384,6 +387,9 @@ namespace RTC
     {
         MS_TRACE();
+        if (!this->transportConnected)
+            return;
+
         this->transportConnected = false;
         MS_DEBUG_DEV("Transport disconnected [consumerId:%s]", this->id.c_str());
…nnected() Rationale given here: #787 (comment) We always check flags and return fast if the flag was already set (or unset, it depends), then we set or unset the flag and run code. Here we failed and didn't honor this pattern.
Just in case here a PR with the above diff: #788 |
I tested the previous patch from @ibc with large RTT conditions, it works for me. |
Thanks, I'll merge and release. |
Closing in favour of PR #788. |
…nnected() (#788) Rationale given here: #787 (comment) We always check flags and return fast if the flag was already set (or unset, it depends), then we set or unset the flag and run code. Here we failed and didn't honor this pattern.
Published in 3.9.9. Thanks guys! |
Thx @ibc. Not calling transport connected twice was one of the alternatives proposed, so I'm good with it. But for your consideration: changing the "stream state" (syncRequired = true) based on transport state may be more complicated than needed, and those states could be independent. Thx
Actually we should not set a flag that was already set, meaning that we should not run the code twice. We just failed to check the flag value before doing things, as we do everywhere else.
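The "check the flag and return fast" pattern mentioned in this thread can be sketched as follows. `GuardedConsumer` and its counters are hypothetical illustration names, not mediasoup code. The sketch only shows that with the guard in place, repeated connect or disconnect notifications run the associated hook once per real state change.

```cpp
#include <cassert>

// Hypothetical model of the "check flag, return fast" pattern.
class GuardedConsumer
{
public:
    void TransportConnected()
    {
        // Flag already set: nothing changed, so do not run the hook again.
        if (this->transportConnected)
            return;
        this->transportConnected = true;
        ++this->connectedHookRuns; // Stands in for UserOnTransportConnected().
    }

    void TransportDisconnected()
    {
        // Flag already unset: nothing changed, so return fast.
        if (!this->transportConnected)
            return;
        this->transportConnected = false;
        ++this->disconnectedHookRuns;
    }

    int ConnectedHookRuns() const { return this->connectedHookRuns; }
    int DisconnectedHookRuns() const { return this->disconnectedHookRuns; }

private:
    bool transportConnected{ false };
    int connectedHookRuns{ 0 };
    int disconnectedHookRuns{ 0 };
};

// Sends each notification twice and returns how often the connect hook ran.
inline int ConnectHookRunsAfterDuplicateEvents()
{
    GuardedConsumer consumer;
    consumer.TransportConnected();
    consumer.TransportConnected(); // Duplicate event, ignored by the guard.
    return consumer.ConnectedHookRuns();
}
```

A reconnect after a real disconnect still runs the hook again, which is the intended behaviour: the guard suppresses only notifications that do not change the state.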
There are cases where the consumer's video freezes for a few seconds (2-4 s) after consumption has started and video has already played properly for about 1 s. This happens especially when the user's round-trip time is not low.
After some debugging, it looks like the root cause is that Consumer::UserOnTransportConnected is being called multiple times (I think once when the DTLS connection is established and again when the client sends the 'connect' message), and every time mediasoup resets the video stream sync (syncRequired = true).
The first time that happens it is not a big deal, because during the initial setup mediasoup will switch spatial layers from -1 to 1 and request a new keyframe, so video will be unfrozen soon. The problem is the second time that happens (when the client sends the 'connect' message), because in that case nobody requests a keyframe, so the video stays stuck until the client's no-video timeout is triggered (2-3 s) and it sends a new PLI.
There could be different solutions for this:
This PR implements the last one of those approaches.
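The freeze described above can be reproduced in a small model. This is a sketch under stated assumptions, not the mediasoup implementation: `SyncModel`, `Packet` and `DroppedAfterSecondConnect` are hypothetical names, and the model assumes that the sync flag is cleared once a keyframe has been forwarded. The drop rule itself (`syncRequired` set and packet is not a keyframe) is the one quoted in the conversation.

```cpp
#include <cassert>

// Hypothetical packet: only the keyframe bit matters for this model.
struct Packet
{
    bool isKeyFrame;
};

// Hypothetical model of the sync logic; NOT the mediasoup class.
class SyncModel
{
public:
    explicit SyncModel(bool guardRepeatedConnect) : guard(guardRepeatedConnect)
    {
    }

    void TransportConnected()
    {
        // With the guard, a repeated notification does not reset the sync.
        if (this->guard && this->connected)
            return;
        this->connected = true;
        this->syncRequired = true;
    }

    // Returns true if the packet is forwarded, false if it is dropped.
    bool Send(const Packet& packet)
    {
        // If we need to sync and this is not a keyframe, ignore the packet.
        if (this->syncRequired && !packet.isKeyFrame)
            return false;
        // Assumption: forwarding a keyframe completes the sync.
        this->syncRequired = false;
        return true;
    }

private:
    bool guard;
    bool connected{ false };
    bool syncRequired{ false };
};

// Sequence: connect, keyframe, delta frame, second connect notification,
// then `deltaFrames` delta frames with no keyframe requested by anyone.
// Returns how many of those trailing delta frames were dropped.
inline int DroppedAfterSecondConnect(bool guard, int deltaFrames)
{
    SyncModel model(guard);
    model.TransportConnected();
    model.Send(Packet{ true });
    model.Send(Packet{ false });
    model.TransportConnected();
    int dropped = 0;
    for (int i = 0; i < deltaFrames; ++i)
    {
        if (!model.Send(Packet{ false }))
            ++dropped;
    }
    return dropped;
}
```

Without the guard, every delta frame after the second notification is dropped until a keyframe arrives, which corresponds to the frozen video; with the guard, the stream keeps flowing.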