ProxyNode failure #195
The encoder for our live stream is currently in my home. Just found this in my personal email from the 8th:
So that explains the outage. It should be fairly simple to replicate by pulling a cable once we have a potential solution. Ideally, the encoder should be able to recover from an intermittent network outage. After making the initial report, I neglected to restart the service. (Oops.) When I checked it just now, it was still stuck, so it definitely does not recover on its own given time alone. The service has been restarted for now.
I'm running an experiment now where I deliberately block uploads with a firewall rule:

```sh
sudo iptables -A OUTPUT -p tcp --destination-port 443 -j REJECT --reject-with tcp-reset
```

If this adequately reproduces the outage, I can then test a recovery mechanism.
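For reference, lifting the block is the `-D` counterpart with the same rule spec (a minimal sketch of the unblock step):

```sh
# Delete the first rule in the OUTPUT chain matching this exact spec,
# restoring outbound HTTPS so uploads can resume.
sudo iptables -D OUTPUT -p tcp --destination-port 443 -j REJECT --reject-with tcp-reset
```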
After the first error in the log, I removed the firewall rule. According to tcpdump, Packager stops trying to write to the local ProxyNode. The ffmpeg throughput rate continues to drop in the logs. So I think this is a faithful enough repro.
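For a sense of what that capture looks like, here is a minimal tcpdump sketch; the ProxyNode's loopback port (9000) is a hypothetical placeholder, not a documented default:

```sh
# Watch loopback traffic to the local ProxyNode while the block is in place.
# Port 9000 is a hypothetical placeholder for the ProxyNode's listen port.
sudo tcpdump -i lo -nn 'tcp port 9000'
```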
With a prospective fix in place, I blocked connections after 7 minutes of streaming. 12 minutes later, with ffmpeg throughput having fallen from 0.996x to 0.537x, I removed the block from iptables. In that time, because of the delays from retries and timeouts, only two 4-second segments failed to write. (The number of missing segments was found by inspecting cloud storage contents.) After restoring connections, the average ffmpeg throughput has been climbing, and new segments are being written to cloud storage faster than real time (about 1.133x). The upload rate was found by counting segments in cloud storage over time. So I think the fix is working!
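As an illustration of the counting step, a minimal sketch assuming a Google Cloud Storage bucket, gsutil, and a hypothetical output path and segment extension:

```sh
# Log a timestamped segment count once a minute to estimate the upload rate.
# gs://my-bucket/live/ and the .m4s extension are hypothetical placeholders.
while true; do
  echo "$(date +%T) $(gsutil ls 'gs://my-bucket/live/*.m4s' | wc -l)"
  sleep 60
done
```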
When uploading a live stream, if a single segment upload fails, we shouldn't give up. This adds an option to ignore HTTP output failures so that a single failed upload does not result in a hung stream. See shaka-project/shaka-streamer#195 for details.
Instruct Packager to ignore HTTP output failures, so that the pipeline as a whole doesn't fail during a live stream. Once the system recovers, some segments will be missing, but the stream will survive. This requires a new release of Shaka Packager (v3.4.0). Closes shaka-project#195
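To illustrate the intended semantics only (this is not Packager's actual implementation; the endpoint URL, retry policy, and segment name are hypothetical), here is a sketch of an upload step that logs and continues on failure rather than hanging the pipeline:

```sh
# Sketch of "ignore HTTP output failures": attempt the PUT with bounded
# retries, then log and move on so one bad segment can't stall the stream.
upload_segment() {
  local seg="$1"
  if ! curl --fail --silent --show-error --max-time 30 --retry 2 \
       --upload-file "$seg" "https://storage.example.com/live/$(basename "$seg")"; then
    echo "WARNING: upload of $seg failed; skipping" >&2
  fi
}

upload_segment "segment_0001.m4s"
```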
Our Shaka Player History stream went down last night. The first errors in the log look like this:
This eventually culminates in:
There have been no more errors, but the logged throughput has fallen from 0.979x at 10:15:
to 0.658x at 18:08:
The system does not appear to be recovering gracefully from the initial errors.