Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Player freezes video frame and then jumps back during discontinuity after ~1 sec #1247

Closed
dzianis-dashkevich opened this issue Feb 3, 2022 · 12 comments

Comments

@dzianis-dashkevich
Copy link
Contributor

Description

Hi!

I am working with STB and we are using Video.Js for HLS live playback. I noticed interesting behavior during discontinuity:

The player freezes at the last video frame of the last segment of the previous media sequence. Audio frames from the next media sequence overlap this frozen frame. Video frames from the next media sequence do not start.

After ~1 second: the player seeks back, buffering occurs, and the next media sequence plays fine: Audio frames start over in sync with video frames from the next media sequence.

Additional Information

Please include any additional information necessary here. Including the following:

I noticed that this issue occurs only on STB's browser(which is Chromium v72) and does not occur on my laptop's browser (which is the latest Chrome).

So I started my investigation:

I collected Video.Js debug logs from STB and my laptop and compared them:

The only difference I found is that on STB it triggers vhs-unknown-waiting event at some point. I checked PlaybackWatcher source code and found out that it recursively checks the current time each 250ms. In case if current time stays the same - consecutiveUpdates variable is increased. When consecutiveUpdates reaches its limit (for some reason it is 5 from the source code). The Player tries to recover and seeks to the last reported current time.

Well, I decided to check behavior with an increased consecutiveUpdates limit to bypass this seek-back:

Result: after 2 seconds of freezing, the player continues playback of the next program.

At this point, I decided to check other players and their implementations.
Here is a small comparison table based on my brief source code dive:

Player Transmuxer Keeps Original DTS/PTS Does TimestampOffset in use for alignment?
Video.Js Uses mux.js library with the following options: keepOriginalTimestamps: true and remux: true. The Transmuxer produces both audio and video mp4. Does not override source timestamps. Yes, Video.Js uses the source buffer's TimestampOffset in order to align the source timeline with its own time.
Shaka Uses mux.js library with the following options: keepOriginalTimestamps: true and remux: false. The Transmuxer produces only one mp4 with combined audio and video. Does not override source timestamps. No, Shaka uses the same timeline values as in the source. Thus no alignment is needed.
Hls.Js Uses its own transmuxer implementation. The Transmuxer produces both audio and video mp4. Overrides source timestamps. No, Hls.Js overrides source timestamps with its own timeline values (based on buffered.end).

Shaka and Hls.Js work fine, so it looks like something is wrong with a timestamp offset approach.

To prove my theory, I implemented my custom player based on mux.js library with Hls.js approach - override timestamps during transmuxing with use of baseMediaDecodeTime option and buffered.end.

Result: it works fine, without any freezes.

But why isn't it working with timestamp offset?
To answer this question, I decided to take a look into frames:

ffprobe -v quiet -print_format json -show_format -show_streams -show_frames -show_packets -count_frames -count_packets "$session_mp4_video_segment_input" > "$ffprobe_video_mp4_output"

ffprobe -v quiet -print_format json -show_format -show_streams -show_frames -show_packets -count_frames -count_packets "$session_mp4_audio_segment_input" > "$ffprobe_audio_mp4_output"

ffprobe -v quiet -print_format json -show_format -show_streams -show_frames -show_packets -count_frames -count_packets "$session_ts_segment_input" > "$ffprobe_ts_output"

Based on the reports from ffprobe
I noticed, that buffered ranges reported from Chromium on STB are based on DTS,
but buffered ranges reported from Chrome on my laptop are based on PTS.

While it was possible to replace timingInfo.start.pts to timingInfo.start.dts in transmuxer-worker.js, i decided to limit changes only to timestampOffset calculation:

I replaced segmentInfo.timingInfo.start(which is frame's PTS value from the transmuxer)

to segmentInfo.segment.videoTimingInfo.transmuxedDecodeStart (which is frame's DTS value from the transmuxer)

in segmentLoader timestamp offset calculation. Finally it works.

But why is it working with DTS-based timestamp Offset and not working with PTS-based timestamp Offset?

To answer this question I decided to draw several segments from the stream:
image

And how different types of frames are encoded and reference each other's macroblocks:
image

I did not dive really deep into chromium source code, and it was not possible to collect chromium debug logs from STB.

But I noticed an interesting flag in the old chromium revision:

https://chromium.googlesource.com/chromium/src/+/refs/tags/72.0.3626.121/media/base/media_switches.cc#293

Based on the information above, I made the following assumption:

It looks like after updating timestamp offset and appending the next frames browser does not hit the very first I-Frame, and because of this - it is not possible to decode further frames in the closed GOP. That is why the browser drops all frames until the next random access point - in our case next I-Frame. Because the stream is 30 fps and because it is based on Apple's recommendation:
1.13. Keyframes (IDRs) SHOULD be present every two seconds.
I saw this 2 seconds of freeze. (the root cause of the vhs-unknown-waiting from the beginning.)

In order to prove my assumption (even without debug logs from the chromium), I decided to re-encode the original stream with frame info overlay on it:

ffmpeg -i $input -vf "drawtext=fontfile=Arial.ttf:text='Frame\: %{frame_num} Type\: %{pict_type} PTS\: %{pts}':start_number=0:x=(w-text_w)/2:y=(h-text_h)/2:fontcolor=white:fontsize=21:box=1:[email protected]:boxborderw=6" -c:v libx264 -c:a copy -copyts -muxdelay 0.01568455555555555555 -map_metadata 0 -force_key_frames:v source -force_key_frames:a source -r 30000/1001 $output

I tested this stream both using PTS-based offset and DTS-based offset:

with PTS-based offset -> I see that after 2 seconds of freeze it starts the next program with frame #60 which is the first I-Frame of the 2nd closed GOP.
image

with DTS-based offset -> I see no freezes and it starts the next program with frame #0 which is the first I-Frame of the 1st closed GOP.
image

Went even further and re-encoded the stream with more frequent GOP size, let's say 12 frames:

we can achieve this by adding libx264 options: -x264opts "keyint=12:min-keyint=12:no-scenecut"

with PTS-based offset -> I see that the next program starts with frame #12 which is the first I-Frame of the 2nd closed GOP.

Is there any option to add a config flag to choose between PTS-based and DTS-based calculations?

Sources

I can provide you with two links - original source and with debug filtering by e-mail. Streams are not encrypted.

Steps to reproduce

  1. Run any stream with a discontinuity in any DTS-based buffered browsers (eg: chromium v72).
  2. Wait for a discontinuity.

Results

Video frame freeze during discontinuity. Audio from the next segment is overlapping this frame. Then the player jumps back.

Expected

Stream plays fine without freezing, audio overlapping, and jumps.

videojs-http-streaming version

what version of videojs-http-streaming does this occur with?
I was testing videojs-http-streaming 2.9.2 and 2.13.1

videojs version

what version of videojs does this occur with?
I was testing with videojs v7.14.3 (VHS v2.9.2) and v 7.18.0 (VHS v2.13.1)

Browsers

what browsers are affected? please include browser and version for each

  • Chromium 72.0.3626.121

Platforms

what platforms are affected? please include operating system and version or device and version for each

  • Linux

Other Plugins

are any other videojs plugins being used on the page? If so, please list them with version below.

  • No other plugins.

Other JavaScript

are you using any other javascript libraries or frameworks on the page? if so please list them below.

  • Issue occurs even without any other js libraries
@welcome
Copy link

welcome bot commented Feb 3, 2022

👋 Thanks for opening your first issue here! 👋

If you're reporting a 🐞 bug, please make sure you include steps to reproduce it. We get a lot of issues on this repo, so please be patient and we will get back to you as soon as we can.
To help make it easier for us to investigate your issue, please follow the contributing guidelines.

@singerxt
Copy link

singerxt commented Feb 3, 2022

I'm experiencing same issue :-)

@gesinger
Copy link
Contributor

gesinger commented Feb 4, 2022

Hey @dzianis-dashkevich , I haven't had time yet to look into this, but I wanted to thank you for doing so much investigation into the issue, and providing so much detail in this ticket. I will try to look further into this early this coming week. Thank you again!

@gesinger
Copy link
Contributor

gesinger commented Feb 7, 2022

First off @dzianis-dashkevich , thank you again for doing so much research into the issue. It's very much appreciated, and makes me very happy to continue the investigation. It also made finding answers much easier.

I believe all of your findings are correct, and the fix just needs a slight update.

Before that, I wanted to provide a bit of history to give context. Originally, I thought the issue was due to Chromium switching to reporting PTS values instead of DTS, but that change happened in version 69, https://developer.chrome.com/blog/media-updates-in-chrome-69/#pts, (thanks @gkatsev for the link), and you're working with version 72. In addition, that change is for values that are shown via MSE, and don't have to do with content removed during decode.

This specific issue is the removal of negative DTS frames. As mentioned, when the DTS is before the PTS, and the timestamp offset is set to the PTS, then some frames may have negative DTS values. This follows the media source specification:

4. If timestampOffset is not 0, then run the following steps:

  1. Add timestampOffset to the presentation timestamp.
  2. Add timestampOffset to the decode timestamp.

3.5.8 Coded Frames Processing

In theory, any MSE implementation should remove frames with a negative timestamp, and in Chrome 72, it does. But, as this use-case is common, the case of negative DTS values is sometimes handled by browsers, and is addressed in later Chrome versions. This is why the issue doesn't happen with your laptop (and why we don't see reports of this issue very often): https://bugs.chromium.org/p/chromium/issues/detail?id=398141

In that bug, negative DTS was allowed only after MseBufferByPts was enabled, as that flag prevented users from seeing strange MSE property values (e.g., a buffered range with a negative start). This flag was enabled in "99% of M73 and M74 Chrome Stable" and by default in M75. 72 is just below those.

All that said, the solution is, as you mentioned, to set the timestamp offset to use the DTS start instead of the PTS start, which should avoid the problem of negative timestamp offsets altogether. However, since transmuxedDecodeStart is a property that only TS segments have (because they're transmuxed), and we need to support MP4 as well, we also need to get the decode start from MP4. Luckily, the MP4's start time should be derived from the mp4 probe for start time, which uses the TFDT box and provides the Base Media Decode Time. So all we should have to do is conditionally use what's available. Something like:

const getStartingDts = ({ videoTimingInfo, audioTimingInfo, timingInfo }) => {
  if (videoTimingInfo && typeof videoTimingInfo.transmuxedDecodeStart === 'number') {
    return videoTimingInfo.transmuxedDecodeStart;
  }

  // handle audio only
  if (audioTimingInfo && typeof audioTimingInfo.transmuxedDecodeStart === 'number') {
    return audioTimingInfo.transmuxedDecodeStart;
  }

  // handle content not transmuxed (e.g., MP4)
  return timingInfo.start;
};

...

segmentInfo.timestampOffset -= getStartingDts({
  videoTimingInfo: segmentInfo.segment.videoTimingInfo,
  audioTimingInfo: segmentInfo.segment.audioTimingInfo,
  timingInfo: segmentInfo.timingInfo
});

In theory this would be good to do for all browsers, in case we encounter any other browsers or browser versions without support for negative DTS. But, as you mentioned, it may be better to at least start behind a flag, because it's possible some browsers may generate a gap if there's a PTS gap.

If you'd like to submit a PR, we'd be happy to review it, but if you'd prefer, we can try to get a PR in for this ourselves when we can.

Thank you again!

@dzianis-dashkevich
Copy link
Contributor Author

Hi @gesinger
Thank you for your feedback!

but that change happened in version 69, https://developer.chrome.com/blog/media-updates-in-chrome-69/#pts, (thanks @gkatsev for the link), and you're working with version 72.

Yes, I saw this article, but It is about chrome, rather than pure chromium.
I was always focused on Issue 718641: Tracking bug for MSE PTS/DTS compliance work
and MSE: Remove legacy buffering-by-DTS Part 2: Buffering logic PR
which is dated May 10, 2019
which is approximately M76 according to the chromium schedule

This follows the media source specification:

Yes, here is the actual implementation of the Coded Frame Processing algorithm in chromium:
https://chromium.googlesource.com/chromium/src/media/+/refs/heads/main/filters/frame_processor.cc#752
I was trying to find something interesting there with no luck.

we need to support MP4 as well, we also need to get the decode start from MP4

Yes, I understand, my initial assumption was to provide a configuration flag to the probeTs which will provide pts/dts values and update segmentInfo.timingInfo here and here. And it seems like the only change needed.
Something like:

 const getStartTime = (timeInfo, isPtsBased) => isPtsBased ? timeInfo.ptsTime : timeInfo.dtsTime;

  probeTs({data, baseStartTime, isPtsBased }) {
    const tsStartTime = (typeof baseStartTime === 'number' && !isNaN(baseStartTime)) ?
      (baseStartTime * ONE_SECOND_IN_TS) :
      void 0;
    const timeInfo = tsInspector.inspect(data, tsStartTime);
    let result = null;

    if (timeInfo) {
      result = {
        // each type's time info comes back as an array of 2 times, start and end
        hasVideo: timeInfo.video && timeInfo.video.length === 2 || false,
        hasAudio: timeInfo.audio && timeInfo.audio.length === 2 || false
      };

      if (result.hasVideo) {
        result.videoStart = getStartTime(timeInfo.video[0]);
      }
      if (result.hasAudio) {
        result.audioStart = getStartTime(timeInfo.audio[0]);
      }
    }

    this.self.postMessage({
      action: 'probeTs',
      result,
      data
    }, [data.buffer]);
  }

As I mentioned before, I was not 100% sure that this will be safe enough. I tested it manually and it was working fine. But I am not aware of the full codebase - that is why I decided to limit my changes only to the scope of the updateSourceBufferTimestampOffset_ method.

If you'd like to submit a PR, we'd be happy to review it, but if you'd prefer, we can try to get a PR in for this ourselves when we can.

Sure, I would be glad to submit PR, we should only decide which approach is better.

@gesinger
Copy link
Contributor

gesinger commented Feb 9, 2022

Thank you @dzianis-dashkevich for all of the additional details and the implementation thought. It's a good question regarding where to put the changes. I think your approach is interesting, as, in theory, we should be providing all relevant timing info from media-segment-request where the probe/transmux is done for handling both MP4 and TS. So MP4 should get its decode time too, and then when the source buffer is updated it can use the same property for both MP4 and TS, But I think the changes required to do it properly may be more involved, and, as you said, might be for another time.

Let's keep it contained for now. Since this change is specifically about what offset is set on the source buffer, I think your minimal change approach would be good, making the change in updateSourceBufferTimestampOffset_. Lets put a conditional there based on a flag that can be passed down through VHS' options. Maybe useDtsForOffset (or if you think of another name, let me know) that defaults to false. We can add a TODO for the future to investigate all browsers and see if they can all handle (i.e., without a gap) always using DTS for offsets, at which point it would be better for us to make it the default.

Thanks again for all of the help!

@misteroneill
Copy link
Member

This is a fantastic example of a proper issue report and the fact that you opened a solid PR for it is very much appreciated. Thank you!

@dzianis-dashkevich
Copy link
Contributor Author

dzianis-dashkevich commented Mar 1, 2022

Hi everyone,

Do you have any updates?

@misteroneill
Copy link
Member

Hi @dzianis-dashkevich, sorry for the delay. We're planning to release your PR this week.

@visbits
Copy link

visbits commented Mar 17, 2022

I just wanted to comment that we've also experienced this and the amount of detail and educational information in this issue is incredible.

@dzianis-dashkevich if you ever need work, holler!

@KVPasupuleti
Copy link
Contributor

Facing the same problem, and thanks for all the info @dzianis-dashkevich. The details are incredible in this issue.

@misteroneill Is the PR merged and released?

@misteroneill
Copy link
Member

Yes, it's merged and released in 2.14.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants