v4.0.0 Shaka Player cannot display WebVTT captions for HLS (either fMP4 or MPEG-TS), DASH is fine #4191

liuyang5832 · 2022-05-03T18:49:47Z

Have you read the FAQ and checked for duplicate open issues?
yes

What link can we use to reproduce this?
https://shaka-player-demo.appspot.com/demo/#audiolang=en-US;textlang=en-US;uilang=en-US;asset=https://storage.googleapis.com/livestream-demo-output/miltonliu-webvtt-shaka-4-0-0-test/manifest.m3u8;panel=CUSTOM%20CONTENT;build=uncompiled

What version of Shaka Player are you using?
v4.0.0-uncompiled

What browser and OS are you using?
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36

What did you do?
I simply played back a generated HLS manifest with Shaka Player v4.0.0, and the captions did not display. This used to work with v3.x.x; I tried v3.3.2, and it still works.

Link to the v3.3.2 demo, which displays the same content correctly:
https://v3-3-2-dot-shaka-player-demo.appspot.com/demo/#audiolang=en-US;textlang=en-US;uilang=en-US;asset=https://storage.googleapis.com/livestream-demo-output/miltonliu-webvtt-shaka-4-0-0-test/manifest_ts.m3u8;panel=CUSTOM%20CONTENT;build=uncompiled

What did you expect to happen?
WebVTT captions should be displayed for HLS when selected.

What actually happened?
No captions are displayed.

joeyparrish · 2022-05-11T19:02:41Z

This may be related to X-TIMESTAMP-MAP and the use of sequence mode for the audio/video content. In this WebVTT content, I see:

```
WEBVTT
X-TIMESTAMP-MAP=LOCAL:01:00:00.000,MPEGTS:324000000

02:08:06.923 --> 02:08:07.157
- Not at all?
```

The VTT timestamp LOCAL 01:00:00.000 maps to MPEGTS 324000000 in the main content, and 324000000 / 90000 = 3600 seconds = 1 hour. So there is no relative offset between the two timelines.
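The arithmetic above can be sketched in TypeScript. This is a minimal illustration, not Shaka Player's WebVTT parser: `parseVttTime` and `timestampMapOffset` are hypothetical helper names, and the sketch assumes the usual 90 kHz MPEG-TS clock and ignores 33-bit MPEGTS rollover.

```typescript
// MPEG-TS timestamps tick at 90 kHz.
const MPEGTS_CLOCK_HZ = 90000;

// Parses a WebVTT timestamp ("hh:mm:ss.ttt" or "mm:ss.ttt") into seconds.
function parseVttTime(text: string): number {
  const m = /^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(text.trim());
  if (!m) {
    throw new Error(`Invalid WebVTT timestamp: ${text}`);
  }
  const hours = m[1] !== undefined ? Number(m[1]) : 0;
  return hours * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Returns (MPEGTS time - LOCAL time) in seconds: the amount to add to a cue's
// local time to express it in media time. Assumption: no 33-bit rollover.
function timestampMapOffset(header: string): number {
  const local = /LOCAL:([0-9:.]+)/.exec(header);
  const mpegts = /MPEGTS:(\d+)/.exec(header);
  if (!local || !mpegts) {
    throw new Error(`Invalid X-TIMESTAMP-MAP: ${header}`);
  }
  return Number(mpegts[1]) / MPEGTS_CLOCK_HZ - parseVttTime(local[1]);
}

// Values from this stream.
const offset = timestampMapOffset('X-TIMESTAMP-MAP=LOCAL:01:00:00.000,MPEGTS:324000000'); // 0
const firstCueMediaTime = parseVttTime('02:08:06.923') + offset; // ≈ 7686.923 s
```

With a zero offset, the first cue sits at about 7686.923 seconds of media time, which is why it lines up with the first audio sample at 7686.952 only if the media keeps its internal timestamps.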

However, the media timestamps are ignored in sequence mode. The first audio segment, for example, has an internal timestamp of 7686.952, or 2:08:06.952. This would align with the first subtitle, except that due to sequence mode, the first audio segment appears in the presentation timeline at ~0 instead.

Since we no longer extract timestamps from the media, and X-TIMESTAMP-MAP relies on media timestamps, VTT cue timing is broken.

joeyparrish · 2022-05-11T19:14:49Z

If we could perfectly emulate sequence mode for text, then the first text segment would appear at time 0, without regard for the timestamps in it. However, we don't know when a text segment "starts" from its contents. The segment could cover a 10-second period of time but only have a cue appear at time 5, or it could be completely empty. So the distance between the conceptual start of a text segment and the start of its first cue cannot be known from the text contents alone. (This is unlike audio and video segments, which have no periods of time without samples.) Trying to offset the text timestamps back to 0 to align with audio and video won't work without additional information.

We could go back to extracting timestamps from media for HLS, but avoid the latency hit we took for this in v3. Instead, we could wait until the first segment is fetched anyway. We could still use sequence mode, but extract the timestamp of the very first segment we fetch. The difference between that timestamp and the startTime of that segment's SegmentReference could be used to align text segments.
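The proposed alignment reduces to two subtractions. A minimal sketch, assuming hypothetical names: `SegmentRef` stands in for the `startTime` field of a SegmentReference, and the two functions are illustrative, not Shaka Player APIs. The example also assumes the first segment's SegmentReference starts at exactly 0 (the comment above says "~0").

```typescript
// Hypothetical stand-in for the one SegmentReference field used here.
interface SegmentRef {
  startTime: number; // seconds on the presentation timeline
}

// How far the media's internal timestamps sit from the presentation timeline,
// measured once on the first segment fetched.
function mediaToPresentationOffset(firstMediaTimestamp: number, firstRef: SegmentRef): number {
  return firstMediaTimestamp - firstRef.startTime;
}

// WebVTT local time -> media time (via X-TIMESTAMP-MAP) -> presentation time.
function cueToPresentationTime(cueLocalTime: number, mapOffset: number, mediaOffset: number): number {
  return cueLocalTime + mapOffset - mediaOffset;
}

// Numbers from this stream: first audio sample at 2:08:06.952, assumed placed at 0.
const mediaOffset = mediaToPresentationOffset(7686.952, { startTime: 0 });
const cueStart = cueToPresentationTime(7686.923, 0, mediaOffset); // ≈ -0.029
const cueEnd = cueToPresentationTime(7687.157, 0, mediaOffset);   // ≈ 0.205
```

With these numbers the first cue (`02:08:06.923 --> 02:08:07.157`) lands at roughly -0.029 s to 0.205 s, i.e. at the very start of playback alongside the first audio segment, instead of more than two hours in.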

The biggest problems with this are the complexity of format parsing and timestamp extraction, and support for containerless or packed audio streams, which don't have internal timestamps at all. (Though we could argue that X-TIMESTAMP-MAP only works with video or audio in an MP4 or TS container, and say anyone with a weird WebVTT+audio-only HLS stream just needs to align their subtitles to 0.)

It would be nice if we could get away with forcing the platform to extract timestamps for us. I don't know if this would work, but if we could dynamically set sequence mode on SourceBuffers, then we could always do something like this for the very first segment, without complicated parsers and without high startup latency:

If first segment:
1. Set segment mode
2. Append the first segment
3. Check buffered to see what its timestamp was
4. Clear the buffer

Then:
- Set sequence mode
- Set timestamp offset
- Append the segment
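The mode-switching steps above might look like this against the Media Source Extensions `SourceBuffer` API, whose `mode` property takes the `AppendMode` values `'segments'` and `'sequence'`. This is a sketch under stated assumptions, not the code merged in #4217: `SourceBufferLike`, `waitForUpdate`, `appendAndWait`, and `appendFirstSegment` are hypothetical names, and the sketch assumes the buffer starts empty with `timestampOffset` 0.

```typescript
// Minimal subset of the MSE SourceBuffer API used here (hypothetical interface
// name); a browser SourceBuffer should satisfy it structurally.
interface SourceBufferLike {
  mode: 'segments' | 'sequence';
  timestampOffset: number;
  readonly buffered: { readonly length: number; start(index: number): number };
  appendBuffer(data: ArrayBuffer): void;
  remove(start: number, end: number): void;
  addEventListener(type: 'updateend' | 'error', listener: () => void,
                   options?: { once?: boolean }): void;
}

// Resolves on 'updateend', rejects on 'error' (which fires before 'updateend').
function waitForUpdate(sb: SourceBufferLike): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    sb.addEventListener('updateend', () => resolve(), { once: true });
    sb.addEventListener('error', () => reject(new Error('SourceBuffer error')), { once: true });
  });
}

// Starts an append and waits for it to finish.
async function appendAndWait(sb: SourceBufferLike, data: ArrayBuffer): Promise<void> {
  const done = waitForUpdate(sb);
  sb.appendBuffer(data);
  await done;
}

// Appends the first segment in 'segments' mode to learn its internal timestamp,
// clears it, then re-appends it in 'sequence' mode at refStartTime.
// Returns the start of the buffered range seen in 'segments' mode.
async function appendFirstSegment(
    sb: SourceBufferLike, data: ArrayBuffer, refStartTime: number): Promise<number> {
  // 1. Segments mode honors the timestamps inside the media.
  sb.mode = 'segments';
  sb.timestampOffset = 0;
  // 2. Append the first segment.
  await appendAndWait(sb, data);
  // 3. The buffered range now starts at the segment's internal timestamp.
  if (sb.buffered.length === 0) {
    throw new Error('Nothing was buffered');
  }
  const mediaTimestamp = sb.buffered.start(0);
  // 4. Clear the buffer.
  const cleared = waitForUpdate(sb);
  sb.remove(0, Infinity);
  await cleared;
  // Then: sequence mode, place the segment at refStartTime, and append again.
  sb.mode = 'sequence';
  sb.timestampOffset = refStartTime;
  await appendAndWait(sb, data);
  return mediaTimestamp;
}
```

Waiting for `updateend` before each mode or offset change is deliberate: per the MSE specification, setting `mode` or `timestampOffset` while the SourceBuffer is updating throws an `InvalidStateError`.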

joeyparrish · 2022-05-11T19:57:57Z

Looks like the trick to change modes works on desktop Chrome. Now to test it on all of our other supported platforms in the lab.

joeyparrish · 2022-05-11T20:50:50Z

Works on all other platforms, except Tizen 2 & 3, which don't support sequence mode at all, and are already excluded from our new HLS parser.

There are some tests which need updating, but the fix seems good.

Since the transition to sequence mode for HLS in v4.0.0, VTT cue timings were broken. This is mainly because VTT cue timing in HLS is meant to be based on an offset from the media timestamps, and we generally don't know those now that we use sequence mode.

To fix it, this change uses MediaSource segment mode for the very first video segment as a way to extract the timestamp, then clears the buffer, switches to sequence mode, and appends it again. This lets us get the timing data we need, while avoiding major drawbacks of the previous HLS implementation:

- We don't need to fetch segments upfront (which is high latency)
- We don't need to fetch segments twice (once for timestamps, and once again to buffer)
- We don't need to maintain parsers (which were complex and limited the formats we could support)

Closes shaka-project#4191

avelad added the type: bug, component: HLS, priority: P1, and component: WebVTT labels May 4, 2022

avelad added this to the v4.1 milestone May 4, 2022

joeyparrish self-assigned this May 11, 2022

joeyparrish mentioned this issue in fix: Fix VTT cue timing in HLS #4217 (Merged) May 11, 2022

joeyparrish closed this as completed in #4217 May 11, 2022

This was referenced May 11, 2022:

- chore(main): release 4.1.0 #4187 (Closed)
- chore(main): release 4.1.0 #4235 (Merged)

github-actions bot mentioned this issue in chore(v4.0.x): release 4.0.1 #4238 (Merged) May 17, 2022

github-actions bot added the status: archived label Jul 10, 2022

github-actions bot locked as resolved and limited conversation to collaborators Jul 10, 2022
