This repository has been archived by the owner on Feb 17, 2023. It is now read-only.

Media Source Extensions: Codec Switching Explainer

Author: Matthew Wolenetz, Google Inc. - March 15, 2018. Last update September 17, 2018.

tl;dr

We propose adding a changeType method on SourceBuffer that allows the type (bytestream and codec(s)) of media bytes subsequently appended to the SourceBuffer to be changed. We plan to incubate this idea via the WICG, with the goal of eventually working with WebPlatformWG to include the result of WICG incubation in the next version of the Media Source Extensions API (MSE).

Implementation status as of last update

Background

Web authors have consistently requested that MSE afford a mechanism for switching codecs and bytestreams:

In the absence of such mechanisms, web authors are forced to switch programmatically among multiple MediaSource instances to work around the specification gaps. Beyond complexity, a primary concern with those workarounds is that they are bound by the scheduling of JavaScript execution; there can be significant delays at transition points that impair the user experience.

Design consideration was given to alternative ideas that are not in scope of this proposal:

  • Leveraging in-band or application-provided text track cues to control the timing of track transitions among a dynamic set of SourceBuffers. Though this could possibly apply to any HTMLMediaElement (not just those extended by MSE), the complexity of this approach led us to propose a simpler change scoped to just MSE.
  • Likewise, we rejected pursuing a new HTMLMediaElement (or perhaps TrackList) API that would enable web apps to programmatically declare similar kinds of track-change timings.
  • We could continue to require same bytestream, but allow codec to vary with tracks of the same kind of media (e.g. audio or video) across initialization segments. This would be insufficient for supporting cross-bytestream switching.

Note that this proposal does not preclude separate effort to pursue these other ideas.

Proposed Plan

This proposal is focused on enabling cross-codec and cross-bytestream buffering and playback of media in a single MediaSource instance.

While the REC version of the Media Source Extensions API (MSE) supports adaptive playback of media, that adaptation requires that any media appended to a SourceBuffer conform to the MIME type provided when initially creating the SourceBuffer via MediaSource.addSourceBuffer(type).

With the addition of a proposed changeType method on SourceBuffer, with a type parameter similar to that in the existing addSourceBuffer method on MediaSource, a SourceBuffer could buffer and support playback across different bytestream formats and codecs. This new method would retain previously buffered media modulo future MSE coded frame eviction or removal, and leverage the splicing and buffering logic in the existing MSE coded frame processing algorithm.
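As a minimal sketch of how an application might use the proposed method, the helper below signals a type change only when the next segment's type differs from the last one it used. `SourceBufferLike` and `createTypedAppender` are hypothetical names introduced for illustration; only `updating`, `changeType` and `appendBuffer` correspond to SourceBuffer members. It also assumes that the first segment appended after a switch begins with an initialization segment for the new type.

```typescript
// Hypothetical structural stand-in for the SourceBuffer members used here.
// A real SourceBuffer that exposes changeType() is expected to fit this shape.
interface SourceBufferLike {
  readonly updating: boolean;
  changeType(type: string): void;
  appendBuffer(data: Uint8Array): void;
}

// Returns an append function that remembers the last type it signalled
// and calls changeType() only when the type of the next segment differs.
function createTypedAppender(sb: SourceBufferLike, initialType: string) {
  let currentType = initialType;
  return function append(type: string, segment: Uint8Array): void {
    if (sb.updating) {
      // Assumption: the caller waits for 'updateend' between appends.
      throw new Error("SourceBuffer is still updating; wait for 'updateend'");
    }
    if (type !== currentType) {
      sb.changeType(type); // previously buffered media is retained
      currentType = type;
    }
    sb.appendBuffer(segment);
  };
}
```

An application would typically call the returned function from its `updateend` handler, passing each fetched segment together with the MIME type string of the rendition it came from.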

If type is not supported by the user agent for the SourceBuffer and MediaSource, changeType would synchronously throw a NotSupportedError exception. The API would, however, be unaware of potentially unsupported content transitions, including those that occur implicitly via codec changes in subsequent initialization segments when the codecs parameter of the SourceBuffer's MIME type is ambiguous, and those involving encrypted content. See also the Media Capabilities API's stream transitioning work referenced in Resolved Questions, below.
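The synchronous failure mode described above could be handled along these lines. This is a sketch only: `MaybeSwitchableBuffer` and `tryChangeType` are hypothetical names, and treating a missing `changeType` method the same as an unsupported type is an assumption about how an application might want to fall back.

```typescript
// Hypothetical stand-in: changeType is optional because a user agent
// that predates this proposal would not expose it.
interface MaybeSwitchableBuffer {
  changeType?: (type: string) => void;
}

// Returns true if the type switch was accepted, false if the caller should
// fall back (for example, to a rendition in the current type).
function tryChangeType(sb: MaybeSwitchableBuffer, type: string): boolean {
  if (typeof sb.changeType !== "function") {
    return false; // method not available in this user agent
  }
  try {
    sb.changeType(type);
    return true;
  } catch (err) {
    const name =
      typeof err === "object" && err !== null
        ? (err as { name?: unknown }).name
        : undefined;
    if (name === "NotSupportedError") {
      return false; // type rejected synchronously by the user agent
    }
    throw err; // other errors (e.g. invalid state) are left to the caller
  }
}
```

A `true` result only means the call itself was accepted. As noted above, it does not tell the application whether the resulting content transition will actually play.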

Resolved Questions

Should the initialization segment received algorithm continue to require the same number of audio, video and text tracks - and if more than one of a particular type, that the set of track IDs for that type be the same?
Pros:

For a long time, user agents (such as Chrome) chose the route of allowing a maximum of one audio and one video track across all SourceBuffers in a MediaSource instance. Practically, this met the REC MSE requirement; further, it was intended to improve the user experience by ensuring that only the expected media would be fetched and consume resources.

Cons:

Retaining this restriction precludes one of the Implementation Use Cases.

Route taken:

For simplicity, and due to the prevalence of MSE apps that use single-track bytestreams and up to two SourceBuffers (one each for audio and video) to manage adaptation of each independently, the initialization segment received algorithm continues to require the same number of audio, video and text tracks - and if more than one of a particular type, that the set of track IDs for that type remain the same. The algorithm is adjusted to allow codecs to change in the initialization segment when the bytestream format does not change, even without explicitly signalling changeType(). In this latter "implicit codec change" situation, there is new non-normative text guiding both API users and user agent implementors, as some user agents may in the short term continue to disallow implicit codec switching until they relax their codec-strictness for addSourceBuffer() and changeType().
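The track-layout requirement retained above can be illustrated with a small check. This is a simplified sketch, not the specification's algorithm: `InitSegmentTrack`, `InitTrackKind` and `trackLayoutMatches` are hypothetical names, and the track descriptions are assumed to have already been parsed out of each initialization segment by the application.

```typescript
type InitTrackKind = "audio" | "video" | "text";

// Hypothetical description of one track found in an initialization segment.
interface InitSegmentTrack {
  kind: InitTrackKind;
  id: string;
  codec: string;
}

const INIT_TRACK_KINDS: InitTrackKind[] = ["audio", "video", "text"];

function idsOfKind(tracks: InitSegmentTrack[], kind: InitTrackKind): string[] {
  return tracks.filter((t) => t.kind === kind).map((t) => t.id);
}

// True when `next` has the same number of audio, video and text tracks as
// `first` and, for any kind with more than one track, the same set of IDs.
// Codecs are deliberately not compared: they are allowed to change.
function trackLayoutMatches(
  first: InitSegmentTrack[],
  next: InitSegmentTrack[],
): boolean {
  for (const kind of INIT_TRACK_KINDS) {
    const a = idsOfKind(first, kind);
    const b = idsOfKind(next, kind);
    if (a.length !== b.length) return false;
    if (a.length > 1) {
      const setA = new Set(a);
      const setB = new Set(b);
      const sameIds =
        a.every((id) => setB.has(id)) && b.every((id) => setA.has(id));
      if (!sameIds) return false;
    }
  }
  return true;
}
```

With single-track bytestreams, which the paragraph above describes as the prevalent case, the check reduces to comparing track counts per kind, so a codec switch between two single-video-track renditions passes.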

Other than the existing MEDIA_ERR_DECODE and MediaError.message error reporting mechanism, is there a way applications could (ideally proactively, before fetching and buffering) determine whether or not the user agent has the capability of supporting playback across various levels of mixed encrypted and unencrypted content along with bytestream and codec changes?
Initial Thoughts:

This proposal is focused on the MSE API alone. Media Capabilities and Encrypted Media Extensions may need work to support such proactive queries.

Route taken:

Proactive codec and bytestream switching capability detection is being worked on as part of the Media Capabilities API: Transitioning between stream configurations proposal.

Should the proposed changeType method implicitly perform the reset parser state algorithm, or should it instead require the application to ensure the parser is reset (via abort(), if necessary)?
Initial Thoughts:

As there is no way currently for an application to be certain of the SourceBuffer's current append state, changeType should probably run the reset parser state algorithm.

Route taken:

changeType includes running the reset parser state algorithm once preliminary state and support checks have passed.
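The ordering described above (checks first, parser reset only afterwards) can be pictured with a toy model. This is not a real SourceBuffer and not specification text: `ModelSourceBuffer`, its `isSupported` callback and the `namedError` helper are hypothetical, and the specific checks shown are assumptions standing in for the "preliminary state and support checks".

```typescript
type AppendState =
  | "WAITING_FOR_SEGMENT"
  | "PARSING_INIT_SEGMENT"
  | "PARSING_MEDIA_SEGMENT";

// Hypothetical helper: builds an Error whose name mimics a DOMException name.
function namedError(name: string, message: string): Error {
  const e = new Error(message);
  e.name = name;
  return e;
}

// Toy model of the parts of a SourceBuffer that matter for this question.
class ModelSourceBuffer {
  type: string;
  updating = false;
  appendState: AppendState = "WAITING_FOR_SEGMENT";
  isSupported: (type: string) => boolean;

  constructor(type: string, isSupported: (type: string) => boolean) {
    this.type = type;
    this.isSupported = isSupported;
  }

  changeType(type: string): void {
    // Preliminary state and support checks (assumed set, for illustration).
    if (type === "") throw namedError("TypeError", "empty type");
    if (this.updating) throw namedError("InvalidStateError", "updating");
    if (!this.isSupported(type)) {
      throw namedError("NotSupportedError", "unsupported type: " + type);
    }
    // Only now is the parser reset, so a rejected call leaves any
    // partially parsed segment state untouched.
    this.appendState = "WAITING_FOR_SEGMENT";
    this.type = type;
  }
}
```

In this model the application never needs to call abort() before a successful changeType, which is the behaviour the route taken aims for.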

To what level should we specify "seamless" playback across bytestream, codec (and perhaps encryption) changes?
Initial Thoughts and Route Taken:

This is likely a quality-of-implementation output, rather than a specified input. Decoder reconfiguration, for instance, may not be sufficient in all implementations to support precision across a transition. This is analogous to the treatment of playback quality across adaptations allowed by REC MSE today: implicitly quality-of-implementation.