Author: Matthew Wolenetz, Google Inc. - March 15, 2018. Last update September 17, 2018.
We propose adding a changeType method on SourceBuffer that allows the type (bytestream and codec(s)) of media bytes subsequently appended to the SourceBuffer to be changed. We plan to incubate this idea via the WICG, with the goal of eventually working with the WebPlatformWG to get the result of the WICG incubation into the next version of the Media Source Extensions API (MSE).
- Chrome M70 shipped SourceBuffer.changeType().
- Firefox 63 shipped SourceBuffer.changeType().
- Safari Technology Preview Release 64 added experimental feature support for SourceBuffer.changeType().
- Current experimental web-platform-test results are available in the web-platform-tests results dashboard.
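Given this partial deployment, an application should feature-detect changeType() before relying on it. A minimal sketch (supportsChangeType is an illustrative helper, not part of the API):

```javascript
// Feature-detect SourceBuffer.changeType() before relying on it.
// In non-browser environments or older user agents this yields false.
function supportsChangeType() {
  return typeof SourceBuffer !== 'undefined' &&
      typeof SourceBuffer.prototype.changeType === 'function';
}
```

Applications that detect no support can fall back to the multi-MediaSource workarounds described below.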
Web authors have consistently requested that MSE afford a mechanism for switching codecs and bytestreams:
In the absence of such mechanisms, web authors are forced to switch among multiple MediaSource instances programmatically to work around these specification gaps. Beyond their complexity, a primary concern with those workarounds is that they are limited by the scheduling of JavaScript execution; there can be significant delays at transition points that impair the user experience.
Design consideration was given to alternative ideas that are not in scope of this proposal:
- Leveraging in-band or application-provided text track cues to control the timing of track transitions among a dynamic set of SourceBuffers. Though this could possibly apply to any HTMLMediaElement (not just those extended by MSE), the complexity of this approach led us to propose a simpler change scoped to just MSE.
- Likewise, we rejected pursuing a new HTMLMediaElement (or perhaps TrackList) API that would enable web apps to programmatically declare similar kinds of track-change timings.
- We could continue to require the same bytestream, but allow the codec to vary across initialization segments for tracks of the same kind of media (e.g. audio or video). This would be insufficient for supporting cross-bytestream switching.
Note that this proposal does not preclude separate effort to pursue these other ideas.
This proposal is focused on enabling cross-codec and cross-bytestream buffering and playback of media in a single MediaSource instance.
While the REC version of the Media Source Extensions API (MSE) supports adaptive playback of media, that adaptation requires that any media appended to a SourceBuffer must conform to the MIME type provided when initially creating the SourceBuffer via MediaSource.addSourceBuffer(type).
With the addition of a proposed changeType method on SourceBuffer, with a type parameter similar to that of the existing addSourceBuffer method on MediaSource, a SourceBuffer could buffer and support playback across different bytestream formats and codecs. This new method would retain previously buffered media (modulo future MSE coded frame eviction or removal), and leverage the splicing and buffering logic in the existing MSE coded frame processing algorithm.
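The proposed flow can be sketched as follows. The MIME type strings and the needsChangeType helper here are illustrative assumptions, not part of the API:

```javascript
// Hypothetical helper (not part of MSE): decide whether a changeType() call
// is needed before appending media of a different type.
function needsChangeType(currentType, nextType) {
  return currentType.trim().toLowerCase() !== nextType.trim().toLowerCase();
}

// Sketch of a cross-bytestream switch; only runs where MSE is available.
if (typeof MediaSource !== 'undefined' && MediaSource.isTypeSupported) {
  const webmType = 'video/webm; codecs="vp9"';
  const mp4Type = 'video/mp4; codecs="avc1.4d401f"';
  const mediaSource = new MediaSource();
  mediaSource.addEventListener('sourceopen', () => {
    const sourceBuffer = mediaSource.addSourceBuffer(webmType);
    // ... append WebM/VP9 initialization and media segments here ...
    if (needsChangeType(webmType, mp4Type) &&
        MediaSource.isTypeSupported(mp4Type)) {
      sourceBuffer.changeType(mp4Type); // previously buffered media is kept
      // ... append MP4/H.264 segments; the coded frame processing algorithm
      // handles splicing at the transition point ...
    }
  });
}
```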
If type is not supported by the user agent for the SourceBuffer and MediaSource, changeType would synchronously throw a NotSupportedError exception. (Note that the API would be unaware of potentially unsupported content transitions, including those occurring implicitly via codec changes in subsequent initialization segments when the codecs parameter of the SourceBuffer's MIME type is ambiguous, and including transitions involving encrypted content; see also the Media Capabilities API's stream transitioning work referenced in Resolved Questions, below.)
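Because the exception is synchronous, an application can probe and fall back in one call site. A sketch, where tryChangeType is a hypothetical app helper:

```javascript
// Sketch: changeType() throws synchronously on unsupported types, so an
// application can attempt the switch and fall back on failure.
function tryChangeType(sourceBuffer, type) {
  try {
    sourceBuffer.changeType(type);
    return true; // type accepted; subsequent appends may use the new type
  } catch (e) {
    if (e.name === 'NotSupportedError') {
      return false; // keep appending the current type instead
    }
    throw e; // e.g. InvalidStateError if an append is still in flight
  }
}
```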
Should the initialization segment received algorithm continue to require the same number of audio, video and text tracks - and if more than one of a particular type, that the set of track IDs for that type be the same?
For a long time, user agents (such as Chrome) chose the route of allowing a maximum of one audio and one video track across all SourceBuffers in a MediaSource instance. Practically, this met the REC MSE requirement; further, it was intended to improve the user experience by ensuring that only the expected media would be fetched and incur resource utilization.
Retaining this restriction precludes one of the Implementation Use Cases.
For simplicity, and due to the prevalence of MSE apps that use single-track bytestreams and up to two SourceBuffers (one each for audio and video) to manage adaptation of each independently, the initialization segment received algorithm continues to require the same number of audio, video and text tracks, and, if more than one of a particular type, that the set of track IDs for that type remain the same. The algorithm is adjusted to allow codecs to change in the initialization segment when the bytestream format does not change, even without explicitly signalling changeType(). In this latter "implicit codec change" situation, new non-normative text guides both API users and user agent implementors, as some user agents may, in the short term, continue to disallow implicit codec switching until they relax their codec strictness for addSourceBuffer() and changeType().
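Given that some user agents may still reject implicit codec changes, a portable app can signal the change explicitly even when only the codec (not the bytestream format) varies. A sketch under that assumption; fullType and switchCodec are illustrative helpers:

```javascript
// Hypothetical helper: build an unambiguous MIME type with explicit codecs,
// avoiding the "ambiguous codecs parameter" case described above.
function fullType(container, codec) {
  return `${container}; codecs="${codec}"`;
}

// Portable path: call changeType() explicitly for a same-bytestream codec
// change instead of relying on implicit codec-change support.
function switchCodec(sourceBuffer, container, newCodec) {
  sourceBuffer.changeType(fullType(container, newCodec));
}
```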
Other than the existing MEDIA_ERR_DECODE and MediaError.message error reporting mechanism, is there a way applications could (ideally proactively, before fetching and buffering) determine whether or not the user agent has the capability of supporting playback across various levels of mixed encrypted and unencrypted content along with bytestream and codec changes?
This proposal is focused on the MSE API alone. Media Capabilities and Encrypted Media Extensions may need work to support such proactive queries.
Proactive codec and bytestream switching capability detection is being worked on as part of the Media Capabilities API: Transitioning between stream configurations proposal.
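Until that transitions proposal is available, an application can at least query each candidate configuration independently via the existing MediaCapabilities.decodingInfo(), with the caveat that per-configuration answers say nothing about the cost of the transition itself. A sketch; smoothConfigs is a hypothetical helper:

```javascript
// Sketch: filter candidate decoding configurations to those the user agent
// reports as supported and smooth. Per-configuration queries do NOT capture
// transition cost between configurations.
async function smoothConfigs(configs) {
  if (typeof navigator === 'undefined' || !navigator.mediaCapabilities) {
    return configs; // capability data unavailable; optimistically keep all
  }
  const results = await Promise.all(
      configs.map((c) => navigator.mediaCapabilities.decodingInfo(c)));
  return configs.filter((_, i) => results[i].supported && results[i].smooth);
}
```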
Should the proposed changeType method implicitly perform the reset parser state algorithm, or should it instead require the application to ensure the parser is reset (via abort(), if necessary)?
As there is currently no way for an application to be certain of the SourceBuffer's current append state, changeType should probably run the reset parser state algorithm.

changeType includes running the reset parser state algorithm once preliminary state and support checks have passed.
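Since changeType() itself resets the parser, the application does not need a preceding abort(); it only needs to ensure no append is in flight (changeType() throws InvalidStateError while updating is true). switchWhenIdle below is a hypothetical helper sketching that pattern:

```javascript
// Sketch: defer changeType() until any pending append completes, then
// switch and append media of the new type (beginning with an
// initialization segment). No abort() is required first, because
// changeType() runs the reset parser state algorithm itself.
function switchWhenIdle(sourceBuffer, type, appendNext) {
  const doSwitch = () => {
    sourceBuffer.changeType(type); // resets the parser; buffered data kept
    appendNext(); // next append must start with an initialization segment
  };
  if (sourceBuffer.updating) {
    sourceBuffer.addEventListener('updateend', doSwitch, { once: true });
  } else {
    doSwitch();
  }
}
```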
To what level should we specify "seamless" playback across bytestream, codec (and perhaps encryption) changes?
This is likely a quality-of-implementation output, rather than a specified input. Decoder reconfiguration, for instance, may not be sufficient in all implementations to support precision across a transition. This is analogous to the treatment of playback quality across the adaptations allowed by REC MSE today: implicitly quality-of-implementation.