Use of edit lists and timed text tracks #40

Open
cconcolato opened this issue Oct 18, 2021 · 4 comments

Comments

@cconcolato
Contributor

I want to explore the use of edit lists with TTML in MP4 (and WebVTT in MP4).

Reminder on edit lists

As defined in 14496-12, a track has:

  • a decode timeline. Each sample has a decode time on this decode timeline, aka media time. The first sample of the track has a decode time of 0.
  • a composition timeline. For audio and text tracks, composition time and decode time are equal. The exception is video streams that use frame reordering, whose first sample may have a non-zero composition time.
  • a presentation timeline. It is a common timeline to all tracks in the same file. The sample to be played at a given time on this presentation timeline is determined by applying the edit list of the track.

The edit list of a track can provide a very complex transformation of the composition timeline. For example, it can indicate that for a presentation time interval, no samples are played (empty edit); that for a presentation time interval, a sample is held (paused); or that for a presentation time interval, a composition time interval is played at a given speed. Edit lists can even select part of a sample to be in the presentation timeline, e.g. a sample that starts at composition time 5s and ends at composition time 7s is only played from 6s to 6.5s.
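
To make the mapping concrete, here is a minimal Python sketch of how a presentation time could be resolved to a composition time through an edit list. It is a simplification, not the 14496-12 data model: the `Edit` class and `presentation_to_composition` are hypothetical names, all values are in seconds (a real edit list uses movie and media timescales), and a negative `media_time` stands in for an empty edit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    """One edit list entry, simplified to seconds (assumption for this sketch)."""
    duration: float    # length of the edit on the presentation timeline
    media_time: float  # start on the composition timeline; negative = empty edit
    rate: float = 1.0  # 1.0 = normal speed, 0.0 = hold (dwell) on media_time


def presentation_to_composition(edits: List[Edit], t: float) -> Optional[float]:
    """Return the composition time played at presentation time t, or None."""
    start = 0.0
    for edit in edits:
        end = start + edit.duration
        if start <= t < end:
            if edit.media_time < 0:
                return None  # empty edit: nothing is played
            return edit.media_time + (t - start) * edit.rate
        start = end
    return None  # t is past the end of the edit list


# A 4s empty edit followed by 20s of media played from composition time 0.
edits = [Edit(duration=4.0, media_time=-1.0), Edit(duration=20.0, media_time=0.0)]
print(presentation_to_composition(edits, 2.0))   # None
print(presentation_to_composition(edits, 15.0))  # 11.0
```

With this edit list, presentation time 15s resolves to composition time 11s, which is the shift used in the timed text examples in the rest of this issue.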

Simple (typical) use cases of edit lists are:

  • editing out the initial time interval in the composition timeline that has no sample (video stream with frame reordering)
  • editing out the priming samples of an audio track
    Note that when one track has such an edit list, the other tracks don't need to have one. For example, if a file has 3 tracks (audio, video, subs) and the audio track has priming but the video track has no B-frames, the video track and the subs track don't need an edit list.

Advanced use cases for edit lists are editing operations in non-linear authoring tools (cut, insert, reorder, ...).

CMAF restricts the use of edit lists to the typical cases above.

Timed text and edit lists

As an example, if a timed text file is authored against a video file and later on that video file is modified to add a bumper (i.e. an ident, an intro, a title card, ...), effectively shifting the dialogs in the audio and the video forward in time, it could be interesting to adjust the timing of the timed text track. This may be done by inserting an empty edit with an edit list.

WebVTT

For WebVTT tracks, given that the WebVTT cue time is derived from the sample time, shifting the sample time with an edit list actually affects the cue time as expected. The following cue:

00:11.000 --> 00:13.000
We are in New York City

gets stored in a sample with CTS = 11s, duration 2s, with the payload:

We are in New York City

If a 4s empty edit is added, the sample is presented at 15s instead of 11s, which has the same effect as if the initial cue had been:

00:15.000 --> 00:17.000
We are in New York City

Note that additional care (i.e. use of the ctim box) needs to be taken if the cue payload contained a WebVTT Cue Timestamp.
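
The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `format_vtt_timestamp` and `cue_timing_line` are hypothetical helpers, times are integer milliseconds, and the edit list is reduced to a single leading empty edit.

```python
def format_vtt_timestamp(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp (hours only when non-zero)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    short = f"{minutes:02d}:{seconds:02d}.{millis:03d}"
    return f"{hours:02d}:{short}" if hours else short


def cue_timing_line(sample_cts_ms: int, sample_duration_ms: int,
                    empty_edit_ms: int = 0) -> str:
    """Cue timing as seen on the presentation timeline.

    The cue start is the sample CTS shifted by the leading empty edit;
    the cue end is the start plus the sample duration.
    """
    start = sample_cts_ms + empty_edit_ms
    end = start + sample_duration_ms
    return f"{format_vtt_timestamp(start)} --> {format_vtt_timestamp(end)}"


print(cue_timing_line(11_000, 2_000))         # 00:11.000 --> 00:13.000
print(cue_timing_line(11_000, 2_000, 4_000))  # 00:15.000 --> 00:17.000
```

The cue payload is untouched in both cases, which is why the edit list alone is enough for WebVTT when the payload carries no cue timestamps.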

TTML

For a TTML track, times in the TTML document are relative to the start of the track, but a question is: is it the start of the composition timeline or the start of the presentation timeline?
Let's assume that the following document:

<tt
...
<body>
<div>
<p begin="11s" end="13s">We are in New York City</p>
</div>
</body>
</tt>

is stored in a sample (CTS = 11s, duration = 2s).

i. if TTML document times are interpreted as delta from the start of the composition timeline, the behavior would be the same as for the WebVTT case. When applying the same edit list, and playing presentation time 15s, the player would know that it is actually playing composition time 11s, which would match the time values in the document.

ii. if TTML document times are interpreted as delta from the start of the presentation timeline, when applying the same edit list, at presentation time 15s, when the TTML parser is fed the same document and seeked to time 15s, there is no active element. Nothing plays. To make it work, when adding the edit list, one has to adjust the TTML document in the sample to be:

...
<body>
<div>
<p begin="15s" end="17s">We are in New York City</p>
</div>
</body>
</tt>
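
The difference between the two interpretations can be checked with a small Python sketch. The function names are hypothetical, times are in seconds, the edit list is reduced to a single leading empty edit, and intervals are treated as half-open (`begin` inclusive, `end` exclusive).

```python
def active_if_composition_relative(begin: float, end: float,
                                   presentation_time: float,
                                   empty_edit: float) -> bool:
    """Interpretation i: document times are on the composition timeline."""
    composition_time = presentation_time - empty_edit
    return begin <= composition_time < end


def active_if_presentation_relative(begin: float, end: float,
                                    presentation_time: float) -> bool:
    """Interpretation ii: document times are on the presentation timeline."""
    return begin <= presentation_time < end


# <p begin="11s" end="13s"> with a 4s empty edit, queried at presentation time 15s.
print(active_if_composition_relative(11, 13, 15, 4))  # True
print(active_if_presentation_relative(11, 13, 15))    # False: nothing plays

# Under interpretation ii the document has to be rewritten to begin="15s" end="17s".
print(active_if_presentation_relative(15, 17, 15))    # True
```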

Current spec text

ISO/IEC 14496-30 2nd edition, Section 4.2 says:

The rendering of the sample happens at the composition time, taking into account edit lists if any

This means, as usual, that the presentation of a timed text track behaves like a video or audio track and is driven by the presentation time, from which a composition time and a sample number are derived.

It then says:

The subclauses defining the storage of specific formats in the ISOBMFF specify how internal timing values relate
to the track time or to the sample decode or composition time (see subclauses 5.3 and 6.3). For instance, start or end
times may be relative to the start of the sample, or the start of the track.

Section 5.3 (TTML) says:

The top-level internal timing values in the timed text samples based on TTML express times on the track
presentation timeline – that is, the track media time as optionally modified by the edit list. For example, the begin
and end attributes of the element, if used are relative to the start of the track, not relative to the start of the
sample.

So clearly edit lists are meant to apply to TTML, but nothing warns about the issue described above.
Note that the text from the second edition has other flaws/ambiguities and is rephrased in Amendment 1 to the second edition.

Recommendation

My recommendation would then be to update the TTML section and add something along the lines of:

Edit lists on TTML tracks should be used with care due to the fact that times in the TTML document in a sample are not relative to the sample time. Authoring tools adding edit lists to TTML tracks are expected to update the times in the TTML documents in the track.
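
As an illustration of what such an authoring tool might do, here is a rough Python sketch that shifts the times in a TTML document by the duration of an added empty edit. It rests on several assumptions: `shift_document` and its helpers are hypothetical, only offset-time expressions in seconds (such as `11s`) are handled, and only the outermost timed element on each branch is shifted, since times on nested elements are normally relative to their parent. Other time expressions, `dur`, and `timeContainer="seq"` are not covered.

```python
import re
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
OFFSET_SECONDS = re.compile(r"^(\d+(?:\.\d+)?)s$")


def shift_time(value: str, offset: float) -> str:
    """Shift an offset-time expression in seconds, e.g. '11s' -> '15s'."""
    match = OFFSET_SECONDS.match(value)
    if match is None:
        raise ValueError(f"unsupported time expression: {value!r}")
    return f"{float(match.group(1)) + offset:g}s"


def shift_outermost(element: ET.Element, offset: float) -> None:
    """Shift begin/end on the outermost timed element of each branch."""
    if "begin" in element.attrib or "end" in element.attrib:
        for name in ("begin", "end"):
            if name in element.attrib:
                element.set(name, shift_time(element.get(name), offset))
        return  # descendants are timed relative to this element
    for child in element:
        shift_outermost(child, offset)


def shift_document(ttml: str, offset: float) -> str:
    """Return the TTML document with its top-level times shifted by offset."""
    ET.register_namespace("", TTML_NS)  # keep the default namespace on output
    root = ET.fromstring(ttml)
    shift_outermost(root, offset)
    return ET.tostring(root, encoding="unicode")


doc = ('<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
       '<p begin="11s" end="13s">We are in New York City</p>'
       '</div></body></tt>')
print(shift_document(doc, 4.0))
```

Run on the example document with a 4s offset, this rewrites the `p` element to `begin="15s" end="17s"`. A real tool would also need to handle every TTML time expression form and every sample in the track.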

@nigelmegitt

I've had a think and a discussion about this and would add the following points:

  • It would be unfortunate if the timing models in 14496-30 create different capabilities for WebVTT and TTML with respect to timing

  • I agree that the need to modify the times within the TTML payload makes use of edit lists complex

  • I don't think it's right to assume that any edit lists added are added by authoring tools and that whatever is adding the edit lists also has the capability to modify the TTML document itself

  • There may be times when the TTML document times are mistimed against the sample, but some more acceptable result can be achieved by applying an edit list without modifying the times

    • For example, a sample beginning at 10s (composition time) with a duration of 5s, containing a TTML payload that defines subtitles in the interval 5s to 10s would not result in any content being displayed, if no edit list is applied; however applying an edit list whose effect is to bring the presentation time of that sample 5s earlier than the composition time would result in it being displayed. Whether the results are editorially desirable or not depends on the content.
  • An alternative apparently not considered so far is to shift the effective TTML times in the client, i.e. to change the interpretation of times from being on the presentation timeline to the composition timeline, so that sections 4.2 and 5.3 are consistent with each other.

    • For example, a client could generate the set of ISDs arising from the TTML document sample for a particular composition time, and then time-shift/truncate each of those ISDs as necessary to process the edit list.
    • This adds further complexity to the client, but is at least a logically feasible way to proceed, in principle, should edit list functionality be desired.
    • It makes clear the consequences of the TTML document timeline being on the composition timeline.
    • It is possibly unclear that the TTML document timeline is currently on the presentation timeline, so that for me is the key point that needs to be clarified. After all, how can the document be rendered at composition time when the timestamps are on the presentation timeline? It doesn't seem to make sense.
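
The mistimed-sample example in the list above can be worked through with a short Python sketch. It assumes the current reading, in which TTML document times are on the presentation timeline; `visible_interval` is a hypothetical helper, times are in seconds, and the edit list is reduced to a single signed shift of the sample.

```python
from typing import Optional, Tuple


def visible_interval(sample_start: float, sample_duration: float,
                     doc_begin: float, doc_end: float,
                     shift: float = 0.0) -> Optional[Tuple[float, float]]:
    """Presentation interval in which the sample's subtitles are shown.

    sample_start and sample_duration are on the composition timeline;
    shift is the net effect of the edit list on the sample (negative
    means the sample is presented earlier). doc_begin and doc_end are
    the TTML times, read as presentation times.
    """
    presented_start = sample_start + shift
    presented_end = presented_start + sample_duration
    start = max(presented_start, doc_begin)
    end = min(presented_end, doc_end)
    return (start, end) if start < end else None


# Sample at composition time 10s lasting 5s, TTML content active from 5s to 10s.
print(visible_interval(10, 5, 5, 10))            # None: nothing is displayed
print(visible_interval(10, 5, 5, 10, shift=-5))  # (5, 10): displayed
```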

Overall, the simplest option for typical use (e.g. for a use case like CMAF) is probably to state that edit lists cannot be used with TTML, so that the composition timeline and the presentation timeline are effectively identical.

I agree with what I think is the intent behind your proposal @cconcolato , which is to define more clearly what it means to apply an edit list to a TTML sample, and what the consequences might be, without changing the current sense that they can be applied. Then it is down to profiles and applications to define whether edit lists are permitted in a particular context.

@sdp198

sdp198 commented Jan 17, 2022

@cconcolato You give two possibilities about TTML timing i) TTML times correlate to Composition Times, and ii) TTML times correlate to Presentation Times.

I think your recommendation is OK, but it needs to explicitly explain that option ii) is the correct interpretation.

Previously (before all these discussions around this amendment) I'd expected it was option i), but with the drawback that most clients were unlikely to support edit lists at all with TTML.

@cconcolato
Contributor Author

I think we should start an amendment to 14496-30 with this item.

@jpiesing

If this is taken forwards then I recommend (request) that test content be produced (or at least extremely detailed instructions written for it) that distinguishes between 1) a correct implementation of what is specified, 2) implementations not supporting edit lists on TTML at all, 3) the most obvious way or ways in which an implementation might support edit lists with TTML but incompatibly with what is specified. A description of what would be seen in each case would also be really helpful. Obviously some test content that behaves visibly differently in each of these may need to be very artificial.
