Where should "headers" go relative to the WEBVTT magic string? #485

Open
icbaker opened this issue Jun 5, 2020 · 11 comments
@icbaker

icbaker commented Jun 5, 2020

We're implementing a WebVTT parser that also handles the X-TIMESTAMP-MAP header specified by Section 3.5 of HLS RFC 8216.

A question has come up about where headers should be in a WebVTT file. The spec seems to only mention "headers" in 6.1 WebVTT file parsing, and by my interpretation of steps 6 - 11 they must be on lines directly beneath WEBVTT (i.e. no blank lines in between). It seems step 6 allows for additional text after WEBVTT but then ignores/skips it in step 7. Then step 9 advances to the next line and step 11 parses any headers.

It seems by this algorithm this would be a valid file:

WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:46.582 --> 00:00:48.305
  First cue

However 4.1 WebVTT file structure seems to require one or more blank lines directly underneath WEBVTT (step 4) and doesn't mention headers at all. It does allow text immediately after WEBVTT on the first line (i.e. the same text that is skipped by the parsing algorithm) (step 3).

So by that interpretation, my example file above is invalid, and the only place to add 'extra' info to the start of the file is directly after WEBVTT, e.g.:

WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:46.582 --> 00:00:48.305
  First cue

But the X-TIMESTAMP-MAP data is now in the section that's skipped by step 7 of the algorithm in 6.1 WebVTT file parsing.
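To make the difference concrete, here is a minimal Python sketch of how a parser following my reading of steps 6 to 11 would treat the two files above. It is a simplification, not the spec's algorithm verbatim, and `split_preamble` and its return shape are hypothetical names chosen for illustration.

```python
def split_preamble(text):
    """Split a WebVTT file into (signature_extra, header_lines).

    Simplified sketch of the reading of 6.1 steps 6-11 discussed above;
    the function name and return shape are assumptions for illustration.
    """
    # Normalize line endings so only "\n" has to be handled.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    first = lines[0].lstrip("\ufeff")  # tolerate a leading BOM

    # Step 6: the file must start with WEBVTT, optionally followed by
    # a space or tab and arbitrary text on the same line.
    if not first.startswith("WEBVTT"):
        raise ValueError("missing WEBVTT signature")
    if len(first) > 6 and first[6] not in " \t":
        raise ValueError("unexpected character after WEBVTT")

    # Step 7: the rest of the signature line is skipped by the parsing
    # algorithm; it is returned here so a caller can inspect it.
    signature_extra = first[6:].strip()

    # Step 11: non-blank lines directly beneath the signature form the
    # "header" block; a blank line (or a cue timing line) ends it.
    header_lines = []
    for line in lines[1:]:
        if line == "" or "-->" in line:
            break
        header_lines.append(line)
    return signature_extra, header_lines
```

With the first file the X-TIMESTAMP-MAP line ends up in `header_lines`; with the second it ends up in `signature_extra`, i.e. in the text a parser following 6.1 would throw away.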

So I guess there are two questions:

  • Do 4.1 and 6.1 in the spec disagree about what lines can follow WEBVTT?
  • Should headers be directly beneath WEBVTT or on the same line, or is either valid?
@fsoder

fsoder commented Jun 5, 2020

  • Do 4.1 and 6.1 in the spec disagree about what lines can follow WEBVTT?

No - one defines syntax and the other parsing. The note in section 2.1 explains what the difference is.

  • Should headers be directly beneath WEBVTT or on the same line, or is either valid?

First I'd like to say that whatever defines this "extension" should define where to expect this additional piece of data. Preferably it should also provide testcases to that effect.

I don't really want to use the "valid" word for any of those places, because what it's supposed to be is not defined by WEBVTT [1]. The parser was provided with various potential "extension points" for forward-compatibility. Anything within the characters collected in step 7 I'd call "signature extension" (because "WEBVTT" is the "file signature") and anything collected by step 11 a "header extension" - there are however no such public extension points.

From my PoV, what's described above looks more like a "header extension" than a "signature extension". As mentioned above though, it's really up to whoever defined the "extension" / metadata to define where one would expect to find it (and how it should be parsed). See also some example in #346.

[1] If the fact that the syntax section seems to allow "random garbage" after the signature is read as a way for a file to be deemed "valid", I think that's unfortunate (and perhaps it should be rectified?).

@icbaker
Author

icbaker commented Jun 5, 2020

Thanks! So to confirm my understanding: Technically any file with "header extensions" (as collected by 6.1 step 11a) is invalid. But this is OK, the parser is worded to deliberately allow 'invalid' files that might be carrying useful custom info (like this header).

Ah I just found a reference to this specific header: #304 (comment)

And that also took me to this commit, showing "headers" used to be allowed in a previous version of the syntax section and were removed in 2016:
bf72d5b

Looks like they were added in 2013: 02d95ce

And then clarified/fleshed out later that year: b82c0c5

I guess Apple wrote the HLS spec when "WebVTT header" was an unambiguous phrase - at the time it clearly meant colon-separated, key-value pairs on consecutive lines directly after the WEBVTT magic string.

Now that the header has been removed from the syntax section, I agree it's ambiguous, and ideally Apple would update the HLS spec to resolve the ambiguity and clearly define what their expectations are.

But for the sake of our current discussion I think the intent is pretty clear: They should be on separate lines directly after WEBVTT, even though that technically creates an 'invalid' file.
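For anyone implementing the same thing, here is a rough Python sketch of pulling the mapping out of the lines directly beneath WEBVTT. The helper names are hypothetical, and the sketch assumes only the `X-TIMESTAMP-MAP=LOCAL:...,MPEGTS:...` shape shown in the example above; it accepts the two attributes in either order as a leniency of this sketch, not as something I've confirmed against the RFC.

```python
def cue_time_to_seconds(stamp):
    """Convert a WebVTT timestamp ([hh:]mm:ss.ttt) to seconds."""
    seconds = 0.0
    for piece in stamp.split(":"):
        seconds = seconds * 60 + float(piece)
    return seconds


def parse_timestamp_map(header_lines):
    """Return (local_seconds, mpegts_ticks), or None if no
    X-TIMESTAMP-MAP line is present. Hypothetical helper."""
    prefix = "X-TIMESTAMP-MAP="
    for line in header_lines:
        if not line.startswith(prefix):
            continue
        fields = {}
        for part in line[len(prefix):].split(","):
            # Split on the first colon only: the LOCAL value itself
            # contains colons (hh:mm:ss.ttt).
            key, _, value = part.partition(":")
            fields[key.strip()] = value.strip()
        # Raises KeyError if either attribute is missing.
        return cue_time_to_seconds(fields["LOCAL"]), int(fields["MPEGTS"])
    return None
```

For the example line this yields a local time of 0 seconds and an MPEGTS value of 900000, which would be 10 seconds if the value is on the usual 90 kHz MPEG-2 clock.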

@nigelmegitt
Contributor

Note to group: @icbaker also posted notification of this to IETF at https://mailarchive.ietf.org/arch/msg/hls-interest/MzTHEqH3FVE5E3chn8ULjUon9b8/ - thank you!

I think at this stage it would be fair to say that the TTWG has not discussed it, nor has the Web Media Text Tracks CG. I don't know what such a discussion may result in, but I guess it is possible that some concept could be restored into the WebVTT spec that would remove the need to ask IETF to change the RFC.

pinging @gkatsev

@gkatsev
Collaborator

gkatsev commented Jun 10, 2020

The question is whether WebVTT should include it given that it won't be used for anything in WebVTT itself.
There has been a desire to provide extra information about the type of WebVTT file in the file itself, as opposed to only out of band. Having it as part of a WebVTT header may make sense.
Maybe the interim solution is to publish the webvtt header definition as a WG note or something?

@dwsinger

I (personally) prefer self-describing, self-contained files; I think maintaining external bits of data that should travel with and be kept aligned with the data is a pain. I argued for a general syntax for metadata headers, and rules for what to do with them. I did not prevail.

@css-meeting-bot
Member

The Timed Text Working Group just discussed Where should "headers" go relative to the `WEBVTT` magic string? webvtt#485, and agreed to the following:

  • SUMMARY: Discussions continuing, further inputs welcome.
The full IRC log of that discussion:
<nigel> Topic: Where should "headers" go relative to the `WEBVTT` magic string? webvtt#485
<nigel> github: https://github.com//issues/485
<nigel> Gary: HLS has a concept of segmented WebVTT.
<nigel> .. To be able to display them properly they added a TIMESTAMP-MAP that maps the
<nigel> .. WebVTT times to the HLS timeline.
<nigel> .. The HLS spec refers to "WebVTT Header" for specifying this timestamp map.
<nigel> .. The problem was that an issue was opened for supporting TIMESTAMP-MAP in a place
<nigel> .. and the question was "what are WebVTT headers?" because the current specification no
<nigel> .. longer includes that concept.
<nigel> .. A long time ago regions were specified in WebVTT headers but it was removed.
<nigel> .. What prompted this was a question about if the header can be on the same line as the
<nigel> .. WEBVTT marker or whether it is on a new line.
<nigel> .. Then they also opened a question with IETF about amending the HLS RFC that refers to
<nigel> .. WebVTT header.
<nigel> Nigel: I added a comment because I think it is not obvious where the best place is to fix
<nigel> .. this: in the HLS spec or in WebVTT.
<nigel> Gary: Yes. WebVTT spec aside, it's a bit tricky because if WebVTT doesn't use headers
<nigel> .. itself it seems a bit weird to have a definition that the spec doesn't use.
<nigel> .. But maybe that's fine because HLS and other things may refer to these headers.
<nigel> .. Or, maybe more future work, there are some feature requests and enhancements for
<nigel> .. WebVTT like adding metadata, that could be implemented as headers.
<nigel> .. If we think of it as step 1 toward that, maybe that's fine.
<nigel> Nigel: Why was it removed, only because it was no longer being used?
<nigel> Gary: It sounds like regions were translated to be blocks, and then the syntax of headers
<nigel> .. was unclear so it was removed instead of specifying it because no other feature was
<nigel> .. using it.
<nigel> Nigel: Is there any usage data about the syntax of files that use these headers?
<nigel> Gary: It is very common in HLS, maybe all segmented WebVTT in HLS has this header.
<gkatsev> -> https://github.com//issues/304 issue that triggered removal of headers from webvtt
<nigel> Nigel: It feels like it would be appropriate for Apple to make a proposal here, as key
<nigel> .. proponents of both HLS and WebVTT.
<nigel> Gary: I'm not sure what the best approach is here.
<nigel> .. I did have one other proposal, which is to grab the WebVTT header text and publish
<nigel> .. it separately as a WG Note, and punt on updating the spec itself until a later date.
<nigel> .. I don't know if it is worth doing.
<nigel> Nigel: And in that proposal it wouldn't be referenced by anything?
<nigel> Gary: Right, but it would be slightly more official than looking at an old version of the spec.
<nigel> Nigel: Does the RFC have a dated reference to WebVTT?
<nigel> .. Oh, it is the Draft CG Report.
<nigel> -> https://w3c.github.io/webvtt/ Reference from HLS
<nigel> Gary: It does have a date associated with it.
<nigel> .. June 2017. But the link references the github.io version which is basically the latest.
<nigel> SUMMARY: Discussions continuing, further inputs welcome.

@gkatsev
Collaborator

gkatsev commented Jun 15, 2020

Roger Pantos replied on the hls-interest mailing list: https://mailarchive.ietf.org/arch/msg/hls-interest/4vmLpEsV-EnmkEwMQZkzbGQai_4/

I think I agree with him that HLS should probably define the location specifically, regardless of whether the WebVTT specification has it. I guess the only risk there is ending up in an incompatible state.

@css-meeting-bot
Member

The Timed Text Working Group just discussed Where should "headers" go relative to the `WEBVTT` magic string? webvtt#485, and agreed to the following:

  • SUMMARY: @gkatsev to draft pull request bringing pared-down header definition back into WebVTT.
The full IRC log of that discussion:
<nigel> Topic: Where should "headers" go relative to the `WEBVTT` magic string? webvtt#485
<nigel> github: https://github.com//issues/485
<nigel> Gary: We discussed this last week. We put it back on this week because Roger Pantos,
<nigel> .. the main editor of HLS, replied saying that they should specify the header location in HLS.
<nigel> .. I kinda agree with him except that there are some feature requests for WebVTT for the
<nigel> .. future that might use headers. We don't want to be in a situation where there are
<nigel> .. incompatible header specifications in different specs. At the same time, until we add
<nigel> .. features that use headers I'm not sure that WebVTT should describe what headers are.
<nigel> q+
<nigel> ack ni
<nigel> Nigel: I agree with the concern that incompatible definitions would be a bad thing.
<nigel> .. Since WebVTT previously defined this, I think it would make sense at least for WebVTT
<nigel> .. to define header conceptually even if with a note saying it isn't directly used by WebVTT
<nigel> .. yet, but may be in the future, and is available as an extension point for other uses.
<nigel> Gary: Yes, actually having it as an extension/feature and calling it out as that makes sense.
<nigel> .. I hadn't thought of that as a possibility.
<nigel> .. Then HLS could potentially still specify it but also still point to WebVTT.
<nigel> .. We could also try going with the most minimal definition of header.
<nigel> .. As I said last time, it was removed because someone was asking for clarification of usage,
<nigel> .. and it was removed rather than trying to iron out the issues. We either need to iron
<nigel> .. out the issues or pare down the definition so it doesn't really matter.
<nigel> Nigel: Makes sense to me.
<nigel> Gary: I'll take a look at that as Editor.
<nigel> .. I'll try to reply to the HLS mailing list (I have to join it first!)
<nigel> SUMMARY: @gkatsev to draft pull request bringing pared-down header definition back into WebVTT.

@silviapfeiffer
Member

Note, there's also a PR here for an example: w3c/webvtt.js#38

@silviapfeiffer
Member

I think we need to make a change to the WebVTT spec to allow for such "header" extensions even if the WebVTT spec simply ignores any non-empty lines beneath the "WEBVTT" header until it finds an empty line. This way, it becomes extensible for such headers.

@Nickwiz

Nickwiz commented Sep 19, 2024

For what it's worth, YouTube also adds extra metadata in the header section, typically something like:

WEBVTT
Kind: captions
Language: en

00:00:01.234 --> 00:00:15.678 align:start position:0%

...
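Lines of this shape are easy to read as colon-separated key-value pairs. A minimal Python sketch, assuming the header lines have already been collected (everything beneath WEBVTT up to the first blank line); `parse_header_fields` is a hypothetical name, not part of any library:

```python
def parse_header_fields(header_lines):
    """Read "Name: value" header lines into a dict.

    Hypothetical helper: splits each line on its first colon and
    skips lines that contain no colon at all.
    """
    fields = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        fields[name.strip()] = value.strip()
    return fields


# The two metadata lines from the example above.
youtube = parse_header_fields(["Kind: captions", "Language: en"])
```

Note that a naive first-colon split like this does not suit the HLS `X-TIMESTAMP-MAP=LOCAL:...` line, which uses `=` after the name: the key would come out as `X-TIMESTAMP-MAP=LOCAL`. A parser that wants to support both kinds of line would need to special-case one of them.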
