`: Start time of monologue (if available)
-- ``: Content of monologue
+WebVTT defines no maximum line length, but a maximum of 65 characters per line is recommended so that the file displays well as closed captions. If you're using whisper-cpp or an equivalent, `--split-on-word --max-len 65` is one way to achieve this.
+
+The full specification includes formatting features; these are typically not used in podcasting applications.
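
The 65-character recommendation above can be checked mechanically. The following is a minimal Python sketch, not part of the specification: the function name `find_long_lines` and the constant `MAX_LINE_LENGTH` are made up for illustration, and the parsing is deliberately naive (it treats every line that is not blank, not the `WEBVTT` header, and not a timing line as cue text, so cue identifiers and `NOTE` blocks are also measured).

```python
MAX_LINE_LENGTH = 65  # recommended limit from the text above, not a WebVTT rule


def find_long_lines(vtt_text: str, limit: int = MAX_LINE_LENGTH) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for text lines longer than `limit`."""
    too_long = []
    for number, line in enumerate(vtt_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines, the file header, and cue timing lines.
        if not stripped or stripped.startswith("WEBVTT") or "-->" in stripped:
            continue
        if len(stripped) > limit:
            too_long.append((number, stripped))
    return too_long
```

An empty result means every text line fits within the recommended length; otherwise the returned line numbers point at the cues to re-split.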
#### Snippet:
-```html
-Kevin:
-0:00
-
We have an update planned where we would like to give the ability to upload an artwork file for these videos
-Alban :
-0:09
-You're triggering Tom right now with a hey, here's a cool feature.
```
+WEBVTT
-Example file: [example.html](example.html)
+00:00:00.000 --> 00:00:05.000
+Podcasting 2.0 is really changing the game.
-
+00:00:05.000 --> 00:00:10.000
+Yeah, absolutely. The new features are incredible.
+00:00:10.000 --> 00:00:15.000
+It's amazing how it's empowering creators like never before.
-## JSON
+00:00:15.000 --> 00:00:20.000
+And the enhanced monetization options are a game-changer.
-The JSON representation is a flexible format that accomodates various degrees of fidelity in a concise way. This format for podcast transcripts should adhere to the following specifications.
+00:00:20.000 --> 00:00:25.000
+Exactly, Tom. It's revolutionizing the industry.
-#### Elements included in this representation:
-- `version`: The version of JSON transcript specification
-- `segments`: An array of dialogue elements (segments)
-- `speaker`: Speaker
-- `startTime`: Start time for the segment
-- `endTime`: End time for the segment (if available)
-- `body`: Dialogue content
+00:00:25.000 --> 00:00:30.000
+No doubt about it. Podcasting 2.0 is the future.
-#### Snippet:
-```json
-{
- "version": "1.0.0",
- "segments": [
- {
- "speaker": "Darth Vader",
- "startTime": 0.5,
- "endTime": 0.75,
- "body": "I"
- },
- {
- "speaker": "Darth Vader",
- "startTime": 1,
- "endTime": 1.25,
- "body": "am"
- },
- {
- "speaker": "Darth Vader",
- "startTime": 1.5,
- "endTime": 2.0,
- "body": "your"
- },
- {
- "speaker": "Darth Vader",
- "startTime": 2.25,
- "endTime": 2.50,
- "body": "father.\n"
- },
- {
- "speaker": "Luke",
- "startTime": 2.75,
- "endTime": 3.0,
- "body": "Nooooo"
- }
- ]
-}
+00:00:30.000 --> 00:00:35.000
+Couldn't agree more, Tom. The future looks bright.
```
-Example file: [example.json](example.json)
+Example file: [example.vtt](example.vtt)
-
+#### Web browser support example
+This example adds an audio player to a web page and displays the accompanying WebVTT file as the audio plays. (Note that this basic code does not show speaker names.)
+
+```
+
+
+
+```
+
+
## SRT
The SRT format was designed for video captions but provides a suitable solution for podcast transcripts. The SRT format contains medium-fidelity timestamps and is a
-popular export option from transcription services. SRT transcripts used for podcasts should adhere to the following specifications.
+popular export option from transcription services. An SRT file can be generated programmatically from a VTT file (and vice-versa).
+
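As a rough illustration of the VTT-to-SRT direction, here is a minimal Python sketch. It is not a complete converter: the names `vtt_to_srt`, `_srt_time`, and `TIMING` are made up for this example, and it assumes cues are separated by blank lines. It drops the `WEBVTT` header and any block without a timing line, numbers the cues, adds the hours field when it is missing, and swaps the full stop for a comma in the timecode fraction. Cue settings and cue identifiers are discarded, and any markup inside the cue text (such as voice spans) is left untouched.

```python
import re

# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm" where the hours part is optional.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2})\.(\d{3})"
)


def _srt_time(clock: str, millis: str) -> str:
    """Format a timecode the SRT way: hours always present, comma before millis."""
    if clock.count(":") == 1:  # "MM:SS" with no hours field
        clock = "00:" + clock
    return f"{clock},{millis}"


def vtt_to_srt(vtt_text: str) -> str:
    """Convert simple WebVTT text to SRT text."""
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                break
        else:
            continue  # header or NOTE block: no timing line, so not a cue
        start = _srt_time(match.group(1), match.group(2))
        end = _srt_time(match.group(3), match.group(4))
        text = "\n".join(lines[index + 1:])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"
```

Going the other way is largely the reverse: prepend a `WEBVTT` line and turn the comma in each timecode back into a full stop.
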
+SRT transcripts used for podcasts should adhere to the following specifications:
#### Properties:
- Max number of lines: 2
@@ -144,50 +125,80 @@ do we need a podcast trailer?
Example file: [example.srt](example.srt)
-## WebVTT
+## JSON
-Web Video Text Tracks Format (WebVTT) are an alternative to SRT primarily designed for the use in HTML on the web. It is supported in all major web browsers and is similar enough to SRT to be converted.
-
-### Differences from SRT taken from [Wikipedia](https://en.wikipedia.org/wiki/WebVTT):
-- WebVTT's first line starts with WEBVTT after the optional UTF-8 byte order mark
-- There is space for optional header data between the first line and the first cue
-- Timecode fractional values are separated by a full stop instead of a comma
-- Timecode hours are optional
-- The frame numbering/identification preceding the timecode is optional
-- Comments identified by the word NOTE can be added
-- Metadata information can be added in a JSON-style format
-- Chapter information can be optionally specified
-- Only supports extended characters as UTF-8
-- CSS in a separate file defined in the companion HTML document for C tags is used instead of the FONT tag
-- Cue settings allow the customization of cue positioning on the video
+The JSON representation is a flexible format that accommodates various degrees of fidelity in a concise way. At its most precise, with one segment per word, it enables word-by-word highlighting. This format for podcast transcripts should adhere to the following specifications.
-#### Properties:
-- Speaker names (optional): Speakers can be included in a voice span tag `<v>` at the beginning of each caption.
+#### Elements included in this representation:
+- `version`: The version of JSON transcript specification
+- `segments`: An array of dialogue elements (segments)
+- `speaker`: Speaker
+- `startTime`: Start time for the segment
+- `endTime`: End time for the segment (if available)
+- `body`: Dialogue content
#### Snippet:
+```json
+{
+ "version": "1.0.0",
+ "segments": [
+ {
+ "speaker": "Darth Vader",
+ "startTime": 0.5,
+ "endTime": 0.75,
+ "body": "I"
+ },
+ {
+ "speaker": "Darth Vader",
+ "startTime": 1,
+ "endTime": 1.25,
+ "body": "am"
+ },
+ {
+ "speaker": "Darth Vader",
+ "startTime": 1.5,
+ "endTime": 2.0,
+ "body": "your"
+ },
+ {
+ "speaker": "Darth Vader",
+ "startTime": 2.25,
+ "endTime": 2.50,
+ "body": "father.\n"
+ },
+ {
+ "speaker": "Luke",
+ "startTime": 2.75,
+ "endTime": 3.0,
+ "body": "Nooooo"
+ }
+ ]
+}
```
-WEBVTT
-
-00:00:00.000 --> 00:00:05.000
-Podcasting 2.0 is really changing the game.
-00:00:05.000 --> 00:00:10.000
-Yeah, absolutely. The new features are incredible.
+Example file: [example.json](example.json)
-00:00:10.000 --> 00:00:15.000
-It's amazing how it's empowering creators like never before.
+
-00:00:15.000 --> 00:00:20.000
-And the enhanced monetization options are a game-changer.
+## HTML
-00:00:20.000 --> 00:00:25.000
-Exactly, Tom. It's revolutionizing the industry.
+The HTML transcript format provides a solution when a transcript is available but timecode data is limited or missing. HTML transcript files are considered low-fidelity and are designed to serve as an accessibility aid and provide searchable episode content. The HTML format used for podcast transcripts should adhere to the following specifications.
-00:00:25.000 --> 00:00:30.000
-No doubt about it. Podcasting 2.0 is the future.
+#### HTML tags used:
+- `<cite>`: Name of the speaker (if available)
+- `<time>`: Start time of monologue (if available)
+- `<p>`: Content of monologue
-00:00:30.000 --> 00:00:35.000
-Couldn't agree more, Tom. The future looks bright.
+#### Snippet:
+```html
+Kevin:
+0:00
+We have an update planned where we would like to give the ability to upload an artwork file for these videos
+Alban :
+0:09
+You're triggering Tom right now with a hey, here's a cool feature.
```
-Example file: [example.vtt](example.vtt)
+Example file: [example.html](example.html)
+
+