About Transcript Files and the OH Solution Pack
The Oral History module supports two types of transcript files on ingest: a custom XML format and WebVTT. The datastreams produced on ingest of either format depend on how you have configured the module. You can review a diagram of the workflow for both types of source files on the documentation home page. Each approach has pros and cons.
The Oral History module does not support transcripts with the following:
- In-text markup (such as HTML styling)
- Overlapping tiers and timecodes
In the XML transcript, the structure is as follows:
<cues>
  <!-- cues is the root level of the XML file -->
  <solespeaker>One Speaker</solespeaker>
  <!-- use the solespeaker element if there is only one speaker throughout the transcript -->
  <cue>
    <speaker>Different Speaker</speaker>
    <!-- only declare the speaker element if you have not declared the "solespeaker" element at the "cues" level -->
    <start>0.000</start>
    <end>12.124</end>
    <!-- 'start' and 'end' elements are the start time and end time of the cue, in seconds. -->
    <transcript>This is the transcript text content.</transcript>
    <translation>This is the annotation content.</translation>
    <!-- 'transcript' and/or 'translation' are the default content tiers of the cue.
         Extra tier(s) can be added as long as they are listed in the configuration page.
         The 'transcript' element is required if 'Enable captions/subtitles display' is configured to be true, as this
         element will be crosswalked to a WebVTT file on ingest and used to power closed captioning in the viewer. -->
  </cue>
  <!-- add more cues with the above structure -->
</cues>
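The crosswalk mentioned in the comment above, from the 'transcript' tier to a WebVTT file, can be pictured with a short sketch. This is not the module's own code: the names `to_timestamp` and `cues_to_webvtt` are hypothetical, and the sketch assumes every cue carries numeric `start` and `end` values in seconds.

```python
import xml.etree.ElementTree as ET

# Small sample in the XML format described above (assumed content, for illustration only).
SAMPLE = """<cues>
  <solespeaker>One Speaker</solespeaker>
  <cue>
    <start>0.000</start>
    <end>12.124</end>
    <transcript>This is the transcript text content.</transcript>
    <translation>This is the annotation content.</translation>
  </cue>
</cues>"""


def to_timestamp(seconds: str) -> str:
    """Convert seconds (for example "12.124") to a WebVTT hh:mm:ss.mmm timestamp."""
    total_ms = round(float(seconds) * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def cues_to_webvtt(xml_text: str, tier: str = "transcript") -> str:
    """Build WebVTT text from one tier of the XML transcript (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    blocks = ["WEBVTT"]
    for cue in root.findall("cue"):
        text = (cue.findtext(tier) or "").strip()
        if not text:
            continue  # skip cues that have no content in the chosen tier
        start = to_timestamp(cue.findtext("start"))
        end = to_timestamp(cue.findtext("end"))
        blocks.append(f"{start} --> {end}\n{text}")
    # WebVTT separates the header and each cue with a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(cues_to_webvtt(SAMPLE))
```

Passing `tier="translation"` would produce a second WebVTT text from the same source, which mirrors the one-tier-per-file nature of WebVTT.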
Validators have been written for the XML format required by the module, and are available in the tests folder or directly from this link.
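Before running the module's validators, a quick local check can catch the most common structural problems. The following is a minimal sketch, not the module's validator: `check_transcript` is a hypothetical helper, it covers only the rules described on this page, and its overlap test assumes cues are listed in chronological order.

```python
import xml.etree.ElementTree as ET


def check_transcript(xml_text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    if root.tag != "cues":
        return [f"root element is <{root.tag}>, expected <cues>"]

    problems = []
    has_sole_speaker = root.find("solespeaker") is not None
    previous_end = None

    for number, cue in enumerate(root.findall("cue"), start=1):
        # Each cue needs its own speaker unless one speaker is declared for the whole file.
        if not has_sole_speaker and cue.find("speaker") is None:
            problems.append(f"cue {number}: no <speaker> and no <solespeaker>")

        # Nested elements inside a child of the cue suggest in-text markup such as <b> or <i>.
        for child in cue:
            if len(child) > 0:
                problems.append(f"cue {number}: <{child.tag}> contains in-text markup")

        try:
            start = float(cue.findtext("start"))
            end = float(cue.findtext("end"))
        except (TypeError, ValueError):
            problems.append(f"cue {number}: missing or non-numeric <start>/<end>")
            continue

        if end < start:
            problems.append(f"cue {number}: <end> is earlier than <start>")
        # Assumes chronological order: a cue starting before the previous one ended overlaps it.
        if previous_end is not None and start < previous_end:
            problems.append(f"cue {number}: overlaps the previous cue")
        previous_end = end

    return problems
```

Markup that has been escaped as text (for example `&lt;b&gt;`) is not detected by this sketch, since it only looks for nested elements.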
- If only one person speaks throughout the interview, you do not have to add a speaker stamp to each time cue. Just indicate the speaker once at the beginning, using the solespeaker element. For a sample XML transcript with a single tier and single speaker, visit our testing repository.
- If multiple people speak throughout the interview, declare a speaker for each time cue. The XML example above shows where the speaker element goes, and its comments describe the speaker rules in more detail. For a sample XML transcript with multiple tiers and speakers, visit our testing repository.
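The two speaker conventions can be illustrated with a small sketch that lists the speaker of each cue. `speaker_for_each_cue` is a hypothetical helper, and the sample names are invented. It assumes that a declared solespeaker applies to every cue; how the module itself treats a file that declares both elements is not described here.

```python
import xml.etree.ElementTree as ET

# Invented multi-speaker sample in the XML format described above.
MULTI_SPEAKER = """<cues>
  <cue>
    <speaker>Interviewer</speaker>
    <start>0.000</start>
    <end>4.500</end>
    <transcript>Where were you born?</transcript>
  </cue>
  <cue>
    <speaker>Narrator</speaker>
    <start>4.500</start>
    <end>9.250</end>
    <transcript>In a small town by the coast.</transcript>
  </cue>
</cues>"""


def speaker_for_each_cue(xml_text: str) -> list:
    """Return one speaker name (or None) per cue, in document order."""
    root = ET.fromstring(xml_text)
    sole = root.findtext("solespeaker")
    # Assumption: a file-level solespeaker, when present, covers every cue.
    return [
        sole if sole is not None else cue.findtext("speaker")
        for cue in root.findall("cue")
    ]


if __name__ == "__main__":
    print(speaker_for_each_cue(MULTI_SPEAKER))
```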
Tiers are additional layers of information in your transcript that go beyond the transcription itself, such as translations, transliterations, and annotations. Annotation and Transcription are enabled by default, but additional tiers can be defined in the administration screen for the module.
Unlike the custom XML format used by the module, WebVTT is a standard developed by the W3C. A WebVTT file supports only a single "tier" of information, so different tiers must be supplied as separate files. In the Oral History module, WebVTT files can be ingested as source files. Everything after an underscore (_) in the file name will be considered a language code, so do not use underscores anywhere else in the file name. Using ISO 639-1 language codes is recommended, for example EN, FR, or ES.
YES:
- sampletranscript.vtt
- sampletranscript_en.vtt
- sampletranscript_fr.vtt
- 0012-004308-000000-0002.vtt
NO:
- sample_transcript.vtt
- sample_transcript_en.vtt
- sample_transcript_fr.vtt
- 0012_004308_000000_0002_en.vtt
- 0012_004308_000000_0002_fr.vtt
For example, 0012_004308_000000_0002.vtt results in the language selection being 004308.
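The naming rule can be checked before ingest with a short sketch. `language_code` is a hypothetical helper, not part of the module. Based on the 004308 example above, it assumes the language code is the text between the first underscore and the next underscore or the file extension; the module's actual parsing may differ.

```python
from pathlib import PurePath
from typing import Optional


def language_code(filename: str) -> Optional[str]:
    """Guess the language code the file name would yield, or None if it has no underscore."""
    stem = PurePath(filename).stem  # file name without the .vtt extension
    parts = stem.split("_")
    # Assumption inferred from the example above: the piece after the first underscore is used.
    return parts[1] if len(parts) > 1 else None


if __name__ == "__main__":
    for name in [
        "sampletranscript.vtt",
        "sampletranscript_en.vtt",
        "sample_transcript_en.vtt",
        "0012_004308_000000_0002.vtt",
    ]:
        print(name, "->", language_code(name))
```

Running a check like this over a batch of file names makes stray underscores visible: any result that is not an intended language code points to a file that needs renaming.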
These language codes will appear in the Closed Captioning portion of the Oral History viewer after the files are ingested, provided that the language code has been identified in the administrative interface. However, only one WebVTT file will be parsed for display beneath the viewer and crosswalked for Solr indexing. This means that the WebVTT format is lossier for some display and discovery features in Islandora. On the other hand, sample .vtt files and tools are easier to locate because WebVTT is a supported standard. A WebVTT Validator is available to validate your mediatrack files. A demo is also available. Note: the validator may or may not be up to date with the current WebVTT specification.
Note that not all of WebVTT's features are supported by the module. We are exploring support for features such as the cue language span in future versions of the software.