About Transcript Files and the OH Solution Pack

The Oral History module supports two types of transcript files on ingest: a custom XML format and WebVTT. The datastreams produced on ingest of either format depend on how you have configured the module. You can review a diagram of the workflow for both types of source files on the documentation home page. Each approach has pros and cons.

The Oral History module does not support transcripts with the following:

In-text markup (such as HTML styling)
Overlapping tiers and timecodes

Custom XML Format

In the XML transcript, the structure is as follows:

<cues>
<!-- cues is the root level of the XML file -->
    <solespeaker>One Speaker</solespeaker>
<!-- use the solespeaker element if there is only one speaker throughout the transcript -->
    <cue>
        <speaker>Different Speaker</speaker>
        <!-- only declare the speaker element if you have not declared the "solespeaker" element at the "cues" level-->
        <start>0.000</start>
        <end>12.124</end>
        <!-- 'start' and 'end' elements are start time and end time in seconds for the cue. -->
        <transcript>This is the transcript text content.</transcript>
        <translation>This is the annotation content.</translation>

        <!-- 'transcript' and/or 'translation' are default content tiers of the cue.
              Extra tier(s) can be added as long as they are listed in the configuration page.
             'transcript' element is required if 'Enable captions/subtitles display' is configured to be true, as this 
             element will be crosswalked to a webvtt file on ingest and used to power closed captioning in the viewer -->
    </cue>

    <!-- add more cues with above structure.-->

</cues>
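
The comments in the example note that the transcript element is crosswalked to a WebVTT file on ingest. The module's own crosswalk is not shown here; the following is only a minimal Python sketch of what such a conversion involves, assuming every cue has numeric start and end values in seconds. The function names `to_vtt_timestamp` and `cues_to_vtt` are hypothetical and are not part of the module.

```python
import xml.etree.ElementTree as ET


def to_vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape_cue_text(text: str) -> str:
    """Escape characters that have special meaning in WebVTT cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def cues_to_vtt(xml_text: str, tier: str = "transcript") -> str:
    """Turn one tier of a <cues> document into WebVTT text (illustrative only)."""
    root = ET.fromstring(xml_text)
    blocks = ["WEBVTT"]
    for cue in root.findall("cue"):
        text = (cue.findtext(tier) or "").strip()
        if not text:
            continue  # skip cues that have no content in this tier
        start = to_vtt_timestamp(float(cue.findtext("start")))
        end = to_vtt_timestamp(float(cue.findtext("end")))
        blocks.append(f"{start} --> {end}\n{escape_cue_text(text)}")
    return "\n\n".join(blocks) + "\n"
```

Because a WebVTT file holds a single tier, the sketch takes the tier name as a parameter and produces one output per call.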

Validators have been written for the XML format required by the module, and are available in the tests folder or directly from this link.
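
For a quick local check before ingest, the structure described above can also be tested with a few lines of Python. This is a rough sketch and not a substitute for the validators that ship with the module: the function name `check_transcript` is hypothetical, and treating a cue without a speaker as a problem when no solespeaker is declared is an assumption drawn from the comments in the example.

```python
import xml.etree.ElementTree as ET


def check_transcript(xml_text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    if root.tag != "cues":
        return [f"root element is <{root.tag}>, expected <cues>"]

    problems = []
    has_sole_speaker = root.find("solespeaker") is not None
    for number, cue in enumerate(root.findall("cue"), start=1):
        times = {}
        for name in ("start", "end"):
            try:
                # findtext returns None when the element is absent
                times[name] = float(cue.findtext(name))
            except (TypeError, ValueError):
                problems.append(f"cue {number}: <{name}> is missing or not a number")
        if len(times) == 2 and times["end"] < times["start"]:
            problems.append(f"cue {number}: <end> is earlier than <start>")
        # Assumption: every cue needs a speaker unless <solespeaker> is declared.
        if not has_sole_speaker and cue.find("speaker") is None:
            problems.append(f"cue {number}: no <speaker> and no <solespeaker>")
    return problems
```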

Recording speakers in the transcript

If only one person speaks throughout the interview, you do not have to add a speaker to each cue. Declare the speaker once at the top of the file with the solespeaker element. For a sample XML transcript with a single tier and single speaker, visit our testing repository.
If multiple people speak throughout the interview, declare a speaker for each cue with the speaker element. The comments in the example above describe when to use each element. For a sample XML transcript with multiple tiers and speakers, visit our testing repository.

Multi-tiered transcripts

Tiers are layers of information in your transcript. Beyond the transcription itself, they can include translations, transliterations, annotations, and so on. Annotation and Transcription are enabled by default, but additional tiers can be defined in the administration screen for the module.
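
To illustrate how speakers and tiers fit together in one file, here is a small Python sketch that builds a multi-speaker, multi-tier transcript in the custom XML format. The helper `build_transcript` and the `transliteration` tier are hypothetical; any tier beyond the defaults would first need to be defined in the administration screen.

```python
import xml.etree.ElementTree as ET


def build_transcript(cues, sole_speaker=None):
    """Build a <cues> document from dicts holding speaker, start, end and tiers."""
    root = ET.Element("cues")
    if sole_speaker is not None:
        ET.SubElement(root, "solespeaker").text = sole_speaker
    for item in cues:
        cue = ET.SubElement(root, "cue")
        if sole_speaker is None:
            # Per-cue speakers are only written when no sole speaker is declared.
            ET.SubElement(cue, "speaker").text = item["speaker"]
        ET.SubElement(cue, "start").text = f"{item['start']:.3f}"
        ET.SubElement(cue, "end").text = f"{item['end']:.3f}"
        # Each key in "tiers" becomes one child element of the cue.
        for tier_name, content in item["tiers"].items():
            ET.SubElement(cue, tier_name).text = content
    return ET.tostring(root, encoding="unicode")


xml_text = build_transcript([
    {"speaker": "Interviewer", "start": 0.0, "end": 4.2,
     "tiers": {"transcript": "Where were you born?",
               "translation": "Wo wurden Sie geboren?"}},
    {"speaker": "Narrator", "start": 4.2, "end": 9.75,
     "tiers": {"transcript": "I was born in a small town.",
               "transliteration": "hypothetical extra tier content"}},
])
print(xml_text)
```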

WebVTT Format

Unlike the custom XML format used by the module, WebVTT is developed by the W3C. Also unlike the custom XML format, a WebVTT file supports only a single "tier" of information, so multiple files are needed to represent different tiers. In the Oral History module, WebVTT files can be ingested as source files. Text after an underscore (_) in the file name is treated as a language code, so avoid underscores anywhere else in the name. Using ISO 639-1 language codes is recommended, for example EN, FR, or ES.

YES:

sampletranscript.vtt
sampletranscript_en.vtt
sampletranscript_fr.vtt
0012-004308-000000-0002.vtt

Screen Shot of language dropdown

NO:

sample_transcript.vtt
sample_transcript_en.vtt
sample_transcript_fr.vtt
0012_004308_000000_0002_en.vtt
0012_004308_000000_0002_fr.vtt

For example, 0012_004308_000000_0002.vtt results in 004308 being offered as the language selection.

Screen Shot of bad vtt file naming
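
The naming examples above can be checked in bulk before ingest. The following Python sketch encodes one reading of the convention: a name is considered safe if it contains no underscore, or exactly one underscore followed by a two-letter code directly before the .vtt extension. That rule and the function name `is_safe_vtt_name` are assumptions for illustration; the module's actual parsing may differ.

```python
import re
from pathlib import Path

# Assumed convention: no underscore at all, or a single underscore
# followed by a two-letter language code at the end of the base name.
SAFE_STEM = re.compile(r"^[^_]+(?:_[A-Za-z]{2})?$")


def is_safe_vtt_name(filename: str) -> bool:
    """Return True if the file name follows the assumed naming convention."""
    path = Path(filename)
    return path.suffix.lower() == ".vtt" and SAFE_STEM.match(path.stem) is not None


for name in ["sampletranscript_en.vtt", "0012-004308-000000-0002.vtt",
             "sample_transcript_en.vtt", "0012_004308_000000_0002.vtt"]:
    print(name, "OK" if is_safe_vtt_name(name) else "rename before ingest")
```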

These language codes appear in the Closed Captioning portion of the Oral History viewer after ingest, provided that each language code has been identified in the administrative interface. However, only one WebVTT file is parsed for display beneath the viewer and crosswalked for Solr indexing, which makes the WebVTT format lossier for some display and discovery features in Islandora. On the other hand, sample .vtt files and tools are easier to locate because WebVTT is a supported standard. A WebVTT Validator is available to validate your media track files, and a demo is also available. Note: the validator may or may not be up to date with the current WebVTT specification.

Note that not all of WebVTT's features are supported by the module. We are exploring the use of things like cue language span in future versions of the software.
