Annotating your own data with 💥 Prodigy

Manually segmenting and labeling audio data is time consuming. For speaker diarization, depending on the required level of precision, it may take more than 10 times the duration of a recording to annotate it.

Table of contents

- Recipes
- Keyboard shortcuts
- RTTM file format

Recipes

pyannote.audio comes with a bunch of 💥 Prodigy recipes designed to speed things up a bit.

| Recipe | Usage |
| ------ | ----- |
| 🦻 `pyannote.audio` | Annotate with a pretrained pipeline in the loop |
| 🧐 `pyannote.review` | Merge multiple annotations |
| 🤲 `pyannote.diff` | Show differences between two annotations |
| 🗄 `pyannote.database` | Dump annotations as pyannote.database protocols |

🦻 pyannote.audio | Annotate with a pretrained pipeline in the loop

```bash
prodigy pyannote.audio dataset /path/to/audio/directory pyannote/speaker-segmentation
```

pyannote.audio screenshot

The `pyannote.audio` recipe streams `.wav` files in chunks and applies a pretrained pipeline to each of them. You can then adjust the resulting regions manually if needed.

More options

```
prodigy pyannote.audio [options] dataset source pipeline

  dataset           Prodigy dataset to save annotations to.
  source            Path to directory containing audio files to annotate.
  pipeline          Name of pretrained pipeline on huggingface.co (e.g.
                    pyannote/speaker-segmentation) or path to local YAML file.
  -chunk DURATION   Split audio files into shorter chunks of that many seconds.
                    Defaults to 10s.
  -precision STEP   Temporal precision of keyboard controls, in milliseconds.
                    Defaults to 200ms.
  -beep             Produce a beep when the player reaches the end of a region.
```
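For intuition, the effect of the `-chunk` option can be sketched in a few lines of Python. `chunk_boundaries` is a hypothetical helper written for illustration, not part of the recipe:

```python
# Illustrative sketch (not the recipe's implementation): splitting a
# recording's duration into fixed-size chunks, as -chunk does.

def chunk_boundaries(total_duration: float, chunk: float = 10.0):
    """Yield (start, end) pairs covering `total_duration` seconds."""
    start = 0.0
    while start < total_duration:
        yield (start, min(start + chunk, total_duration))
        start += chunk

# A 25-second file with the default 10s chunks:
print(list(chunk_boundaries(25.0)))
# → [(0.0, 10.0), (10.0, 20.0), (20.0, 25.0)]
```

Each chunk is then sent to the annotation interface as one task, which keeps the waveform short enough to annotate comfortably.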

🧐 pyannote.review | Merge multiple annotations

```bash
prodigy pyannote.review dataset /path/to/audio/directory input1.rttm,input2.rttm
```

pyannote.review screenshot

The `pyannote.review` recipe takes as many annotation files (in the RTTM file format) as you want and lets you compare them and choose which ones are best, within the same interface as the `pyannote.audio` recipe. Click on a segment of an annotation file to add it to the output annotation, or on "Input X" to add all of its segments at once.

More options

```
prodigy pyannote.review [options] dataset source annotations

  dataset           Prodigy dataset to save annotations to.
  source            Path to directory containing audio files whose annotation is to be checked.
  annotations       Comma-separated paths to annotation files.
  -chunk DURATION   Split audio files into shorter chunks of that many seconds.
                    Defaults to 30s.
  -diarization      Make an optimal one-to-one mapping between the first annotation and the others.
  -precision STEP   Temporal precision of keyboard controls, in milliseconds.
                    Defaults to 200ms.
  -beep             Produce a beep when the player reaches the end of a region.
```
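The one-to-one mapping enabled by `-diarization` can be sketched with a toy brute-force search. `best_mapping` and the frame-level labels below are illustrative assumptions, not the recipe's actual implementation (which works on time regions):

```python
# Illustrative sketch: find the one-to-one relabeling of a second
# annotation that best agrees with the first, by trying permutations.
from itertools import permutations

def best_mapping(reference, hypothesis):
    """Map hypothesis labels onto reference labels to maximize agreement.

    Assumes the hypothesis uses no more speakers than the reference.
    """
    ref_labels = sorted(set(reference))
    hyp_labels = sorted(set(hypothesis))
    best, best_score = None, -1
    for perm in permutations(ref_labels, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        score = sum(r == mapping[h] for r, h in zip(reference, hypothesis))
        if score > best_score:
            best, best_score = mapping, score
    return best

ref = ["A", "A", "B", "B", "B"]   # first annotation
hyp = ["x", "x", "y", "y", "y"]   # second annotation, different label names
print(best_mapping(ref, hyp))     # → {'x': 'A', 'y': 'B'}
```

With the mapping applied, the two annotations use the same speaker names, which makes segment-level comparison in the interface meaningful.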

🤲 pyannote.diff | Show differences between two annotations

```bash
prodigy pyannote.diff dataset /path/to/audio/directory /path/to/reference.rttm /path/to/hypothesis.rttm
```

pyannote.diff screenshot

The `pyannote.diff` recipe takes one reference file and one hypothesis file, both in the RTTM file format, and focuses on the regions with the most errors among missed detections, false alarms, and speaker confusions. You can filter on one or more error types and on their minimum duration with the corresponding options.

More options

```
prodigy pyannote.diff [options] dataset source reference hypothesis

  dataset                    Prodigy dataset to save annotations to.
  source                     Path to directory containing audio files whose annotation is to be checked.
  reference                  Path to reference file.
  hypothesis                 Path to hypothesis file.
  -chunk DURATION            Split audio files into shorter chunks of that many seconds.
                             Defaults to 30s.
  -min-duration DURATION     Minimum duration of errors in ms.
                             Defaults to 200ms.
  -diarization               Make an optimal one-to-one mapping between reference and hypothesis.
  -false-alarm               Display false alarm errors.
  -speaker-confusion         Display confusion errors.
  -missed-detection          Display missed detection errors.
```
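The three error types can be illustrated on toy frame-level labels (`None` standing for non-speech). `classify_errors` is a hypothetical helper; the actual recipe works on time regions, not frames:

```python
# Illustrative sketch of the error types pyannote.diff highlights:
# missed detection, false alarm, and speaker confusion.

def classify_errors(reference, hypothesis):
    """Count each error type over aligned frame labels."""
    errors = {"missed detection": 0, "false alarm": 0, "confusion": 0}
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None and hyp is None:
            errors["missed detection"] += 1   # speech missed by hypothesis
        elif ref is None and hyp is not None:
            errors["false alarm"] += 1        # speech hallucinated by hypothesis
        elif ref is not None and ref != hyp:
            errors["confusion"] += 1          # speech assigned to the wrong speaker
    return errors

ref = ["A", "A", None, "B", "B"]
hyp = ["A", None, "A", "A", "B"]
print(classify_errors(ref, hyp))
# → {'missed detection': 1, 'false alarm': 1, 'confusion': 1}
```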

🗄 pyannote.database | Dump annotations as pyannote.database protocols

Work in progress

Keyboard shortcuts

Though pyannote.audio recipes are built on top of the Prodigy audio interface, they provide a bunch of handy additional keyboard shortcuts.

| Shortcut | Description |
| -------- | ----------- |
| `left` / `right` (+ `w`) | Shift player cursor (speed up) |
| `up` / `down` | Switch active region |
| `shift + left` / `shift + right` | Shift active region start time |
| `ctrl + left` / `ctrl + right` | Shift active region end time |
| `shift + up` | Create a new region |
| `shift + down` / `backspace` | Remove active region |
| `spacebar` | Play/pause player |
| `escape` | Ignore this sample |
| `enter` | Validate annotation |

RTTM file format

RTTM files contain one line per speech turn, using the following convention:

```
SPEAKER {uri} 1 {start_time} {duration} <NA> <NA> {speaker_id} <NA> <NA>
```

- `uri`: file identifier (as given by pyannote.database protocols)
- `start_time`: speech turn start time in seconds
- `duration`: speech turn duration in seconds
- `confidence`: confidence score (can be anything, not used for now)
- `gender`: speaker gender (can be anything, not used for now)
- `speaker_id`: speaker identifier
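Reading a line in this convention takes only a few lines of Python. `parse_rttm_line` is an illustrative sketch, not an official parser:

```python
# Minimal sketch of parsing one RTTM line following the template above.
# Field positions match the SPEAKER convention; other fields are <NA>.

def parse_rttm_line(line: str) -> dict:
    fields = line.split()
    return {
        "uri": fields[1],
        "start_time": float(fields[3]),
        "duration": float(fields[4]),
        "speaker_id": fields[7],
    }

line = "SPEAKER filename 1 17.238 5.230 <NA> <NA> speaker_1 <NA> <NA>"
print(parse_rttm_line(line))
# → {'uri': 'filename', 'start_time': 17.238, 'duration': 5.23, 'speaker_id': 'speaker_1'}
```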