RedHenLab/newsscape_processing_pipeline

newsscape_processing_pipeline

This is the pipeline based on the research in Peter Uhrig's thesis - updated as needed

This pipeline takes the NewsScape text file and the NewsScape video file as its starting points.

Processing Steps

  • Data quality assessment step 1 (ToDo: decide where exactly in the pipeline this belongs)
    • spell checking
    • audio signal quality check
    • video error check
  • Sentence splitting (includes transformation of turn and story boundaries; extend to commercials?)
  • ToDo: Where do we incorporate the information about commercials? Either here or before/during sentence splitting.
  • Extraction of non-spoken text
  • Run CoreNLP
    • tokenization
    • POS tagging
    • lemmatization
    • TrueCase
    • dependency parsing
    • NER
    • coreference resolution
  • (optional) Run PathLSTM for semantic frame annotation
  • Verticalize (including XML parsing to check integrity)
  • Create input for Gentle
  • Run Gentle (modified version!)
  • Integrate Gentle results (ToDo: FIX THIS!)
  • Data quality assessment step 2 - audio annotation quality (based on Gentle)
  • Audio annotation with gender detection
  • Run OpenPose and other image annotation software on video
  • Integrate results
  • In the long run: biometrics, YOLO, etc.
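
The "Verticalize (including XML parsing to check integrity)" step could look roughly like the following minimal sketch. It is not the pipeline's actual code. The function names `verticalize` and `is_well_formed`, the `<text>` and `<s>` tags, and the word/POS/lemma column layout are all assumptions made for illustration. The idea is to write one token per line with tab-separated annotations, wrap sentences in XML tags, and then run the result through an XML parser to confirm it is well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def verticalize(sentences):
    """Return one-token-per-line text wrapped in <text> and <s> tags.

    `sentences` is a list of sentences; each sentence is a list of
    (word, pos, lemma) tuples. This column layout is an assumption,
    not the format the pipeline necessarily uses.
    """
    lines = ["<text>"]
    for sentence in sentences:
        lines.append("<s>")
        for token in sentence:
            # Escape &, < and > so token text cannot break the XML structure.
            lines.append("\t".join(escape(field) for field in token))
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


def is_well_formed(vertical_text):
    """Return True if the verticalized text parses as XML."""
    try:
        ET.fromstring(vertical_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample input: one sentence with two annotated tokens.
sample = [[("AT&T", "NNP", "AT&T"), ("rose", "VBD", "rise")]]
vertical = verticalize(sample)
print(vertical)
print(is_well_formed(vertical))
```

Escaping each field before writing it matters here because caption text can contain characters such as `&` or `<`, which would otherwise make the integrity check fail for reasons unrelated to the tag structure.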
