VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features

Accepted to INTERSPEECH 2024. An arXiv preprint is available.

Sample code will be available soon.

Alignment examples

  • Annotated: Manually annotated phoneme boundaries from the corpus
  • Proposed: Boundaries predicted by the proposed method
  • MFA: Boundaries predicted by the Montreal Forced Aligner
  • CTC: Boundaries predicted by CTC forced alignment
  • OTA: Boundaries predicted by "One TTS Alignment to Rule Them All"
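
Until the sample code is released, the comparison shown in the figures can be approximated numerically. The snippet below is a minimal sketch, not the authors' evaluation code: it assumes each tier is a list of boundary times in seconds with one entry per phoneme boundary, in the same order for every method. The function name `boundary_accuracy` and the 20 ms default tolerance are assumptions made for illustration.

```python
from typing import Sequence


def boundary_accuracy(
    annotated: Sequence[float],
    predicted: Sequence[float],
    tolerance: float = 0.02,  # assumed tolerance in seconds, not taken from the paper
) -> float:
    """Return the fraction of predicted boundaries that fall within
    `tolerance` seconds of the annotated boundary at the same index."""
    if len(annotated) != len(predicted):
        raise ValueError("both tiers must contain the same number of boundaries")
    if not annotated:
        return 0.0
    hits = sum(abs(a - p) <= tolerance for a, p in zip(annotated, predicted))
    return hits / len(annotated)


# Hypothetical boundary times, for illustration only
annotated_tier = [0.10, 0.25, 0.40]
predicted_tier = [0.11, 0.30, 0.40]
score = boundary_accuracy(annotated_tier, predicted_tier)
```

The same call can be repeated with the Proposed, MFA, CTC, and OTA tiers in turn to compare each against the annotated tier.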

CSJ dataset

example1

example2

example3

TIMIT dataset

timit_example

Buckeye dataset

buckeye_example