This project demonstrates forced alignment between audio and text using PyTorch and Wav2Vec2. It generates synthetic speech from text, aligns the audio with the transcript, and visualizes each stage of the alignment.
- Text-to-speech generation using gTTS
- Forced alignment using Wav2Vec2
- Multiple visualizations:
  - Frame-wise class probability
  - Alignment path in trellis matrix
  - Word segments with spectrogram
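
The trellis and alignment path listed above can be illustrated with a small, dependency-free sketch. It is not this project's code: `force_align` and `merge_segments` are hypothetical names, the scoring rule is a simplified CTC-style Viterbi search, and the emission values are hand-written stand-ins for the frame-wise log-probabilities that a Wav2Vec2 model would produce.

```python
import math


def force_align(emission, tokens, blank_id=0):
    """Assign each frame to a transcript position with a Viterbi search.

    emission: per-frame log-probabilities, shape [num_frames][num_classes]
    tokens:   class indices of the transcript, in order
    Returns a list with one transcript position per frame.
    """
    num_frames, num_tokens = len(emission), len(tokens)
    if num_tokens == 0 or num_frames < num_tokens:
        raise ValueError("need at least one frame per token")

    neg_inf = -math.inf
    # trellis[t][j]: best log-score for frames 0..t with frame t on token j
    trellis = [[neg_inf] * num_tokens for _ in range(num_frames)]
    # advanced[t][j]: True if the best path entered token j exactly at frame t
    advanced = [[False] * num_tokens for _ in range(num_frames)]
    trellis[0][0] = emission[0][tokens[0]]

    for t in range(1, num_frames):
        for j in range(num_tokens):
            # Stay on token j: the frame emits blank or repeats the token.
            stay = trellis[t - 1][j] + max(
                emission[t][blank_id], emission[t][tokens[j]]
            )
            # Move from token j-1 to token j: the frame emits token j.
            move = neg_inf
            if j > 0:
                move = trellis[t - 1][j - 1] + emission[t][tokens[j]]
            if move > stay:
                trellis[t][j] = move
                advanced[t][j] = True
            else:
                trellis[t][j] = stay

    # Backtrack from the last frame and last token to recover the path.
    path = [0] * num_frames
    j = num_tokens - 1
    for t in range(num_frames - 1, -1, -1):
        path[t] = j
        if advanced[t][j]:
            j -= 1
    return path


def merge_segments(path):
    """Collapse a frame path into (token_position, start, end) with end exclusive."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((path[start], start, t))
            start = t
    return segments


# Toy input: classes are [blank, "a", "b"]; the transcript is "ab".
toy_emission = [
    [math.log(p) for p in frame]
    for frame in [
        [0.1, 0.8, 0.1],  # frame 0 favors "a"
        [0.8, 0.1, 0.1],  # frame 1 favors blank
        [0.1, 0.1, 0.8],  # frame 2 favors "b"
        [0.8, 0.1, 0.1],  # frame 3 favors blank
    ]
]
toy_path = force_align(toy_emission, tokens=[1, 2])
print(toy_path)                  # [0, 0, 1, 1]
print(merge_segments(toy_path))  # [(0, 0, 2), (1, 2, 4)]
```

Each segment is a range of frame indices. Converting it to seconds requires the model's frame rate, which this sketch does not assume.
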
- Clone this repository
- Install the required dependencies:
- v1.0.0: Initial release
  - Basic forced alignment implementation
  - Text-to-speech generation
  - Visualization of alignment path
  - Support for 3-word demo phrase