A simplified speaker diarization pipeline built on pretrained models.
It is designed to make it as simple as possible to go from an input audio file to diarized segments.
import soundfile as sf
import matplotlib.pyplot as plt
from simple_diarizer.diarizer import Diarizer
from simple_diarizer.utils import combined_waveplot
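diar = Diarizer(
    embed_model='xvec',  # 'xvec' and 'ecapa' supported
    cluster_method='sc'  # 'ahc' and 'sc' supported
)
# WAV_FILE is the path to the input audio file and NUM_SPEAKERS is the
# expected number of speakers; define both before running.
segments = diar.diarize(WAV_FILE, num_speakers=NUM_SPEAKERS)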
signal, fs = sf.read(WAV_FILE)
combined_waveplot(signal, fs, segments)
plt.show()
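Once `segments` is returned, a common next step is to total the speaking time per speaker. The sketch below assumes each segment is a dict with `start` and `end` times in seconds and a `label` speaker index. That layout is an assumption here, not something stated above, so check it against the actual output of `diar.diarize`. The `speaking_time` helper is hypothetical and `example_segments` is made-up data for illustration.

```python
from collections import defaultdict


def speaking_time(segments):
    """Total speaking time in seconds per speaker label.

    Assumes each segment is a dict with 'start' and 'end' (seconds)
    and a 'label' key identifying the speaker.
    """
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["label"]] += seg["end"] - seg["start"]
    return dict(totals)


# Made-up segments for illustration, not real diarizer output.
example_segments = [
    {"start": 0.0, "end": 2.5, "label": 0},
    {"start": 2.5, "end": 4.0, "label": 1},
    {"start": 4.0, "end": 7.0, "label": 0},
]

for label, seconds in sorted(speaking_time(example_segments).items()):
    print(f"speaker {label}: {seconds:.1f} s")
```

If the real segments use different key names, only the three dictionary lookups inside `speaking_time` need to change.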
Simplified diarization is available on PyPI:
pip install simple-diarizer
"Some Quick Advice from Barack Obama!"
The following pretrained models are used:
- Voice Activity Detection (VAD)
- Deep speaker embedding extraction
- (Optional/Experimental) Speech-to-text
  - ESPnet Model Zoo
    - English ASR model
The pipeline can be tried out at the link above, where it will attempt to diarize the audio from any YouTube URL you provide.
- Spectral clustering methods lifted from https://github.com/wq2012/SpectralCluster
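
To give a feel for what spectral clustering does with speaker embeddings, here is a minimal pure-Python sketch for the two-speaker case. It builds a cosine affinity matrix and splits the items by the sign of the second eigenvector of the normalized affinity, found with power iteration. This is a simplification written for illustration, not the code from the linked repository. The function names `cosine_affinity` and `spectral_bipartition` and the toy embeddings are all hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math


def cosine_affinity(embeddings):
    """Pairwise cosine similarity, rescaled from [-1, 1] to [0, 1]."""
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    n = len(embeddings)
    aff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            aff[i][j] = 0.5 * (1.0 + dot / (norms[i] * norms[j]))
    return aff


def spectral_bipartition(affinity, iters=200):
    """Split items into two groups using the second eigenvector of
    the normalized affinity matrix D^-1/2 A D^-1/2."""
    n = len(affinity)
    if n < 2:
        return [0] * n
    deg = [sum(row) for row in affinity]
    m = [[affinity[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The top eigenvector of m is proportional to sqrt(deg); scale to unit norm.
    scale = math.sqrt(sum(deg))
    top = [math.sqrt(d) / scale for d in deg]

    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        # Apply (m + I) / 2 so that every eigenvalue is non-negative.
        v = [0.5 * (sum(m[i][j] * v[j] for j in range(n)) + v[i])
             for i in range(n)]
        # Project out the top eigenvector, then renormalize.
        proj = sum(a * t for a, t in zip(v, top))
        v = [a - proj * t for a, t in zip(v, top)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]

    # Items on the same side as the first item get label 0.
    return [0 if (a >= 0) == (v[0] >= 0) else 1 for a in v]


# Toy 2-D "embeddings": three pointing one way, three pointing another.
embeddings = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0],
]
labels = spectral_bipartition(cosine_affinity(embeddings))
print(labels)
```

Handling more than two speakers typically means keeping several eigenvectors and clustering the rows, which is beyond this sketch.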