Hello,
I extracted the audio channel from the YouTube video at this link and then tested the audio file using the local file streaming mode with diart.stream. However, the resulting RTTM file did not meet my expectations. There are clearly four distinct speakers in the video, each with a sufficient duration of speech, yet the speaker diarization pipeline only identified three speakers, not to mention the poor alignment between speakers and their speech segments on the timeline.
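This is essentially the command I ran (the audio path and output directory are placeholders):

```bash
diart.stream audio.wav \
  --segmentation pyannote/segmentation-3.0 \
  --embedding pyannote/wespeaker-voxceleb-resnet34-LM \
  --output ./rttm_out
```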
I used "pyannote/segmentation-3.0" and "pyannote/wespeaker-voxceleb-resnet34-LM" for segmentation and embedding model respectively. How should I adjust the parameters? I do not have lablled data.