Hello,
I've been using Diart to perform audio diarization (in offline mode). From my understanding, Diart splits the audio into chunks and processes them one by one to produce real-time diarization, and it improves accuracy as it processes more chunks by utilizing previous data. Since Diart uses the same models as PyAnnote, I initially assumed that using larger chunks (e.g., 3 minutes) would provide accuracy similar to the PyAnnote library.
However, in my testing, I found that smaller chunk sizes, specifically 5 seconds, delivered better accuracy than 60 or 180 seconds. Here are the results from testing on the AMI test set:
5s chunk: 31.57% DER
180s chunk: 35.68% DER
PyAnnote library: 17.11% DER
PyAnnote API: 14.29% DER
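For context, a run of this kind can be set up roughly as follows. This is only a minimal sketch: the paths are placeholders, and the class and parameter names (Benchmark, SpeakerDiarizationConfig, duration, step) assume a recent diart release; older versions use OnlineSpeakerDiarization and PipelineConfig instead.

```python
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.inference import Benchmark

# Placeholder paths to the AMI test-set audio and reference RTTM files.
benchmark = Benchmark("AMI/audio", "AMI/rttm")

# duration is the chunk size in seconds (5, 60, 180 in the runs above).
config = SpeakerDiarizationConfig(duration=5, step=0.5)

# When reference RTTMs are provided, the benchmark should produce a
# pyannote.metrics report including the diarization error rate (DER).
report = benchmark(SpeakerDiarization, config)
print(report)
```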
Is it expected that smaller chunk sizes would yield better accuracy with Diart? Is there an error in my assumption that larger chunk sizes should improve accuracy? My primary goal is to achieve faster results than those provided by the PyAnnote library, although waiting 30 to 60 seconds for the initial batch is acceptable for my use case.
Thank you for your help.
Unlike pyannote.audio, diart is built for streaming (i.e. online) diarization. Even though it leverages pyannote models, this does not mean that the two pipelines are comparable. Sacrifices in accuracy need to be made in order to provide a fast-enough diarization in streaming, in particular because of the lack of future context and the low-latency requirements.
It is therefore expected that offline diarization yields superior performance.
Concerning the chunk size, this is the amount of audio that is sent to the model at once. Larger chunks will make your inference slower, and if the chunk size at inference time doesn't match the one used during training, you can also get worse results.
If you're using diart for the speed (as opposed to the real-time capabilities), it would make sense to increase the chunk size, as well as the step and the latency. At the end of the day, you should tune these parameters to what best suits your task.
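For instance, the relevant knobs can be set like this. This is a minimal sketch, not a recommended configuration: the values are placeholders chosen to illustrate trading latency for context, and the class names assume a recent diart release (older versions expose OnlineSpeakerDiarization and PipelineConfig instead).

```python
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.inference import StreamingInference
from diart.sources import FileAudioSource

# Larger chunk, step and latency: the pipeline reacts more slowly but each
# prediction sees more context. Latency must lie between step and duration,
# and going far beyond the segmentation model's training window (5 s for the
# default pyannote model) can hurt accuracy, as noted above.
config = SpeakerDiarizationConfig(
    duration=10,  # chunk size in seconds
    step=5,       # how often a new prediction is emitted
    latency=10,   # how long the pipeline may wait before committing output
)
pipeline = SpeakerDiarization(config)

# Placeholder file; any 16 kHz mono WAV should work.
source = FileAudioSource("audio.wav", sample_rate=16000)
inference = StreamingInference(pipeline, source)
prediction = inference()  # final diarization as a pyannote Annotation
```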