Hello,
I've been using Diart to perform audio diarization (in offline mode). From my understanding, Diart splits the audio into chunks and processes them one by one to produce real-time diarization, and it improves accuracy as it processes more chunks by utilizing previous data. Since Diart uses the same models as PyAnnote, I initially assumed that using larger chunks (e.g., 3 minutes) would provide accuracy similar to the PyAnnote library.
However, in my testing, I found that smaller chunk sizes, specifically 5 seconds, delivered better accuracy than 60 or 180 seconds. Here are the results from testing on the AMI test set:
5s chunk: 31.57% DER
180s chunk: 35.68% DER
PyAnnote library: 17.11% DER
PyAnnote API: 14.29% DER
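For context, a run of this kind can be set up roughly as follows. This is only a minimal sketch: the paths are placeholders, and the class and parameter names (Benchmark, SpeakerDiarizationConfig, duration, step) assume a recent diart release; older versions use OnlineSpeakerDiarization and PipelineConfig instead.

```python
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.inference import Benchmark

# Placeholder paths to the AMI test-set audio and reference RTTM files.
benchmark = Benchmark("AMI/audio", "AMI/rttm")

# duration is the chunk size in seconds (5, 60, 180 in the runs above).
config = SpeakerDiarizationConfig(duration=5, step=0.5)

# When reference RTTMs are provided, the benchmark should produce a
# pyannote.metrics report including the diarization error rate (DER).
report = benchmark(SpeakerDiarization, config)
print(report)
```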
Is it expected that smaller chunk sizes would yield better accuracy with Diart? Is there an error in my assumption that larger chunk sizes should improve accuracy? My primary goal is to achieve faster results than those provided by the PyAnnote library, although waiting 30 to 60 seconds for the initial batch is acceptable for my use case.
Thank you for your help.
Unlike pyannote.audio, diart is built for streaming (i.e. online) diarization. Even though it leverages pyannote models, this does not mean that the two pipelines are comparable. Sacrifices in accuracy need to be made in order to provide a fast-enough diarization in streaming, in particular because of the lack of future context and the low-latency requirements.
It is therefore expected that offline diarization yields superior performance.
Concerning the chunk size, this is the amount of audio that is sent to the model at once. Larger chunks will make your inference slower, and if the chunk size at inference time doesn't match the one used during training, you can also get worse results.
If you're using diart for the speed (as opposed to the real-time capabilities), it would make sense to increase the chunk size, as well as the step and the latency. At the end of the day, you should tune these parameters to what best suits your task.
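For instance, the relevant knobs can be set like this. This is a minimal sketch, not a recommended configuration: the values are placeholders chosen to illustrate trading latency for context, and the class names assume a recent diart release (older versions expose OnlineSpeakerDiarization and PipelineConfig instead).

```python
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.inference import StreamingInference
from diart.sources import FileAudioSource

# Larger chunk, step and latency: the pipeline reacts more slowly but each
# prediction sees more context. Latency must lie between step and duration,
# and going far beyond the segmentation model's training window (5 s for the
# default pyannote model) can hurt accuracy, as noted above.
config = SpeakerDiarizationConfig(
    duration=10,  # chunk size in seconds
    step=5,       # how often a new prediction is emitted
    latency=10,   # how long the pipeline may wait before committing output
)
pipeline = SpeakerDiarization(config)

# Placeholder file; any 16 kHz mono WAV should work.
source = FileAudioSource("audio.wav", sample_rate=16000)
inference = StreamingInference(pipeline, source)
prediction = inference()  # final diarization as a pyannote Annotation
```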