
Using a bigger chunk size does not improve DER #256

Open
fedexman opened this issue Dec 3, 2024 · 1 comment
Labels
question Further information is requested

fedexman commented Dec 3, 2024

Hello,
I've been using Diart to perform audio diarization (in offline mode). From my understanding, Diart splits the audio into chunks and processes them one by one to produce real-time diarization, and it improves accuracy as it processes more chunks by leveraging previous data. Since Diart uses the same models as PyAnnote, I initially assumed that using larger chunks (e.g., 3 minutes) would give accuracy similar to the PyAnnote library.
However, in my testing I found that smaller chunk sizes, specifically 5 seconds, delivered better accuracy than 60 or 180 seconds. Here are the results from testing on the AMI test set:

  • 5s chunk: 31.57% DER
  • 180s chunk: 35.68% DER
  • PyAnnote library: 17.11% DER
  • PyAnnote API: 14.29% DER

Is it expected that smaller chunk sizes yield better accuracy with Diart? Is my assumption wrong that larger chunk sizes should improve accuracy? My primary goal is to achieve faster results than the PyAnnote library provides, although waiting 30 to 60 seconds for the initial batch is acceptable for my use case.
Thank you for your help.
@juanmc2005 juanmc2005 added the question Further information is requested label Dec 13, 2024
@juanmc2005 (Owner) commented Dec 13, 2024

Hi @fedexman,

Unlike pyannote.audio, diart is built for streaming (i.e. online) diarization. Even though it leverages pyannote models, that does not mean the two pipelines are comparable. Accuracy has to be sacrificed to provide fast-enough diarization in streaming, in particular because of the lack of future context and the low-latency requirements. It is therefore expected that offline diarization yields superior performance.

Concerning the chunk size: this is the amount of audio sent to the model at once. Larger chunks will make your inference slower, and if the chunk size at inference time doesn't match the size used during training, you can also get worse results.
If you're using diart for speed (as opposed to its real-time capabilities), it would make sense to increase the chunk size, as well as the step and the latency. At the end of the day, you should tune these parameters to whatever best suits your task.
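As a rough sketch of what that tuning could look like: the exact class and parameter names below (`SpeakerDiarizationConfig` with `duration`, `step`, and `latency` in seconds, `FileAudioSource`, `StreamingInference`) are assumptions about diart's configuration API and may differ between versions, and the file path is hypothetical. Running it also requires the pyannote model weights to be available.

```python
# Sketch: trading latency for throughput in diart for an offline-leaning run.
# NOTE: config/class names and parameters here are assumptions about diart's
# API (check your installed version); "meeting.wav" is a hypothetical file.
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.inference import StreamingInference
from diart.sources import FileAudioSource

config = SpeakerDiarizationConfig(
    duration=5,   # chunk size in seconds; keep close to the training window
    step=2.5,     # larger step = fewer chunks to process = faster overall
    latency=5,    # relax latency, since real-time output isn't needed here
)
pipeline = SpeakerDiarization(config)
source = FileAudioSource("meeting.wav", sample_rate=16000)

inference = StreamingInference(pipeline, source)
prediction = inference()  # accumulated diarization for the whole file
```

The idea is simply that when you don't need low-latency output, a larger step and latency let each model call cover more audio, while keeping the chunk duration itself near the value the segmentation model was trained with.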
