Why are the sliding windows so tight in the diarization process? #261
Comments
Given the doubts described in the issue, I tried setting step=4.5 and duration=5.0, i.e., an overlap of 0.5 seconds. I have not seen the speaker-aware diarization results get any worse, and it runs much faster.
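To put a rough number on the speed-up, here is a minimal sketch that counts how many windows are needed to cover a fixed stretch of audio for each step. It is plain Python, not diart code: the helper name `count_windows` and the 60 s audio length are made up for illustration, and it assumes windows are placed at regular intervals with no padding at the edges.

```python
import math


def count_windows(audio_seconds: float, duration: float, step: float) -> int:
    """Count full windows of length `duration`, spaced `step` apart, that fit in the audio.

    Hypothetical helper for illustration only; assumes no padding at the edges.
    """
    if audio_seconds < duration:
        return 0
    return math.floor((audio_seconds - duration) / step) + 1


audio = 60.0  # assumed audio length in seconds
dense = count_windows(audio, duration=5.0, step=0.5)   # step implied by the issue's formula
sparse = count_windows(audio, duration=5.0, step=4.5)  # step tried in this comment
print(dense, sparse)  # 111 13
```

Under these assumptions the 0.5 s step needs roughly eight to nine times as many windows for the same audio, which is in line with the speed-up reported here.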
Hi @ywangwxd, this kind of sliding window is made for the diarization pipeline, but if I remember correctly, in my blog post about combining Whisper and diart I used non-overlapping 2s windows, so the window had to be readjusted down the line. Otherwise you get duplicate captions.
So you mean there is no need for any overlap at all between two consecutive sliding windows for diarization?
Luckily, I have successfully integrated faster-whisper into the diart-spk branch. Maybe I will submit a PR later.
But I have a question about the sliding windows in diarization. I used the default step parameter and set the duration to 5s for diarization and 5s for ASR. I found that the sliding-window features passed into the `__call__` function of the SpeakerAwareTranscription pipeline are very dense. They look like this:
There is too much overlap between two consecutive windows. Even if I set the batch size to 32 for the diarization process, the effective audio length for ASR is only `31*0.5+5=20.5s`. This also makes the diarization process much less efficient, since there is too much redundant computation between two windows. Do I understand the underlying logic correctly? Should I assign a larger value to the step parameter?
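The arithmetic behind the 20.5s figure can be sketched as follows. This is only an illustration in plain Python, not diart's implementation: the function names `covered_seconds` and `redundancy` are hypothetical, and the sketch assumes that each window after the first adds exactly one step of new audio.

```python
def covered_seconds(batch_size: int, duration: float, step: float) -> float:
    """Audio span covered by `batch_size` consecutive sliding windows.

    The first window contributes `duration` seconds; every later window
    contributes only `step` seconds of new audio (assumed, for illustration).
    """
    if batch_size <= 0:
        return 0.0
    return (batch_size - 1) * step + duration


def redundancy(batch_size: int, duration: float, step: float) -> float:
    """Ratio of audio processed by the model to distinct audio covered."""
    return batch_size * duration / covered_seconds(batch_size, duration, step)


print(covered_seconds(32, duration=5.0, step=0.5))  # 20.5
print(covered_seconds(32, duration=5.0, step=4.5))  # 144.5
print(redundancy(32, duration=5.0, step=0.5))       # roughly 7.8x recomputation
```

With a 0.5s step, a batch of 32 windows processes 160s worth of input to cover 20.5s of distinct audio; a larger step covers more audio per batch at the cost of less overlap between windows.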