Shape mismatch in certain batches #9
Comments
Interesting, I'm happy to help you with debugging this. Do the clips transcribe properly on the official whisper implementation?
It does, yes. Interestingly, not only does it not raise the IndexError, it also does a better job with the transcription itself. Perhaps that is related to the temperature cascading discussed in the other issue? I'm not sure.

Without changing anything except adding print statements to diagnose this issue, on maybe the 10th run it did actually get past the step it had previously failed on (no IndexError), but it put a bunch of garbage into that segment's transcription. I suppose nothing guarantees the outputs here are deterministic, but that was surprising to me.

In trying to answer this I also found that the fix for no_speech_prob returning an array of all of the probabilities breaks running whisper against a single audio file (when it bypasses all of the batch code).

Edit to clarify the non-deterministic behavior: that was probably caused by the other files in the batch changing between runs. I'm batching by file size, and there were quite a few files with exactly the same size, so that likely accounts for the differences between runs rather than the model itself being responsible. If so, then it's pretty clear the temperature linking can have a negative effect. Batching certainly has some effect on outcomes, because the file is fully and properly transcribed when run by itself.
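To rule out batch composition as the source of run-to-run differences, the grouping can be made reproducible. The following is only a rough sketch of that idea, not the batching code used in this project: `make_batches` and the file names are hypothetical, and it assumes each file is described by a `(path, size_in_bytes)` pair. Ties on size are broken by path, so files of identical size always land in the same batch regardless of the order in which they were listed.

```python
def make_batches(files, batch_size):
    """Group (path, size_in_bytes) pairs into batches of similar size.

    Hypothetical helper: sorting on (size, path) breaks ties between
    equal-sized files by name, so the result does not depend on the
    order of the input listing.
    """
    ordered = sorted(files, key=lambda item: (item[1], item[0]))
    return [
        [path for path, _ in ordered[start:start + batch_size]]
        for start in range(0, len(ordered), batch_size)
    ]


# Made-up example: three files share the same size.
files = [("b.wav", 1000), ("a.wav", 1000), ("c.wav", 2000), ("d.wav", 1000)]
print(make_batches(files, 2))                  # [['a.wav', 'b.wav'], ['d.wav', 'c.wav']]
print(make_batches(list(reversed(files)), 2))  # same batches as above
```

If the failure then reproduces on every run with a fixed batch, that would point at the interaction between the files in that batch rather than at randomness in the model.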
I missed this too
I've tried for a while here to figure out what is causing this without much success. Batch processing runs fine for a variety of files, but I've come to a group here that throws an IndexError on the second segment of the batch:
In a normal loop self.sot_index is the same at all indices:
[8, 8, 8, 8, 8, 8]
or [11, 11, 11, 11, 11, 11]
In the batch and segment number that fails it looks like this:
I can't see how this is happening. I'm not providing different languages or an initial prompt, so I don't understand the mismatch in sot_index here.
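A small check like the one below could be dropped in next to the print statements to flag the exact batch positions where the value diverges. It is a minimal sketch under assumptions: `find_sot_index_mismatches` is a hypothetical helper that takes the per-item values as a plain list, and the mismatched list in the example is made up for illustration, not the actual failing output.

```python
from collections import Counter


def find_sot_index_mismatches(sot_indices):
    """Return (majority_value, {position: value}) for outlier entries.

    Hypothetical diagnostic: an empty dict means every item in the batch
    agrees on the same sot index.
    """
    if not sot_indices:
        return None, {}
    majority, _ = Counter(sot_indices).most_common(1)[0]
    mismatches = {
        position: value
        for position, value in enumerate(sot_indices)
        if value != majority
    }
    return majority, mismatches


print(find_sot_index_mismatches([8, 8, 8, 8, 8, 8]))   # (8, {})
# Made-up mismatch, only to show the shape of the report.
print(find_sot_index_mismatches([8, 8, 11, 8, 8, 8]))  # (8, {2: 11})
```

Knowing which position is off makes it possible to map the outlier back to a specific file in the batch and compare its tokens from the previous segment against the others.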
I do see that it hasn't properly transcribed portions of that file from the first segment in the output. I don't see where it would be hanging onto that to cause this problem, but something is broken.
Sorry I'm not of more help on this. I'll keep digging.