IndexError thrown when using batch transcribe function. #5
Comments
Could you share the script you're using that produces the error?
I see the same IndexError from time to time. Usually, I'm processing 4 files as one batch. If an error occurs, that's reproducible for that set of files. My script to reproduce the error for certain file sets is trivial; input is either German or English and will be auto-detected.
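A minimal sketch of the kind of script described, assuming this fork's model.transcribe() accepts a list of file paths (as in the traceback further down) and that leaving language unset triggers auto-detection; the paths and model size are placeholders:

```python
import whisper

# Placeholder paths; the real inputs are a handful of long recordings.
audio_files = ["recording_01.mp3", "recording_02.mp3",
               "recording_03.mp3", "recording_04.mp3"]

model = whisper.load_model("medium")

# No `language` argument, so German vs. English is auto-detected per file.
# (Assumes this fork's transcribe() accepts a list of paths, as in the
# traceback below.)
results = model.transcribe(audio_files)
```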
This appears to be the same issue as #9. There's currently a bug in the way that temperature fallback logic is handled for batched cases, which we think is the cause of this issue. It would make sense that switching to a larger model reduces the frequency of the issue, because the larger models produce better, less repetitive predictions for borderline/difficult transcriptions. It also explains why the specific files matter.
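For context, here is a simplified, single-segment sketch of the temperature fallback idea in whisper's transcribe.py; the thresholds and beam settings are illustrative, and this is not the batched code path where the bug lives (that one has to track all of this per segment):

```python
# Simplified, single-segment sketch of the temperature fallback idea in
# whisper/transcribe.py; thresholds and beam settings are illustrative.
from whisper.decoding import DecodingOptions, DecodingResult

def decode_with_fallback_single(model, segment,
                                temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                                compression_ratio_threshold=2.4,
                                logprob_threshold=-1.0) -> DecodingResult:
    result = None
    for t in temperatures:
        # At temperature 0 use beam search; at higher temperatures sample
        # several candidates and keep the best one.
        options = DecodingOptions(temperature=t,
                                  beam_size=5 if t == 0 else None,
                                  best_of=None if t == 0 else 5)
        result = model.decode(segment, options)
        # Accept the result unless it looks degenerate: overly repetitive
        # text (high compression ratio) or very low average log-probability.
        if (result.compression_ratio <= compression_ratio_threshold
                and result.avg_logprob >= logprob_threshold):
            return result
    # Every temperature failed the checks; return the last attempt anyway.
    return result
```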
I'm attempting to transcribe a very large number of files, and have been encountering this issue as well. I've been trying to work around it by changing the batch size, but the error still happens often, making the whole process rather frustrating, especially when running multiple instances.
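A possible stop-gap is to catch the IndexError and retry the affected batch one file at a time. Untested sketch below; it assumes model.transcribe() accepts either a list of paths (batched, as in the traceback in this issue) or a single path:

```python
# Untested sketch. Assumes model.transcribe() accepts a list of paths
# (batched, as in the traceback in this issue) or a single path.
def transcribe_with_fallback(model, audio_files, batch_size=4, **kwargs):
    results = []
    for start in range(0, len(audio_files), batch_size):
        batch = audio_files[start:start + batch_size]
        try:
            results.append(model.transcribe(batch, **kwargs))
        except IndexError:
            # The batched decode occasionally hits the IndexError above;
            # retry the affected files one at a time instead.
            results.extend(model.transcribe(path, **kwargs) for path in batch)
    return results
```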
Fixing the temperature fallback process is on the roadmap, but I won't have a fix for another month or two. If you really need this tool in the meantime, I would suggest using the OpenAI Whisper API. It costs ~$1.80 for 50 hours of audio.
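For reference, uploading a file to the hosted API looks roughly like this with the openai Python package (v0.x interface; the key and file name are placeholders, and each upload is limited to 25 MB):

```python
# Placeholder key/file; uses the openai Python package's v0.x interface.
import openai

openai.api_key = "sk-..."  # better: read from an environment variable

# Each upload must be under the API's 25 MB file-size limit.
with open("recording.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

print(transcript["text"])
```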
I considered that, but most of the files I'm working with are larger than 25 MB, and I'm not aware of a way to split them automatically without splitting sentences into multiple files, which would degrade the transcription quality. |
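One approach that might work is to split on long silences and re-merge the chunks into bounded-length pieces, so no cut lands mid-sentence. A sketch with pydub follows; the silence thresholds, the ~20-minute target, and the mp3 bitrate are guesses that would need tuning per recording:

```python
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("long_recording.mp3")

# Cut wherever there is at least ~0.7 s of silence, keeping a little
# padding so words are not clipped at the boundaries.
chunks = split_on_silence(
    audio,
    min_silence_len=700,
    silence_thresh=audio.dBFS - 16,
    keep_silence=300,
)

# Greedily re-merge the silence-delimited chunks until a piece approaches
# ~20 minutes, which keeps a 128 kbps mp3 export under the 25 MB limit.
max_len_ms = 20 * 60 * 1000
pieces, current = [], AudioSegment.empty()
for chunk in chunks:
    if len(current) and len(current) + len(chunk) > max_len_ms:
        pieces.append(current)
        current = AudioSegment.empty()
    current += chunk
if len(current):
    pieces.append(current)

for i, piece in enumerate(pieces):
    piece.export(f"part_{i:03d}.mp3", format="mp3", bitrate="128k")
```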
Providing a list of 4 English audio files (each about 3 hours 45 minutes) to the batch transcribe function. The error below is consistently thrown with multiple different files.
```
Traceback (most recent call last):
  File "C:\envs\iw-analytics\parse_audio.py", line 129, in <module>
    segments_df, transcript_df = transcribe_audio(model, audio_list)
  File "C:\envs\iw-analytics\parse_audio.py", line 63, in transcribe_audio
    results = model.transcribe(audio_file_list,
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 75, in transcribe
    return batch_transcribe(model=model,
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 474, in batch_transcribe
    results: List[DecodingResult] = decode_with_fallback(torch.stack(batch_segments))
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 382, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "C:\envs\iw-analytics\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 860, in decode
    result = DecodingTask(model, options).run(mel)
  File "C:\envs\iw-analytics\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 772, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 692, in _main_loop
    probs_at_sot.append(logits[:, self.sot_index[i]].float().softmax(dim=-1))
IndexError: index 224 is out of bounds for dimension 1 with size 3
```