
IndexError thrown when using batch transcribe function. #5

winterbulletvoyage opened this issue on Jan 18, 2023 · 6 comments

@winterbulletvoyage

Providing a list of 4 English audio files (each about 3 hours 45 minutes long) to the batch transcribe function. The error below is thrown consistently across multiple different files.

Traceback (most recent call last):
  File "C:\envs\iw-analytics\parse_audio.py", line 129, in <module>
    segments_df, transcript_df = transcribe_audio(model, audio_list)
  File "C:\envs\iw-analytics\parse_audio.py", line 63, in transcribe_audio
    results = model.transcribe(audio_file_list,
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 75, in transcribe
    return batch_transcribe(model=model,
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 474, in batch_transcribe
    results: List[DecodingResult] = decode_with_fallback(torch.stack(batch_segments))
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\transcribe.py", line 382, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "C:\envs\iw-analytics\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 860, in decode
    result = DecodingTask(model, options).run(mel)
  File "C:\envs\iw-analytics\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 772, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "C:\envs\iw-analytics\venv\lib\site-packages\whisper\decoding.py", line 692, in _main_loop
    probs_at_sot.append(logits[:, self.sot_index[i]].float().softmax(dim=-1))
IndexError: index 224 is out of bounds for dimension 1 with size 3

@Blair-Johnson (Owner)

Could you share the script you're using that produces the error?

@tz-rrze commented on Jan 26, 2023

I see the same IndexError from time to time. Usually I'm processing 4 files as one batch. When the error occurs, it is reproducible for that set of files.
If I only process one file at a time, everything is fine, still using the same batch-whisper version, checked out from GitHub on 2023-01-17.
Switching from medium to large-v2 usually also runs without error for the very same 4 files as one batch.
Thus, it's a combination of input files and model. Unfortunately, I have not yet found a set of files that I can easily share.

My script to reproduce the error for certain file sets is trivial; the input is either German or English, and the language is auto-detected.

#!/apps/whisper/envs/20230117/bin/python3.9
import sys
import batchwhisper
import batchwhisper.utils

MODEL = "medium"
all_files = sys.argv[1:]

model = batchwhisper.load_model(MODEL)
results = model.transcribe(all_files)

for r in results:
    print("----------")
    print(r['text'])

@Blair-Johnson (Owner) commented on Jan 26, 2023

This appears to be the same issue as #9. There is currently a bug in the way that temperature fallback logic is handled for batched cases, which we think is the cause of this issue. It would make sense that switching to a larger model reduces the frequency of the issue, because larger models produce better, less repetitive predictions for borderline or difficult transcriptions. It also explains why the specific files matter.
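
Until that fix lands, one possible stopgap, based on tz-rrze's observation above that single-file runs succeed: catch the IndexError and retry the affected batch one file at a time. A minimal sketch, assuming the batchwhisper API from the script above (transcribe() accepts a list of paths and returns one result dict per file); transcribe_with_fallback is a hypothetical helper, not part of the library:

import sys
import batchwhisper

def transcribe_with_fallback(model, files):
    # Try the fast batched path first.
    try:
        return model.transcribe(files)
    except IndexError:
        # Batched temperature-fallback bug (#5 / #9): retry sequentially;
        # single-file "batches" do not appear to trigger it.
        results = []
        for f in files:
            results.extend(model.transcribe([f]))
        return results

model = batchwhisper.load_model("medium")
for r in transcribe_with_fallback(model, sys.argv[1:]):
    print(r['text'])

This trades throughput for robustness only on the batches that actually fail.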

@Martok88 commented on Apr 6, 2023

I'm attempting to transcribe a very large number of files, and have been encountering this issue as well. I've been trying to work around it by changing the batch size, but the error still happens often, making the whole process rather frustrating, especially when running multiple instances.

@Blair-Johnson (Owner)

> I'm attempting to transcribe a very large number of files, and have been encountering this issue as well. I've been trying to work around it by changing the batch size, but the error still happens often, making the whole process rather frustrating, especially when running multiple instances.

Fixing the temperature fallback process is on the roadmap, but I won't have a fix for another month or two. If you really need this tool in the meantime, I would suggest using the OpenAI Whisper API. It costs ~$1.80 for 50 hours of audio.
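
For reference, a minimal sketch of that route with the openai Python package of this era (assumes openai 0.27.x, an API key in the OPENAI_API_KEY environment variable, and an audio file under the API's 25 MB limit):

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# whisper-1 is the hosted Whisper model; the endpoint takes one audio
# file (mp3, wav, m4a, ...) per request, up to 25 MB.
with open("audio.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

print(transcript["text"])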

@Martok88 commented on Apr 7, 2023

> > I'm attempting to transcribe a very large number of files, and have been encountering this issue as well. I've been trying to work around it by changing the batch size, but the error still happens often, making the whole process rather frustrating, especially when running multiple instances.
>
> Fixing the temperature fallback process is on the roadmap, but I won't have a fix for another month or two. If you really need this tool in the meantime, I would suggest using the OpenAI Whisper API. It costs ~$1.80 for 50 hours of audio.

I considered that, but most of the files I'm working with are larger than 25 MB, and I'm not aware of a way to split them automatically without splitting sentences across files, which would degrade transcription quality.
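
For what it's worth, one way to split without cutting mid-sentence is to cut on detected silence, e.g. with pydub. A rough sketch, with thresholds that are guesses and would need tuning per recording:

from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("long_recording.mp3")

# Cut wherever at least 700 ms sits 16 dB below the file's average
# loudness, so splits tend to fall in pauses between sentences.
chunks = split_on_silence(
    audio,
    min_silence_len=700,              # ms of silence required for a cut
    silence_thresh=audio.dBFS - 16,   # below this level counts as silence
    keep_silence=300,                 # keep some padding around each cut
)

for i, chunk in enumerate(chunks):
    chunk.export(f"chunk_{i:03d}.mp3", format="mp3", bitrate="64k")

Adjacent chunks could then be re-joined up to just under the 25 MB limit before upload, so each request stays as large as allowed.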
