You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Similar as #28977 , the long-form decoding recently added in [Whisper] Add sequential longform decoding seems to have issues in some parameters. There it's the task specification that seems problematic.
It is also linked to another issue : transformers's Whisper implementation seems to force the output language to be English. Tested with French, German, Dutch audios, result is always the same : Whisper translate the audio into English when the task isn't set (and language aswell obviously).
So while trying to bypass this issue of English-only output, I tried, as mentionned in the discussion, to set the task="transcribe" to force the model to transcribe the audio. But when working with long audio and the new implementation of long-form decoding, the issue occured.
Here is a minimal example to reproduce the issue:
fromtransformersimportWhisperForConditionalGeneration, WhisperProcessor, pipelineimportlibrosaSR=16000model=WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
processor=WhisperProcessor.from_pretrained("openai/whisper-medium")
file_path="path_to_more_than_30_sec_audio"audio, _=librosa.load(file_path, sr=SR)
# Long-form transcription with model.generate()input_features=processor(audio,
sampling_rate=SR,
return_tensors="pt",
truncation=False, # False so the audio isn't truncated and whole audio is sent to the modelreturn_attention_mask=True,
padding="longest")
predicted_ids=model.generate(**input_features,
task="transcribe") # If you remove this parameter, it works as expected
System Info
transformers
version: 4.37.2Who can help?
No response
Information
Tasks
examples
folder (such as GLUE/SQuAD, ...)Reproduction
Similar as #28977 , the long-form decoding recently added in [Whisper] Add sequential longform decoding seems to have issues in some parameters. There it's the task specification that seems problematic.
It is also linked to another issue : transformers's Whisper implementation seems to force the output language to be English. Tested with French, German, Dutch audios, result is always the same : Whisper translate the audio into English when the task isn't set (and language aswell obviously).
Here is the discussion about the issue : https://huggingface.co/openai/whisper-large-v3/discussions/71
So while trying to bypass this issue of English-only output, I tried, as mentionned in the discussion, to set the
task="transcribe"
to force the model to transcribe the audio. But when working with long audio and the new implementation of long-form decoding, the issue occured.Here is a minimal example to reproduce the issue:
Traceback
Expected behavior
Model should be able to work with the task parameter when processing long audio after #27492
The text was updated successfully, but these errors were encountered: