Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)

Reproduction
Following [Whisper] Add sequential longform decoding, there seems to be an issue when asking for token timestamps with the new way of handling long-form transcriptions.

When using the model.generate() method, passing return_token_timestamps=True triggers the issue. It also occurs with the pipeline object when setting return_timestamps="word".

Here is a simple example to reproduce the issue:
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline
import librosa

SR = 16000
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")
processor = WhisperProcessor.from_pretrained("openai/whisper-medium")

file_path = "path_to_more_than_30_sec_audio"
audio, _ = librosa.load(file_path, sr=SR)

# Long-form transcription with model.generate()
input_features = processor(audio,
                           sampling_rate=SR,
                           return_tensors="pt",
                           truncation=False,  # False so the audio isn't truncated and the whole audio is sent to the model
                           return_attention_mask=True,
                           padding="longest")
predicted_ids = model.generate(**input_features,
                               return_token_timestamps=True)

# With pipeline
pipe = pipeline("automatic-speech-recognition",
                model=model,
                tokenizer=processor.tokenizer,
                feature_extractor=processor.feature_extractor,
                return_timestamps="word",
                return_language=True)
pipe(audio)
```
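For context on why truncation=False matters here, the long-form path is only taken when the audio no longer fits Whisper's fixed 30-second input window. A minimal sketch of that frame arithmetic, assuming the standard Whisper feature-extractor constants (16 kHz sampling rate, hop length 160, i.e. one mel frame per 10 ms, and a 3000-frame window):

```python
# Assumed constants from Whisper's feature extractor (not taken from this issue):
SAMPLING_RATE = 16000  # Hz
HOP_LENGTH = 160       # samples per mel frame -> 10 ms per frame
MAX_FRAMES = 3000      # fixed 30 s input window

def n_mel_frames(duration_s: float) -> int:
    """Number of mel frames produced for `duration_s` seconds of audio."""
    return int(duration_s * SAMPLING_RATE / HOP_LENGTH)

def is_longform(duration_s: float) -> bool:
    """True when the audio exceeds one 30 s window (with truncation=False)."""
    return n_mel_frames(duration_s) > MAX_FRAMES

print(n_mel_frames(30.0))  # 3000 -> still fits one window
print(is_longform(45.0))   # True -> sequential long-form decoding kicks in
```

Any input longer than 30 seconds therefore routes through the sequential long-form decoding added in the PR referenced above, which is where the token-timestamp failure appears.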
System Info

transformers version: 4.37.2

Who can help?

No response
Traceback:
It works fine if you don't ask for per-token timestamps.
Expected behavior
The model should be able to return per-token timestamps when working with long audio after #27492.
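For reference, when return_timestamps="word" works, the ASR pipeline returns the full text plus a "chunks" list, where each chunk carries one word with a (start, end) pair in seconds. The structure sketched below uses a mock result with made-up values (not real model output), just to show the shape that should also come back for audio longer than 30 seconds:

```python
# Mock of the ASR pipeline output shape for return_timestamps="word"
# (illustrative text and timestamps, not produced by running the model).
expected = {
    "text": " Hello world",
    "chunks": [
        {"text": " Hello", "timestamp": (0.0, 0.42)},
        {"text": " world", "timestamp": (0.42, 0.9)},
    ],
}

# Each chunk's timestamp is a (start, end) tuple in seconds,
# with start <= end and timestamps non-negative.
for chunk in expected["chunks"]:
    start, end = chunk["timestamp"]
    assert 0.0 <= start <= end
print(len(expected["chunks"]))  # 2
```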