
Whisper pipeline raises error when using return_timestamps (ValueError: The following model_kwargs are not used by the model: ['return_timestamps']) #905

Closed
xenova opened this issue Mar 21, 2023 · 1 comment · Fixed by #919
Labels
bug Something isn't working

xenova commented Mar 21, 2023

System Info

optimum: 1.7.1
Python: 3.8.3
transformers: 4.27.2
platform: Windows 10

Who can help?

@philschmid

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This is a working example using the transformers pipeline function:

from transformers import pipeline

transcriber = pipeline('automatic-speech-recognition', 'openai/whisper-tiny.en')

text = transcriber(
    'https://xenova.github.io/transformers.js/assets/audio/ted_60.wav',
    return_timestamps=True,
    chunk_length_s=30,
    stride_length_s=5
)

print(f'{text=}')
# outputs correctly
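
For reference, with return_timestamps=True the transformers pipeline returns the transcript together with per-chunk timestamps, roughly shaped as below (illustrative structure only, not the actual output of the run above):

# text == {
#     'text': ' ... full transcript ... ',
#     'chunks': [
#         {'timestamp': (0.0, 5.4), 'text': ' ... first segment ... '},
#         ...
#     ]
# }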

After converting the model to ONNX using this command:

python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/

and running the equivalent code:

import onnxruntime
from transformers import pipeline, AutoProcessor
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

whisper_model_name = './whisper_onnx/'
processor = AutoProcessor.from_pretrained(whisper_model_name)
session_options = onnxruntime.SessionOptions()

model_ort = ORTModelForSpeechSeq2Seq.from_pretrained(
    whisper_model_name,
    use_io_binding=True,
    session_options=session_options
)
generator_ort = pipeline(
    task="automatic-speech-recognition",
    model=model_ort,
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
)

out = generator_ort(
    'https://xenova.github.io/transformers.js/assets/audio/ted_60.wav',
    return_timestamps=True,
    chunk_length_s=30,
    stride_length_s=5
)

print(f'{out=}')

I get the error:

ValueError: The following `model_kwargs` are not used by the model: ['return_timestamps'] (note: typos in the generate arguments will also show up in this list)
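
As far as I can tell, the ValueError is raised by transformers' GenerationMixin._validate_model_kwargs before any ONNX session is run: the pipeline forwards return_timestamps into generate(), and ORTModelForSpeechSeq2Seq's generate() does not accept it the way WhisperForConditionalGeneration's does. A minimal sketch that appears to hit the same check, reusing model_ort from above (dummy input; my assumption about the failure path):

import torch

# Whisper expects log-mel features of shape (batch, 80 mel bins, 3000 frames).
# A zero tensor is enough here because the kwarg validation runs before any forward pass.
dummy_features = torch.zeros(1, 80, 3000)

# Raises the same ValueError about unused model_kwargs: ['return_timestamps']
model_ort.generate(dummy_features, return_timestamps=True)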

Expected behavior

The pipeline using the ONNX model should behave the same as the transformers version: return the transcription with timestamps instead of raising an error.
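
Until this is fixed (presumably by #919), the only workaround I see on the ONNX side is to drop return_timestamps and accept plain text output. A sketch under that assumption, reusing generator_ort from above (chunking is handled by the pipeline itself, so it should still work):

out = generator_ort(
    'https://xenova.github.io/transformers.js/assets/audio/ted_60.wav',
    chunk_length_s=30,
    stride_length_s=5
)
print(f'{out=}')  # transcript without timestamps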
