System Info

transformers version: 4.41.0.dev0

Who can help?

@sanchit-gandhi

Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Following issues #29869 and #30407, I've tried to reproduce the errors mentioned and identified two problems:
First, we get errors if the main and assistant models don't share the same encoder (for example, whisper-large-v2 and whisper-tiny) and we only load the decoder part of the assistant with AutoModelForCausalLM.
In this case we could throw an error and suggest that the user load the assistant with AutoModelForSpeechSeq2Seq instead, so that both the encoder and the decoder are loaded.
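Here's a minimal sketch of that failing case (a sketch reusing the same checkpoints as the reproduction below; the exact traceback may vary across versions):
# Sketch of the failing case: loading only the assistant's decoder with
# AutoModelForCausalLM leaves it without an encoder matching the main model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoModelForSpeechSeq2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v2")
# decoder-only assistant: whisper-tiny has d_model=384, vs 1280 for large-v2
assistant_decoder_only = AutoModelForCausalLM.from_pretrained("openai/whisper-tiny")

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation[:1]")
sample = dataset[0]
inputs = processor(sample["audio"]["array"], sampling_rate=sample["audio"]["sampling_rate"], return_tensors="pt")

# Errors out: the assistant has no encoder of its own, and the main encoder's
# hidden states don't fit whisper-tiny's decoder. Loading the assistant with
# AutoModelForSpeechSeq2Seq instead avoids this.
model.generate(**inputs, assistant_model=assistant_decoder_only, language="en")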
Second, only the pipeline seems to be broken when using different-sized Whisper models. Here's code to reproduce the error:
import numpy as np
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, AutomaticSpeechRecognitionPipeline, AutoModelForCausalLM
# load data to test
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation[:1]")
sample = dataset[0]
# load base model
model_id = "openai/whisper-large-v2"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
# load tiny version of model from same origin (openai)
assistant_tiny_model_id = "openai/whisper-tiny"
assistant_direct_tiny_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    assistant_tiny_model_id,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
inputs = processor(sample["audio"]["array"], sampling_rate=sample["audio"]["sampling_rate"], return_tensors="pt")
output = model.generate(**inputs, assistant_model=assistant_direct_tiny_model, language="en")
print(processor.batch_decode(output, skip_special_tokens=True, normalize=True)[0])
# load pipeline for base model
pipe = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"language": "en"},
)
inputs = {
    "sampling_rate": sample["audio"]["sampling_rate"],
    "raw": np.array(sample["audio"]["array"]),
}
output = pipe(inputs=inputs, generate_kwargs={"assistant_model": assistant_direct_tiny_model})["text"]
print(processor.tokenizer.normalize(output))
It works with model.generate, but fails when using the pipeline. I think the problem comes from the way the inputs are passed to the generate method by the pipeline. I'll open a PR to fix this.
Expected behavior
ValueError: Whisper expects the mel input features to be of length 3000, but found 1500. Make sure to pad the input mel features to 3000.
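For context on the two lengths in the message (my reading, not stated in the issue): Whisper's feature extractor always pads/truncates audio to 3000 mel frames (30 s at a 160-sample hop over 16 kHz audio), and the encoder's stride-2 convolution halves that to 1500 hidden states. So the 1500 here looks like an encoder output length being passed where mel features are expected. A quick shape check, reusing processor, model, and sample from the snippet above:
# 3000 mel frames in, 1500 encoder states out (stride-2 conv halves the length)
feats = processor(
    sample["audio"]["array"],
    sampling_rate=sample["audio"]["sampling_rate"],
    return_tensors="pt",
).input_features
print(feats.shape)  # torch.Size([1, 80, 3000])

encoder_out = model.get_encoder()(feats)
print(encoder_out.last_hidden_state.shape)  # torch.Size([1, 1500, 1280])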