
Whisper assistant decoding not working with pipeline #30611

Closed

kamilakesbi (Contributor) opened this issue May 2, 2024 · 0 comments

System Info

  • transformers version: 4.41.0.dev0
  • Platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.22.2
  • Safetensors version: 0.4.2
  • Accelerate version: 0.29.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.2+cu121 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Following issues #29869 and #30407, I tried to reproduce the reported errors and identified two problems:

  • First, we get errors when the main and assistant models don't share the same encoder (for example, whisper-large-v2 and whisper-tiny) and only the decoder part of the assistant is loaded with AutoModelForCausalLM (see the sketch after the repro script below).

In this case we could raise an explicit error suggesting that the user load both the encoder and the decoder with AutoModelForSpeechSeq2Seq instead.

  • Second, only the pipeline seems broken when using different-sized Whisper models:

Here's a code snippet reproducing the error:

import numpy as np
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, AutomaticSpeechRecognitionPipeline, AutoModelForCausalLM

# load data to test
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation[:1]")
sample = dataset[0]

# load base model
model_id = "openai/whisper-large-v2"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)

# load tiny version of model from same origin (openai)
assistant_tiny_model_id = "openai/whisper-tiny"
assistant_direct_tiny_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    assistant_tiny_model_id,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)

inputs = processor(sample["audio"]["array"], sampling_rate=sample["audio"]["sampling_rate"], return_tensors="pt")
output = model.generate(**inputs, assistant_model=assistant_direct_tiny_model, language="en")
print(processor.batch_decode(output, skip_special_tokens=True, normalize=True)[0])

# load pipeline for base model
pipe = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"language":"en"},
)

inputs = {
    "sampling_rate": sample["audio"]["sampling_rate"],
    "raw": np.array(sample["audio"]["array"]),
}

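# With the assistant model passed via generate_kwargs, this pipeline call raises
# the ValueError quoted under "Expected behavior" below.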
output = pipe(inputs=inputs, generate_kwargs={"assistant_model":assistant_direct_tiny_model})["text"]
print(processor.tokenizer.normalize(output))

It works with model.generate, but no longer when going through the pipeline. I think the problem comes from the way the inputs are provided to the generate method when using the pipeline. I'll open a PR to fix this.
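For completeness, here's a minimal sketch of the first problem, reusing model, processor, and inputs from the script above (this snippet is illustrative and wasn't part of the original runs):

# Loading the assistant with AutoModelForCausalLM instantiates only its decoder.
# Since whisper-large-v2 and whisper-tiny don't share an encoder, this generate
# call is expected to error out; the proposal above is to raise a clearer error
# suggesting AutoModelForSpeechSeq2Seq instead.
assistant_decoder_only = AutoModelForCausalLM.from_pretrained("openai/whisper-tiny")
output = model.generate(**inputs, assistant_model=assistant_decoder_only, language="en")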

Expected behavior

The pipeline call should produce the same transcription as the direct model.generate call; instead it raises:

ValueError: Whisper expects the mel input features to be of length 3000, but found 1500. Make sure to pad the input mel features to 3000.
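For reference (an aside, not part of the original report): the 3000-frame expectation matches Whisper's fixed 30-second input window. The feature extractor pads raw audio to 30 s before computing the log-mel spectrogram, so well-formed inputs always reach the model with 3000 mel frames. A quick check:

import numpy as np
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
# one second of silence; the feature extractor pads it out to the 30 s window
feats = processor(np.zeros(16000), sampling_rate=16000, return_tensors="pt").input_features
print(feats.shape)  # torch.Size([1, 80, 3000]) -> 80 mel bins, 3000 frames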
