
Whisper Prompting #22395

Closed · sanchit-gandhi opened this issue Mar 27, 2023 · 11 comments · Fixed by #22496

@sanchit-gandhi (Contributor) commented Mar 27, 2023

Feature request

Add prompting for the Whisper model to control the style/formatting of the generated text.

Motivation

During training, Whisper can be fed a "previous context window" to condition on longer passages of text.

The original OpenAI Whisper implementation gives the user the option of passing an initial_prompt to the model. At inference time, this prompt replaces the "previous context window".

By passing the prompt as the "previous context window", the Whisper model conditions its generation on whatever text is passed as the prompt. This allows the user to control aspects of the generation, such as spellings of named entities and punctuation formatting (see openai/whisper#963 (comment)).
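To make the mechanism concrete, here's a minimal sketch of how a prompt would fill the "previous context" slot in the decoder input. The special token names come from the Whisper tokenizer; the assembly shown here is illustrative only, not the actual generation internals:

```python
# Illustrative sketch: how a prompt occupies Whisper's "previous context" slot.
# The real generation code assembles this internally.
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny")

prompt_tokens = tokenizer.encode(" IR, Newswire", add_special_tokens=False)

decoder_input_ids = (
    [tokenizer.convert_tokens_to_ids("<|startofprev|>")]  # previous-context marker
    + prompt_tokens                                       # the prompt text
    + [tokenizer.convert_tokens_to_ids("<|startoftranscript|>")]
    # ... language/task tokens follow, then generation continues from here
)
```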

This is possibly a cheaper way of adapting the Whisper model to specific decoding constraints than fine-tuning.

This notebook demonstrates prompting with the initial codebase, and explains how this can be achieved for HF's Whisper: https://colab.research.google.com/drive/14FSeaoRvgs5arOTfiMQBnQ5NaLyma7Tq?usp=sharing

The proposed API for prompting would look something like the following:

  1. Encode the prompt text to prompt token ids (processor.get_prompt_ids) - this method is a wrapper around processor.tokenizer.__call__ that doesn't add the special token ids:

     prompt = "IR, Newswire"
     prompt_ids = processor.get_prompt_ids(prompt)

  2. Pass the input audio and prompt token ids to the .generate method to get the predicted ids:

     pred_ids = model.generate(input_features, prompt_ids=prompt_ids)

  3. Decode the predicted ids and 'slice' off the prompt (we can do this by passing the prompt_ids):

     pred_str = processor.batch_decode(pred_ids, prompt_ids=prompt_ids)

=> We would need to wrap all of this forced_decoder_ids logic into the generate method and update the processor/tokenizer accordingly; a full end-to-end sketch of the proposed flow is shown below.
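Putting the three steps together, a minimal end-to-end sketch of the proposed API. Note that get_prompt_ids and the prompt_ids arguments are the proposal here, not a released API, and the dummy dataset is just for illustration:

```python
from datasets import load_dataset
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Load a short audio sample for demonstration
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features

# 1. Encode the prompt text to prompt token ids (proposed method)
prompt_ids = processor.get_prompt_ids("IR, Newswire")

# 2. Condition generation on the prompt (proposed argument)
pred_ids = model.generate(input_features, prompt_ids=prompt_ids)

# 3. Decode, slicing off the prompt (proposed argument)
pred_str = processor.batch_decode(pred_ids, prompt_ids=prompt_ids, skip_special_tokens=True)
print(pred_str)
```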

Your contribution

Happy to guide the integration and review any PRs!

@sanchit-gandhi (Contributor, Author)

cc @hollance

@pmollerus23 (Contributor)

Hello, I'd like to pick up this issue!

@sanchit-gandhi (Contributor, Author)

Hey @mollerup23! Super cool! We would first need to update the generate modelling code to slide the prompt in through the forced decoder ids, as explained in the notebook linked above.

And then add a new method in the tokenizer to ignore the prompt ids. Does this sound good to you?
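For the tokenizer side, a rough sketch of what "ignoring the prompt ids" during decoding could look like. strip_prompt is a hypothetical helper for illustration only; the actual implementation landed in #22496 and may differ:

```python
# Hypothetical helper, for illustration only: drop a leading prompt
# from the generated ids before decoding them to text.
def strip_prompt(token_ids, prompt_ids):
    n = len(prompt_ids)
    # If the sequence begins with the prompt, return everything after it
    if list(token_ids[:n]) == list(prompt_ids):
        return token_ids[n:]
    return token_ids

# e.g. inside decode: token_ids = strip_prompt(token_ids, prompt_ids)
```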

@connor-henderson (Contributor)

Hey @mollerup23 @sanchit-gandhi. Apologies, I'm not sure how picking these up works. I started working on it because I saw there was no assignee, and I now have something I think is ready for review. Should I just keep it locally or push it up?

Totally fine with whatever, since @mollerup23 commented first.

@pmollerus23 (Contributor)

@connor-henderson @sanchit-gandhi I have not yet started on this issue, feel free to push your commits and pick it up!

@pmollerus23 (Contributor)

I will continue to look into what @sanchit-gandhi mentioned in the meantime.

@connor-henderson (Contributor)

Sounds good, thanks

@sanchit-gandhi (Contributor, Author)

Closed via #22496

@romitjain

Hi @sanchit-gandhi and @connor-henderson,
I saw the PR, but I was wondering whether always_use_initial_prompt and condition_on_previous_text were also integrated into the API? If not, is there any active work going towards them?
Thanks

@sanchit-gandhi (Contributor, Author)

Hey @romitjain - we're working on integrating the OpenAI Whisper algorithm into Transformers, which will provide more support for these fine-grained decoding parameters! c.f. #27492
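For anyone landing here later, a hedged sketch of what the long-form generation API from #27492 looks like. The parameter names reflect that PR at the time of writing and may have changed since, so check the current docs; condition_on_prev_tokens is the analogue of OpenAI's condition_on_previous_text:

```python
# Sketch of long-form generation with conditioning on previous text,
# per PR #27492. `model` and `processor` loaded as in the earlier example.
# long_audio_array: a placeholder for a >30s waveform (e.g. a numpy array);
# for long-form audio, features are computed without truncation.
inputs = processor(
    long_audio_array, sampling_rate=16_000,
    truncation=False, padding="longest",
    return_attention_mask=True, return_tensors="pt",
)

pred_ids = model.generate(
    inputs.input_features,
    attention_mask=inputs.attention_mask,
    condition_on_prev_tokens=True,               # ~ condition_on_previous_text
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # temperature fallback schedule
    logprob_threshold=-1.0,
    compression_ratio_threshold=1.35,
    return_timestamps=True,
)
pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
```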

@M-Ali-ML

> Hey @romitjain - we're working on integrating the OpenAI Whisper algorithm into Transformers, which will provide more support for these fine-grained decoding parameters! c.f. #27492

Are contributions allowed here? I'd like to help with that.
