Prompt_ids vs. decoder_input_ids in Whisper #28228
Feature request

I am trying to understand the difference between adding prior text to `prompt_ids` vs. `decoder_input_ids` when generating text via Whisper. The documentation is not very clear on how these differ implementation-wise; AFAIK, it seems like using `prompt_ids` will lead to `forced_decoder_ids` being modified here. But I'm not sure how exactly using `decoder_input_ids` differs from this.

Motivation

To add context to the Whisper transcription. For example, if the model previously transcribed `I have a` in a streaming fashion, I would like to feed this back in as "context" to help the model predict the next word. I believe the actual OpenAI Whisper implementation has a feature called "prefix" that does this.

Your contribution

Will try.
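For readers arriving here, a minimal sketch of the `prompt_ids` usage being asked about, assuming a transformers version recent enough to have Whisper prompt support; the checkpoint name and the zeroed log-mel features are placeholders, not taken from the issue:

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Placeholder features; in practice these come from
# processor(audio_array, sampling_rate=16_000, return_tensors="pt").input_features
input_features = torch.zeros(1, 80, 3000)

# Feed previously transcribed text back in as context for the next chunk
prompt_ids = processor.get_prompt_ids("I have a", return_tensors="pt")

generated_ids = model.generate(input_features, prompt_ids=prompt_ids)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```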
Comments

Hey @vymao,

[…] On the other hand, […]

When it comes to generating text via […]

The implementation difference between […]
Thanks. I'm still slightly confused: when you say […]
Maybe for @sanchit-gandhi or @ylacombe
Hey @vymao, I'm not a Whisper expert yet, but as I understand it and as the documentation suggests, `prompt_ids` should be built with `get_prompt_ids`:

(embedded snippet: src/transformers/models/whisper/tokenization_whisper.py, lines 827 to 836 at commit 3cefac1; the `get_prompt_ids` implementation)
As you can see from the code, `get_prompt_ids` handles the input text so you don't have to worry about the special tokens that need to be inserted to tell the model that this text is context and not the start of the transcription. Then the code processes the […]

In other words, you can use […]

I hope that helps! cc @sanchit-gandhi or @ArthurZucker if you want to correct me or give a more advanced explanation.
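To make that special-token handling concrete, here is a small inspection sketch (mine, not from the thread; the checkpoint name is a placeholder) showing what `get_prompt_ids` returns:

```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny")

# get_prompt_ids tokenizes the text and prepends the <|startofprev|> marker;
# it also raises if the text itself contains disallowed special tokens.
prompt_ids = tokenizer.get_prompt_ids("I have a")
print(tokenizer.decode(prompt_ids, skip_special_tokens=False))
# expected output: "<|startofprev|> I have a"
```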
Hey @vymao, that's a very good question! In a nutshell, […]

Please use […]
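One way to see the difference in practice, sketched under assumptions (placeholder checkpoint and features; this is an illustration, not the original answer): `prompt_ids` are wrapped with `<|startofprev|>` and act as preceding context, while `decoder_input_ids` are forced as the literal start of the output sequence.

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
input_features = torch.zeros(1, 80, 3000)  # placeholder log-mel features

# prompt_ids: the context sits BEFORE the transcript-start tokens, roughly
#   <|startofprev|> I have a <|startoftranscript|><|en|><|transcribe|><|notimestamps|> <output>
# and is stripped when the result is decoded.
prompt_ids = processor.get_prompt_ids("I have a", return_tensors="pt")
with_prompt = model.generate(input_features, prompt_ids=prompt_ids)

# decoder_input_ids: the supplied ids ARE the start of the output sequence,
#   <|startoftranscript|><|en|><|transcribe|><|notimestamps|> I have a <model continues>
# (how this interacts with the default forced_decoder_ids is version-dependent)
forced = processor.tokenizer(
    "<|startoftranscript|><|en|><|transcribe|><|notimestamps|> I have a",
    add_special_tokens=False,
    return_tensors="pt",
).input_ids
with_forced = model.generate(input_features, decoder_input_ids=forced)
```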
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.