Prompt_ids feature causing repetitions and hallucinations #35603
Comments
cc @eustlb

Hey @vchagari 🤗 Thanks for providing the audio! I'm unsure why you're reporting an issue for a commit dating from 2023. Using the latest version of transformers (`pip install -U transformers`):

```python
from transformers import WhisperForConditionalGeneration, WhisperFeatureExtractor, WhisperProcessor
import librosa

model_dir = "openai/whisper-large-v3"
prompt = "Donald Duck"
audio, sr = librosa.load("prompt_ids_test.wav", sr=16000)

model = WhisperForConditionalGeneration.from_pretrained(model_dir).to("cuda")
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_dir)
processor = WhisperProcessor.from_pretrained(model_dir)

prompt_ids = processor.get_prompt_ids(prompt, return_tensors="pt").to("cuda")
input_features = feature_extractor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features.to("cuda"), prompt_ids=prompt_ids)
text = [processor.decode(predicted_id, skip_special_tokens=True) for predicted_id in predicted_ids]
transcript = text[0]
```
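For context on what `prompt_ids` does: Whisper conditions on a prompt by prepending its tokens to the decoder input under a `<|startofprev|>` marker, before the usual task tokens. The sketch below is purely illustrative (the token strings are spelled out as plain strings and the helper name is hypothetical, not transformers API) to show the layout:

```python
# Illustrative sketch of Whisper's decoder-input layout with a prompt.
# These are token *strings* for readability; the real model uses token IDs,
# and build_decoder_prefix is a hypothetical helper, not a transformers API.

SOP = "<|startofprev|>"        # marks the prompt ("previous context") span
SOT = "<|startoftranscript|>"  # start of the actual transcription
LANG_EN = "<|en|>"
TRANSCRIBE = "<|transcribe|>"

def build_decoder_prefix(prompt_tokens):
    """Prepend prompt tokens under <|startofprev|>, then the task tokens."""
    return [SOP, *prompt_tokens, SOT, LANG_EN, TRANSCRIBE]

prefix = build_decoder_prefix(["Donald", "Duck"])
```

Because the prompt tokens sit in the decoder's context, a model that starts looping can fall back on them, which is consistent with the repeated "Donald Duck" seen in the reported output.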
@eustlb: Thanks for your response. That's the commit ID I used to develop my application. Upgrading to the latest transformers causes compatibility issues on my end, hence I froze my transformers version.

I am in the process of upgrading and will share more info as soon as possible. That said, I do see repetitions and hallucinations with the newer version (4.43.1) as well when I use the prompt feature. My use-case is long-form transcription.
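For long-form transcription, one common pattern is to transcribe overlapping windows rather than a single pass (transformers' ASR `pipeline` also offers built-in chunking via `chunk_length_s`). Below is a minimal, hypothetical sketch of the chunking step only; the function name, chunk length, and overlap are illustrative assumptions, not transformers API:

```python
# Hypothetical sketch: split long-form audio into overlapping windows so each
# generate() call stays within Whisper's 30 s receptive field.
# chunk_audio is illustrative, not a transformers API.

def chunk_audio(samples, sr=16000, chunk_s=30.0, overlap_s=5.0):
    """Return overlapping chunks of a 1-D sample sequence."""
    size = int(chunk_s * sr)                 # samples per chunk
    step = int((chunk_s - overlap_s) * sr)   # hop between chunk starts
    chunks = []
    start = 0
    while start < len(samples):
        chunks.append(samples[start:start + size])
        if start + size >= len(samples):
            break
        start += step
    return chunks
```

Each chunk would then be fed through the feature extractor and `generate` as in the snippets in this thread, and the overlapping regions merged in post-processing.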
System Info
Hi @sanchit-gandhi and @gante,

Using the prompt feature as described here (#22395) causes the model output to contain many repetitions and hallucinations.

I recorded an audio clip and gave it to the Whisper ASR model together with a prompt, as shown below.
More details:
Transformers Commit: 1c7e5e2
Test-Case: steps to reproduce the issue.

Audio contents: "The full name of Donald is Donald J. Trump Jr"

```python
from transformers import WhisperForConditionalGeneration, WhisperFeatureExtractor, WhisperProcessor
import librosa

model_dir = "openai/whisper-large-v3"  # checkpoint path; the exact model used was not specified in the report
prompt = "Donald Duck"
audio, sr = librosa.load("prompt_ids_test.wav", sr=16000)

model = WhisperForConditionalGeneration.from_pretrained(model_dir).to("cuda")
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_dir)
processor = WhisperProcessor.from_pretrained(model_dir)

prompt_ids = processor.get_prompt_ids(prompt)
input_features = feature_extractor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features.to("cuda"), prompt_ids=prompt_ids, num_beams=4)
text = [processor.decode(predicted_id, skip_special_tokens=True) for predicted_id in predicted_ids]
transcript = text[0]
```
Output: The full name of Donald is Donald J. Trump Jr. Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donald Duck Donald Duck Donal Donald Duck
Link to the audio: https://drive.google.com/file/d/1ud-B0uepD8Sk6ArkvJdqPmFWYpCmAooi/view?usp=drive_link
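As a stopgap while the underlying generation issue is investigated, runaway repeats like the output above can be partially cleaned up in post-processing. The helper below is a hypothetical sketch (not a transformers API): it collapses immediately repeated word n-grams such as "Donald Duck Donald Duck ...". At the generation level, `generate()` also accepts repetition controls such as `no_repeat_ngram_size` and `repetition_penalty`, which may be worth trying.

```python
# Hypothetical post-processing sketch (not a transformers API): collapse
# immediately repeated word n-grams, e.g. "Donald Duck Donald Duck" -> "Donald Duck".

def collapse_repeats(text, max_ngram=4):
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        collapsed = False
        # Prefer longer n-grams so "Donald Duck" collapses as a unit.
        for n in range(max_ngram, 0, -1):
            if len(out) >= n and out[-n:] == words[i:i + n]:
                i += n  # skip the repeated n-gram
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

Note this only removes exact repeats, so near-duplicates like the truncated "Donal" in the output above would survive; it is a band-aid, not a fix for the decoding loop itself.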
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
(Same test case, code, and output as in the System Info section above.)
Expected behavior
It should produce either "The full name of Donald is Donald J. Trump" or "The full name of Donald is Donald Duck", not an endless repetition of the prompt keywords.