Prompt_ids feature causing repetitions and hallucinations #35603
Comments
cc @eustlb

Hey @vchagari 🤗 Thanks for providing the audio! I'm unsure why you're reporting an issue for a commit dating from 2023. Using the latest version of transformers (`pip install -U transformers`):

```python
from transformers import WhisperForConditionalGeneration, WhisperFeatureExtractor, WhisperProcessor
import librosa

model_dir = "openai/whisper-large-v3"
prompt = "Donald Duck"
audio, sr = librosa.load("prompt_ids_test.wav", sr=16000)

model = WhisperForConditionalGeneration.from_pretrained(model_dir).to("cuda")
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_dir)
processor = WhisperProcessor.from_pretrained(model_dir)

prompt_ids = processor.get_prompt_ids(prompt, return_tensors="pt").to("cuda")
input_features = feature_extractor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features.to("cuda"), prompt_ids=prompt_ids)
text = [processor.decode(predicted_id, skip_special_tokens=True) for predicted_id in predicted_ids]
transcript = text[0]
```
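For context on what `prompt_ids` does: Whisper conditions on a prompt by prepending its tokens to the decoder input under a `<|startofprev|>` marker, before the usual task tokens. The sketch below is purely illustrative (the token strings are spelled out as plain strings and the helper name is hypothetical, not transformers API) to show the layout:

```python
# Illustrative sketch of Whisper's decoder-input layout with a prompt.
# These are token *strings* for readability; the real model uses token IDs,
# and build_decoder_prefix is a hypothetical helper, not a transformers API.

SOP = "<|startofprev|>"        # marks the prompt ("previous context") span
SOT = "<|startoftranscript|>"  # start of the actual transcription
LANG_EN = "<|en|>"
TRANSCRIBE = "<|transcribe|>"

def build_decoder_prefix(prompt_tokens):
    """Prepend prompt tokens under <|startofprev|>, then the task tokens."""
    return [SOP, *prompt_tokens, SOT, LANG_EN, TRANSCRIBE]

prefix = build_decoder_prefix(["Donald", "Duck"])
```

Because the prompt tokens sit in the decoder's context, a model that starts looping can fall back on them, which is consistent with the repeated "Donald Duck" seen in the reported output.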
@eustlb: Thanks for your response. That's the commit ID I used to develop my application. Upgrading to the latest transformers causes compatibility issues on my end, hence I froze my transformers version.

I am in the process of upgrading and will share more info as soon as possible. That said, I do see repetitions and hallucinations with the newer version (4.43.1) as well when I use the prompt feature. My use-case is long-form transcription.
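For long-form transcription, one common pattern is to transcribe overlapping windows rather than a single pass (transformers' ASR `pipeline` also offers built-in chunking via `chunk_length_s`). Below is a minimal, hypothetical sketch of the chunking step only; the function name, chunk length, and overlap are illustrative assumptions, not transformers API:

```python
# Hypothetical sketch: split long-form audio into overlapping windows so each
# generate() call stays within Whisper's 30 s receptive field.
# chunk_audio is illustrative, not a transformers API.

def chunk_audio(samples, sr=16000, chunk_s=30.0, overlap_s=5.0):
    """Return overlapping chunks of a 1-D sample sequence."""
    size = int(chunk_s * sr)                 # samples per chunk
    step = int((chunk_s - overlap_s) * sr)   # hop between chunk starts
    chunks = []
    start = 0
    while start < len(samples):
        chunks.append(samples[start:start + size])
        if start + size >= len(samples):
            break
        start += step
    return chunks
```

Each chunk would then be fed through the feature extractor and `generate` as in the snippets in this thread, and the overlapping regions merged in post-processing.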
System Info
Hi @sanchit-gandhi and @gante,

Using the prompt feature as described here (#22395) causes the model output to contain many repetitions and hallucinations.

I recorded an audio clip and gave it to the Whisper ASR model together with a prompt, as shown below.
More details:
Transformers Commit: 1c7e5e2
Test-Case: steps to reproduce the issue.

Audio contents: "The full name of Donald is Donald J. Trump Jr"

```python
from transformers import WhisperForConditionalGeneration, WhisperFeatureExtractor, WhisperProcessor
import librosa

model_dir = "openai/whisper-large-v3"  # checkpoint path; the exact model used was not specified in the report
prompt = "Donald Duck"
audio, sr = librosa.load("prompt_ids_test.wav", sr=16000)

model = WhisperForConditionalGeneration.from_pretrained(model_dir).to("cuda")
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_dir)
processor = WhisperProcessor.from_pretrained(model_dir)

prompt_ids = processor.get_prompt_ids(prompt)
input_features = feature_extractor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features.to("cuda"), prompt_ids=prompt_ids, num_beams=4)
text = [processor.decode(predicted_id, skip_special_tokens=True) for predicted_id in predicted_ids]
transcript = text[0]
```
Output: The full name of Donald is Donald J. Trump Jr. Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donald Duck Donal Donald Duck Donald Duck Donald Duck Donald Duck Donald Duck Donal Donald Duck
Link to the audio: https://drive.google.com/file/d/1ud-B0uepD8Sk6ArkvJdqPmFWYpCmAooi/view?usp=drive_link
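As a stopgap while the underlying generation issue is investigated, runaway repeats like the output above can be partially cleaned up in post-processing. The helper below is a hypothetical sketch (not a transformers API): it collapses immediately repeated word n-grams such as "Donald Duck Donald Duck ...". At the generation level, `generate()` also accepts repetition controls such as `no_repeat_ngram_size` and `repetition_penalty`, which may be worth trying.

```python
# Hypothetical post-processing sketch (not a transformers API): collapse
# immediately repeated word n-grams, e.g. "Donald Duck Donald Duck" -> "Donald Duck".

def collapse_repeats(text, max_ngram=4):
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        collapsed = False
        # Prefer longer n-grams so "Donald Duck" collapses as a unit.
        for n in range(max_ngram, 0, -1):
            if len(out) >= n and out[-n:] == words[i:i + n]:
                i += n  # skip the repeated n-gram
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

Note this only removes exact repeats, so near-duplicates like the truncated "Donal" in the output above would survive; it is a band-aid, not a fix for the decoding loop itself.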
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
(Same test case, code, and output as in the System Info section above.)
Expected behavior
It should produce either "The full name of Donald is Donald J. Trump" or "The full name of Donald is Donald Duck", not an endless repetition of the prompt keywords.