Replies: 17 comments 21 replies
-
Great initiative, @ryanheise. Here is one to start: I have been looking at creating short synthetic audio clips that induce hallucination and contain no private or copyrighted content. With
-
I frequently get repeats like this, which means I have to redo the transcription.
-
@ryanheise, do I understand correctly that we can use the audio segments that produce hallucinated lines to retrain (fine-tune) the Whisper model? Is that what you meant? [duplicate warning]: I had just posted a similar note in another thread before I saw this conversation.
-
I like this one, it has given different hallucinations each run. And I had never heard of a Cyclophone before but apparently it's a real thing. :)
Run 1
Run 2
Run 3
large-v3 consistently seems to produce the same (less interesting) hallucination on this audio.
-
Why does Whisper return random text (hallucinations) for audio that does not contain any speech? I tested it with random traffic and with one minute of silence, and the output contains a few words. As I understand it, Whisper should not return anything when there is no speech in the audio. How can I handle this?
Output:
Output:
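One possible way to handle this after the fact, offered as a minimal sketch and not a definitive fix: each segment returned by the openai-whisper Python API's `transcribe()` carries `no_speech_prob` and `avg_logprob` fields, so segments that look like non-speech can be dropped in post-processing. The function name, the threshold values, and the sample data below are assumptions for illustration only.

```python
from typing import Any

# Hypothetical cut-offs for illustration; tune them on your own audio.
NO_SPEECH_PROB_MAX = 0.6
AVG_LOGPROB_MIN = -1.0


def drop_suspect_segments(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep segments that do not look like non-speech hallucinations."""
    kept = []
    for seg in segments:
        likely_silence = seg["no_speech_prob"] > NO_SPEECH_PROB_MAX
        low_confidence = seg["avg_logprob"] < AVG_LOGPROB_MIN
        # Drop only when both signals agree, to avoid discarding quiet speech.
        if likely_silence and low_confidence:
            continue
        kept.append(seg)
    return kept


# Made-up segments shaped like the entries in result["segments"].
sample = [
    {"text": " Hello there.", "no_speech_prob": 0.02, "avg_logprob": -0.21},
    {"text": " Thank you.", "no_speech_prob": 0.91, "avg_logprob": -1.35},
]
print([seg["text"] for seg in drop_suspect_segments(sample)])
```

Requiring both conditions is a conservative choice; running a voice activity detector before transcription is another commonly used approach.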
-
So I guess no one here has heard about the --suppress_tokens "" trick? I've posted about it before; maybe it was simply lost in traffic. UPDATE: This method does NOT work with Turbo (and by extension, it probably won't work with large-v3 either).
-
Hi @misutoneko, do the tokens need to be grouped or sequenced together when passed to --suppress_tokens? For example, below is a hallucination. Should the command be --suppress_tokens "50364, 7, 6847, 96, 7016, 32045, 8, 50464"? Does this suppress any occurrences of these tokens in the results, or only when they appear in the given sequence?
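On the question of grouping: judging by the openai-whisper decoding code, `suppress_tokens` appears to act on individual token IDs at every decoding step (their logits are set to negative infinity), not on a particular sequence, so the order of the list should not matter. The toy sketch below illustrates that behaviour with a made-up six-token vocabulary; `suppress` and `greedy_pick` are hypothetical helpers, not Whisper functions. IDs such as 50364 and 50464 also look like timestamp tokens in the multilingual vocabulary (an assumption worth checking against your tokenizer), so suppressing them may have side effects.

```python
import math


def suppress(logits: list[float], suppress_ids: set[int]) -> list[float]:
    """Return a copy of logits with the listed token IDs set to -inf."""
    return [-math.inf if i in suppress_ids else x for i, x in enumerate(logits)]


def greedy_pick(logits: list[float]) -> int:
    """Pick the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy vocabulary of 6 tokens; the IDs and scores are made up.
step_logits = [0.1, 2.5, 0.3, 1.9, 0.2, 0.0]
banned = {1, 4}

print(greedy_pick(step_logits))                    # unfiltered choice
print(greedy_pick(suppress(step_logits, banned)))  # choice with banned IDs removed
```

Because the filter is applied independently at each step, a banned ID can never be emitted anywhere in the output, whichever tokens surround it.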
-
Hi, here is a 3-minute audio clip with a combination of music and kitchen noise. The hallucinated lines from large-v2:
And here is the hallucination from large-v3:
And the hallucination from large-v1:
-
FWIW, my shop has processed thousands of hours of English files with a pretty wide range of speakers and lots of silence/music hallucinations (we stick to large-v2). All our outputs are VTTs, which we post-process by manipulating the output with webvtt CaptionLists. I do dozens of other corrections for casing, time, named entities, and line length that are all particular to my org's use cases (currently in a private repo, sorry I can't share the whole thing ATM), but here's a handful of the things that are most common in what we see. YMMV, obviously:
We also filter to a limited subset of Unicode characters to drop obvious non-Latin strings. Rationale: why would Asiatic characters ever legitimately appear in an English text?
Additionally:
None of these are perfect, but in many circumstances they cut a lot of the extra noise down to a more reasonable-looking output.
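The Unicode filter described above could look roughly like the following. This is a stdlib-only sketch on plain strings, not the commenter's private code and not the webvtt API; `is_latin_text` and `drop_non_latin` are hypothetical names, and "Latin" is approximated here by checking each letter's Unicode character name.

```python
import unicodedata


def is_latin_text(text: str) -> bool:
    """True if every alphabetic character in text is a Latin-script letter."""
    for ch in text:
        # Digits, punctuation and spaces are ignored; only letters are checked.
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return False
    return True


def drop_non_latin(cues: list[str]) -> list[str]:
    """Keep only cue texts whose letters are all Latin script."""
    return [cue for cue in cues if is_latin_text(cue)]


# Made-up cue texts for illustration.
cues = ["Thanks for watching!", "ご視聴ありがとうございました", "Café on Bartók Street"]
print(drop_non_latin(cues))
```

Accented Latin letters are kept, so names such as Bartók survive the filter. A stricter or looser rule (for example, tolerating a small fraction of non-Latin characters) may suit other corpora better.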
-
Now, I want to download the top XF SUPER SUS TOWER
-
First time I've seen this hallucination, "KING OF THE R", during the outro music of a segment. Unless, of course, the outro music is called "KING OF THE R" and it is being "labelled".
What do I mean by "labelled"? When a segment is announced, Whisper then repeats the name of the segment: "The Wooden Prince by Bela Bartók is played now by Christian Maciolaru." (source: https://www.bbc.co.uk/programmes/m001whrr ) Note that after that it goes completely off the rails. Where are the hallucinations of the form "StSq2 2.60" and "3F 5.83 x (-1.00), FCSp4 3.20" coming from?
-
Another fun hallucination is "counting" (source: https://www.bbc.co.uk/programmes/m001qklp )
(source: https://www.bbc.co.uk/programmes/m001rws1 ) Also, "En-za-mex" == "In the mix"?
-
https://archive.org/details/movie-1992-musketeers.twenty.years.after Whisper surely loves you. Especially when playing
-
Here's one I just got. I searched "openai CastingWords" on Google and found this thread.
"Transcription by CastingWords" Wtf?
-
Hi, I encountered a hallucination problem when using Whisper-large-v3. Please see here.
-
Hello, I am using Whisper-large-v3 to transcribe Chinese audio and I am running into the same hallucination problem as #2071.
-
So I finally figured out where this class of hallucination comes from: figure skating. https://www.isuresults.com/results/season2425/jgpcze2024/FSKXPAIRS---JUNIOR----QUAL000100--_JudgesDetailsperSkater.pdf
(source: https://www.bbc.co.uk/programmes/m0022ctz ) Suggestion: don't train your language models on ice skating videos, or classical music will trigger transcription of figure skating notations. :-)
-
Please share below any links to audio/video files that you have found to induce hallucinations in Whisper.
You may include:
The people working on various solutions to the hallucination problem can use these examples to help evaluate and improve those solutions.