Share your hallucinations here #1873

ryanheise · 2023-12-05T09:41:05Z

Please share below any links to audio/video files that you have found to induce hallucinations in Whisper.

You may include:

The audio/video file itself
Timestamps where hallucinations occur (unless they occur everywhere)

The people working on various solutions to the hallucination problem can use these examples to help evaluate and improve those solutions.

glangford · 2023-12-05T13:46:12Z

Great initiative @ryanheise

Here is one to start - I have been looking at creating short synthetic audio clips that induce hallucination and have no private or copyrighted content.

With --model large-v3 --language en this produces

[00:00.000 --> 00:02.000]  Welcome to a new video.
[00:02.000 --> 00:05.000]  This time, I'm going to show you how to make a
[00:05.000 --> 00:08.000]  very simple, easy-to-use, and easy-to-use
[00:08.000 --> 00:11.000]  water bottle.
[00:11.000 --> 00:14.000]  First, I'm going to make a water bottle.
[00:14.000 --> 00:17.000]  First, I'm going to make a water bottle.
[00:17.000 --> 00:20.000]  First, I'm going to make a water bottle.
[00:20.000 --> 00:23.000]  First, I'm going to make a water bottle.
[00:23.000 --> 00:26.000]  First, I'm going to make a water bottle.
[00:26.000 --> 00:29.000]  I'm going to make a water bottle.
[00:29.000 --> 00:32.000]  I'm going to make a water bottle.
[00:32.000 --> 00:35.000]  I'm going to make a water bottle.
[00:35.000 --> 00:38.000]  Thank you.

whisper-synthetic-5.mp3.zip

0 replies

DrLucky · 2023-12-08T00:20:06Z

I frequently get repeats like this and have to redo the transcription.
Would doing this with CUDA on a GPU help?

2 replies

ryanheise Dec 8, 2023
Author

The purpose of this thread is merely to share examples of hallucinations so that people working on solutions to that problem can test their solutions on your examples. Do you have any examples of hallucinatory audio data to share?

DrLucky Dec 8, 2023

Oh sorry, my apologies.
I do, but they aren't repeatable. My solution for them has been to just re-run whisper against the audio again and it fixes it the second time.

dgoryeo · 2023-12-08T01:09:45Z

@ryanheise , do I understand right that we can use those audio segments that produce hallucination lines to re-train (fine-tune) the Whisper model? Is that what you meant?

[duplicate warning]: I had just posted a similar note in another thread before I saw this conversation.

1 reply

ryanheise Dec 8, 2023
Author

There are a variety of ways people are already trying to solve the hallucination problem, but in order to measure how effective those solutions are, or to compare one solution with another, we would need data to test it on. So the purpose of this thread is just to collect data. Do you have any hallucinatory audio samples you could share?

glangford · 2023-12-08T13:32:56Z

I like this one, it has given different hallucinations each run. And I had never heard of a Cyclophone before but apparently it's a real thing. :)

--model large-v2 --language en

Run 1
[00:00.000 --> 00:02.960] CYCLOPHONE CRACKLING
[00:08.760 --> 00:11.700] CYCLOPHONE CORRUPTED
[00:26.100 --> 00:29.580] APPLAUSE

Run 2
[00:00.000 --> 00:05.000] The End
[00:05.000 --> 00:10.000] The End
[00:10.000 --> 00:15.000] The End
[00:15.000 --> 00:20.000] The End
[00:20.000 --> 00:25.000] The End
[00:25.000 --> 00:26.000] Thank you.
[00:26.000 --> 00:27.000] Thank you.
[00:27.000 --> 00:27.020] Thank you.

Run 3
[00:00.000 --> 00:06.400] SFX
[00:06.400 --> 00:11.800] SFX
[00:11.800 --> 00:16.400] SFX
[00:16.400 --> 00:20.480] зовут
[00:20.480 --> 00:23.460] Howard
[00:23.460 --> 00:29.700] Howard
[00:29.700 --> 00:31.760] you

large-v3 consistently seems to produce the same (less interesting) hallucination on this audio
[00:00.000 --> 00:29.680] Transcription by CastingWords
whisper-synthetic-5-2.mp3.zip

5 replies

glangford Dec 9, 2023

@ryanheise This file also triggers hallucination in other languages, and the transcript also seems to be different each time. So this gives the possibility of a single audio file giving a much bigger set of hallucination data to evaluate.

Can I assist here by generating multiple whisper outputs from this one file (i.e. multiple runs across multiple languages)? Is .json output helpful here or do you need to run something custom?

ryanheise Dec 11, 2023
Author

That's a good question since my top post was a little brief (I've edited it). The main things are 1) the audio file and 2) the specific timestamps where hallucinations occur, because anyone working on a solution will be running the inference process many times themselves and will be able to see the full output, perhaps with their own additional debug output. Anything else you share in addition to that may well be interesting and valuable too. For example, when I ran whisper on your file, I got very different hallucinations from yours. I've only run it twice, but in both cases, I got "Thanks for watching! Applause" and "Thank you for watching! Applause". I'm not sure if that's because you were running some other options in addition to --model large-v2 --language en.

glangford Dec 11, 2023

Ok, well the .json will record the timestamps, the no_speech probabilities, etc so hopefully that works.

But more importantly, you now have me wondering about the reproducibility of different hallucinations each run, since you got different results. For initial testing I ran with just --model large-v2 --language en on CPU (which also implies --fp16 False), whisper 20231117, Python 3.11.6 and, in case it matters:

torch                    2.1.1
torchaudio               2.1.1
torchvision              0.14.1

I ran this again just now and got 3 different hallucinations in 3 runs.

I also confirmed that I get different hallucinations on Colab, using large-v2 on CUDA and lang='es' as shown below. My thought is to use Colab to generate a large number of .json hallucinations, if you think that will be helpful.

result = model.transcribe(vfile, language=lang, fp16=False, beam_size=5, task='transcribe',
                              best_of=5, word_timestamps=True, verbose=True)
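A minimal sketch of how repeated runs like these could be collected and compared. It assumes only that each run returns a dict with a "text" key, as the result of model.transcribe does. The helper names collect_runs, summarize and make_fake_transcribe are hypothetical, and a canned stand-in replaces the real model so the snippet runs without Whisper installed.

```python
import itertools
import json
from collections import Counter
from typing import Callable, Dict, List


def collect_runs(transcribe: Callable[[], Dict], n_runs: int) -> List[Dict]:
    """Call `transcribe` n_runs times and keep every result dict."""
    return [transcribe() for _ in range(n_runs)]


def summarize(runs: List[Dict]) -> Counter:
    """Count how often each distinct full transcript appears across runs."""
    return Counter(run["text"].strip() for run in runs)


def make_fake_transcribe(texts: List[str]) -> Callable[[], Dict]:
    """Hypothetical stand-in for the model: cycles through canned outputs."""
    canned = itertools.cycle(texts)
    return lambda: {"text": next(canned), "segments": []}


# Demo with canned outputs instead of a real model.
fake = make_fake_transcribe([" The End", " SFX", " The End"])
runs = collect_runs(fake, n_runs=3)
print(json.dumps(summarize(runs), indent=2))
```

With a real model, the callable passed to collect_runs would presumably be something like lambda: model.transcribe(vfile, language=lang, fp16=False), and each result dict could be written out with json.dump to share alongside the audio file.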

glangford Dec 11, 2023

...and after retesting on CUDA with language en, I got this

./whisper-synthetic-5-2.mp3
1
[00:00.000 --> 00:29.740]  A sound effect.
[00:29.720 --> 00:29.740]  you
--------------------
2
[00:00.000 --> 00:03.320]  Fast forward.
[00:08.140 --> 00:10.940]  Fast forward.
[00:10.940 --> 00:11.260]  Fast forward.
[00:14.980 --> 00:15.180]  Fast forward.
[00:15.880 --> 00:16.520]  Music.
[00:16.520 --> 00:16.980]  Music.
[00:26.520 --> 00:26.600]  Applause.
--------------------
3
[00:02.100 --> 00:02.200]  What she will do.
[00:29.700 --> 00:29.740]  If she wants to save her children.
[00:29.720 --> 00:29.740]  you
--------------------
4
[00:00.000 --> 00:13.780]  STAGES OF THE NEW WORLD
[00:13.780 --> 00:17.780]  ANOTHER CENTURY OF THE PARTY
[00:17.780 --> 00:29.740]  THE BRICK METROGRAPH
--------------------
5
[00:00.000 --> 00:02.620]  Number Three
[00:05.280 --> 00:05.460]  Two
[00:05.460 --> 00:11.720]  Number Four
[00:11.720 --> 00:12.780]  Number Five
--------------------
6
[00:00.000 --> 00:01.440]  Buzzing
[00:01.440 --> 00:13.100]  Music
[00:26.100 --> 00:26.480]  Clapping
--------------------
7
[00:00.000 --> 00:01.500]  A.D.T.
[00:01.500 --> 00:01.500]  A.D.T.
[00:02.240 --> 00:03.400]  A.D.T.
[00:04.940 --> 00:10.840]  A.D.T.
--------------------
8
[00:00.000 --> 00:06.460]  Music
[00:10.700 --> 00:17.820]  Music
[00:17.820 --> 00:17.820]  Music
[00:17.820 --> 00:26.600]  Clapping
--------------------
9
[00:00.000 --> 00:03.920]  【aggressive pencil writing】
[00:03.920 --> 00:08.400]  【erratic pencil writing】
[00:09.920 --> 00:13.900]  【aggressive pencil writing】
[00:16.000 --> 00:18.920]  《Good Jansses!》
[00:22.780 --> 00:26.480]  【applause】
--------------------
10
[00:00.780 --> 00:02.000]  F-A-N-C-H
[00:02.000 --> 00:02.200]  F-A-N-C-H
[00:02.200 --> 00:02.840]  F-A-N-C-H
[00:04.240 --> 00:05.020]  F-A-N-C-H
[00:05.020 --> 00:06.960]  F-A-N-C-H
[00:06.960 --> 00:07.020]  F-A-N-C-H
[00:07.020 --> 00:23.780]  F-A-N-C-H
--------------------

Note that this is for the second posted file, 5-2.

ryanheise Dec 12, 2023
Author

Ok, well the .json will record the timestamps, the no_speech probabilities, etc so hopefully that works.

I was thinking more along the lines of where, if someone shares a file that tends to induce hallucinations, but the file is 1 hour long, and the hallucinations tend to occur at the beginning, and at the 18 minute mark, then sharing those two locations will be very helpful. Hopefully that sets the bar low on contributing, since it will be better to collect a wide variety of different audio files with a variety of different trigger conditions for hallucinations.

qaixerabbas · 2023-12-12T10:28:45Z

Why does Whisper return random text (hallucinations) for audio that does not contain any speech? I tested it with random traffic noise and one minute of silence. The output contains a few words, but as I understand it, it should not return anything when there is no speech in the audio. How can I handle this?

Output:
"bub protectoご視聴ありがとうございました"
for Input:
No Sound Test.zip

Output:
"you you you"
for input:
TEST_One minute of silence (ID 0917)_BSB.zip

2 replies

ryanheise Dec 12, 2023
Author

@qaixerabbas thanks for sharing the samples.

Why does the Whisper return random text (hallucinations) for an audio that does not contain any speech?

It is likely because the training data contained transcripts during silence. For example, if you think about subtitles for movies, sometimes you might see a Copyright notice in the subtitles, or you might see credits such as "Subtitles written by xyz", even though there is no actual speech happening in the audio at the timestamp where these subtitles appear. So imagine the neural net learning from this data. It will basically learn (for example) that when there is silence at the end of the video, it should "make up" some text to go along with the silence, the sort of text that it learns typically goes at the end of a video. The situation would be better with carefully cleaned up training data, although there is a tradeoff between how much effort it takes to clean up data, and how massive a data set you can build (more data = better).

There are many solutions and workarounds people are working on to prevent hallucinations, and most of them involve just cutting out the silent parts of the audio so that Whisper is never tempted by the silence (you can search the discussions for "hallucinations" to find the various solutions).
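As a rough illustration of the "cut out the silent parts" idea, here is a minimal sketch that finds non-silent spans using a plain RMS energy threshold. This is an assumption-laden toy, not how any particular solution in the discussions works: the function name, frame length and threshold are all hypothetical, and an energy threshold only detects silence, so it would not remove non-speech noise such as traffic.

```python
import math
from typing import List, Optional, Sequence, Tuple


def find_non_silent_spans(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    rms_threshold: float = 0.01,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame RMS exceeds the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    spans: List[Tuple[float, float]] = []
    span_start: Optional[int] = None
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        loud = rms > rms_threshold
        if loud and span_start is None:
            span_start = offset  # a loud region begins at this frame
        elif not loud and span_start is not None:
            spans.append((span_start / sample_rate, offset / sample_rate))
            span_start = None
    if span_start is not None:
        # Audio ended while still inside a loud region.
        spans.append((span_start / sample_rate, len(samples) / sample_rate))
    return spans
```

Only the returned spans would then be passed to Whisper, with the span start times added back to the segment timestamps afterwards.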

qaixerabbas Dec 12, 2023

@ryanheise thank you for the detailed explanation. I will look at this.

misutoneko · 2023-12-12T13:17:08Z

So I guess no one here has heard about the --suppress_tokens "" trick?
Or is there some problem using it?

I've posted about it before, maybe it was simply lost in traffic.
Please do try it, and comment if it helped you or not.
I have used this myself with small and medium models mostly (haven't tested with large-v3) and it was a dramatic difference.
Note that it will generate noise descriptions, but all that "Thank you for watching" etc. nonsense should be gone.

UPDATE: This method does NOT work with Turbo (and by extension, it probably won't work with large-v3 either).

4 replies

ryanheise Dec 12, 2023
Author

So I guess no one here has heard about the --suppress_tokens "" trick?

The purpose of this discussion is to work together to build a dataset of audio examples that induce hallucinations so that people can test their solutions on it. (For example, you could test your trick/solution on these 4 examples to see how it compares against other solutions.)

misutoneko Dec 12, 2023

Right, I don't mean to derail this too much. These samples are very useful, thank you anyone who donated :D
I tried the trick with the samples qaixerabbas has kindly provided, and it works well for those at least, probably others too. I just got the impression that this trick isn't very well known, because it has always made such a big difference for me, yet it seems like no one uses it. But I'm also using Whisper in a somewhat unorthodox way.

EDIT: Here's what I got with the synthetic samples (this is with Whisper version 20230314):

$ CUDA_VISIBLE_DEVICES=-1 whisper --threads 1 --language en --device cpu --output_format srt --model small --suppress_tokens "" --output_dir . whisper-synthetic-5.wav
/home/z/.local/lib/python3.10/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[00:00.000 --> 00:14.000] [Music]
[00:14.000 --> 00:26.000] [Music]
[00:26.000 --> 00:36.000] [Applause]
$ CUDA_VISIBLE_DEVICES=-1 whisper --threads 1 --language en --device cpu --output_format srt --model small --suppress_tokens "" --output_dir . Whisper-adversary52.wav
/home/z/.local/lib/python3.10/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[00:00.000 --> 00:04.000] [typing]
[00:04.000 --> 00:06.000] [typing]
[00:06.000 --> 00:08.000] [typing]
[00:08.000 --> 00:10.000] [typing]
[00:10.000 --> 00:12.000] [typing]
[00:12.000 --> 00:14.000] [typing]
[00:14.000 --> 00:16.000] [typing]
[00:16.000 --> 00:24.000] [music]
[00:26.000 --> 00:29.000] [applause]

glangford Dec 13, 2023

@misutoneko Thank you for the comments and for trying the synthetic samples. With the 5-2 sample, I had been getting wildly different results each run. Using --suppress_tokens "" with large-v2 consistently works more predictably and is not fooled by these sound effects.

If I had to guess, what is happening here is that the default list of suppression tokens causes the identification of background sounds (typing, music, applause, etc) to be ignored - and decoding is allowed to continue until it hallucinates speech.
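A toy sketch of that guess, not Whisper's actual decoder: suppression is modelled as setting the score of each listed token to negative infinity before picking the best one. The scores and the whole-phrase "tokens" are made up for illustration, and pick_token is a hypothetical name.

```python
import math
from typing import Dict, Iterable


def pick_token(logits: Dict[str, float], suppressed: Iterable[str]) -> str:
    """Greedy choice after masking every suppressed token to -inf."""
    blocked = set(suppressed)
    masked = {
        token: (-math.inf if token in blocked else score)
        for token, score in logits.items()
    }
    return max(masked, key=masked.get)


# Made-up scores for one decoding step on a clip that contains only music.
step_logits = {"[Music]": 4.0, "Thank you.": 2.5, "<|endoftext|>": 1.0}

# Nothing suppressed: the sound description wins.
print(pick_token(step_logits, suppressed=[]))

# Sound description suppressed: the decoder falls through to speech-like text.
print(pick_token(step_logits, suppressed=["[Music]"]))
```

In this sketch the mask is applied to each token individually at every step, so the order in which suppressed tokens are listed makes no difference.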

misutoneko Dec 13, 2023

Yes, it seems like it's a combination of bad training data and the way the default suppression works. The prediction is kind of "forced to take a wrong turn". Note that hallucinations happen in whisper.cpp too, but in a different way because they don't use the same code.
There's also a Rust implementation (via candle), maybe others... I suspect they'll bump into similar problems.
So these samples should come in handy there, too.

Here's my original thread from July:
#1488

dgoryeo · 2023-12-13T01:55:38Z

Hi @misutoneko , do the tokens need to be grouped/sequenced together when passed to --suppress_tokens?

For example, below is a hallucination. Should the command be --suppress_tokens "50364, 7, 6847, 96, 7016, 32045, 8, 50464" ? Does this suppress any occurrences of these tokens in results, or only if they are sequenced as given?

{
        "id": 0,
        "seek": 0,
        "start": 10052.738,
        "end": 10055.078,
        "text": "(泣き声)",
        "tokens": [
            50364,
            7,
            6847,
            96,
            7016,
            32045,
            8,
            50464
        ]

1 reply

misutoneko Dec 13, 2023

AFAICR there was a bug in the command-line handling of this parameter (only one token can be given), so the options are either --suppress_tokens "" (as in, an empty string), which means that nothing is suppressed, or alternatively --suppress_tokens 50364 or --suppress_tokens 50363, depending on whether you have an English-only or multilingual model. That will suppress the BEG token only, and that also worked for me with whisper-timestamped (I haven't tried it with regular/vanilla Whisper). It probably induces the "no timestamp" mode, but that's not necessarily a problem (if there's no actual speech in the clip, you probably don't care about the timing).

If you really want to give multiple tokens, I think that can be done via Python, but I'm not so sure that's a good idea.
Any and all occurrences of the tokens that are on the "suppression token list" are suppressed. Unless you write your own suppression code, of course.

dgoryeo · 2023-12-19T10:39:50Z

Hi, here is a 3min audio with a combination of music and kitchen noise.
3min-kitchen-audio

The hallucination lines by large-v2:

[00:00.000 --> 00:30.000]  『Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Fall, Winter, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Spring
[00:30.000 --> 00:39.080]  Summer, Summer, Summer, Summer, Spring, Summer, Spring, Summer, Spring, Summer, Winter, Summer, Spring
[00:40.200 --> 00:49.200]  Summer, Summer, Summer, Summer, Summer, 19 years later
[00:49.200 --> 00:55.100]  Remove the rockets.
[00:56.080 --> 01:01.160]  Don't overcook.
[01:02.720 --> 01:08.400]  Turn it over right away if you think it's burning.
[01:09.020 --> 01:14.900]  It wasn't have a light.
[01:14.900 --> 01:18.900]  I'm going to make the sauce for the meatball.
[01:18.900 --> 01:20.900]  I'm going to make the sauce for the meatball.
[01:20.900 --> 01:22.900]  I'm going to make the sauce for the meatball.
[01:22.900 --> 01:24.900]  I'm going to make the sauce for the meatball.
[01:24.900 --> 01:26.900]  I'm going to make the sauce for the meatball.
[01:26.900 --> 01:28.900]  I'm going to make the sauce for the meatball.
[01:28.900 --> 01:30.900]  I'm going to make the sauce for the meatball.
[01:30.900 --> 01:32.900]  I'm going to make the sauce for the meatball.
[01:32.900 --> 01:34.900]  I'm going to make the sauce for the meatball.
[01:34.900 --> 01:36.900]  I'm going to make the sauce for the meatball.
[01:36.900 --> 01:38.900]  I'm going to make the sauce for the meatball.
[01:38.900 --> 01:40.900]  I'm going to make the sauce for the meatball.
[01:40.900 --> 01:42.900]  I'm going to make the sauce for the meatball.
[01:42.900 --> 01:44.900]  I'm going to make the sauce for the meatball.
[01:44.900 --> 01:46.900]  I'm going to make the sauce for the meatball.
[01:46.900 --> 01:48.900]  I'm going to make the sauce for the meatball.
[01:48.900 --> 01:50.900]  I'm going to make the sauce for the meatball.
[01:54.900 --> 01:58.900]  I closed my eyes for a while.
[01:59.900 --> 02:01.900]  I opened my eyes again.
[02:09.900 --> 02:11.900]  I've been hiding in this room for a while.
[02:42.900 --> 02:59.900]  Thank you for watching my video.

And, here is the hallucination by large-v3:


[00:00.000 --> 00:05.000]  This video is a derivative work of the Touhou Project.
[00:05.000 --> 00:10.000]  It is based on the Touhou Project.
[00:10.000 --> 00:15.000]  It is based on the Touhou Project.
[00:15.000 --> 00:20.000]  It is based on the Touhou Project.
[00:20.000 --> 00:25.000]  It is based on the Touhou Project.
[00:25.000 --> 00:30.000]  It is based on the Touhou Project.
[00:30.000 --> 00:35.000]  It is based on the Touhou Project.
[00:35.000 --> 00:40.000]  It is based on the Touhou Project.
[00:40.000 --> 00:44.000]  Give me Trap louder
[00:44.000 --> 00:49.000]  Give me Trap louder
[00:49.000 --> 00:54.000]  Give me Trap louder
[00:54.000 --> 00:59.000]  Give me Trap louder
[00:59.000 --> 01:29.000]  It's been a long time since I've been here, but it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's
[01:29.000 --> 01:36.120]  been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so it's been a long time since I've been here, so
[01:36.120 --> 01:41.120]  Thank you for watching until the end.
[02:06.120 --> 02:11.120]  Thank you for watching until the end.
[02:36.120 --> 02:41.120]  Thank you for watching until the end.

And, hallucination by large-v1:

[00:30.000 --> 00:39.000]  Add the chopped green onion and stir-fry.
[00:40.000 --> 00:45.000]  Add the chopped green onion and stir-fry.
[00:46.000 --> 00:55.000]  Add the chopped green onion and stir-fry.
[00:55.000 --> 01:01.000]  Add the chopped green onion and stir-fry.
[01:02.000 --> 01:09.000]  Add the chopped green onion and stir-fry.
[01:10.000 --> 01:16.000]  Add the chopped green onion and stir-fry.
[01:16.000 --> 01:22.000]  Add the chopped green onion and stir-fry.
[01:23.000 --> 01:29.000]  Add the chopped green onion and stir-fry.
[01:30.000 --> 01:37.000]  Add the chopped green onion and stir-fry.
[01:37.000 --> 01:42.000]  Add the chopped green onion and stir-fry.
[02:07.000 --> 02:15.000]  Add the chopped green onion and stir-fry.
[02:16.000 --> 02:22.000]  Add the chopped green onion and stir-fry.
[02:23.000 --> 02:29.000]  Add the chopped green onion and stir-fry.
[02:30.000 --> 02:36.000]  Add the chopped green onion and stir-fry.
[02:37.000 --> 02:41.000]  Add the chopped green onion and stir-fry.
[02:43.000 --> 02:50.000]  Add the chopped green onion and stir-fry.
[02:51.000 --> 02:58.000]  Add the chopped green onion and stir-fry.

1 reply

glangford Dec 19, 2023

With --suppress_tokens "" and large-v3 it produces this:

[00:00.000 --> 00:10.000]  ♪♪♪
[00:10.000 --> 00:20.000]  ♪♪♪
[00:20.000 --> 00:30.000]  ♪♪♪
[00:30.000 --> 00:40.000]  ♪♪♪
[00:40.000 --> 00:50.000]  ♪♪♪
[00:50.000 --> 01:00.000]  ♪♪♪
[01:00.000 --> 01:10.000]  ♪♪♪
[01:10.000 --> 01:20.000]  ♪♪♪
[01:20.000 --> 01:30.000]  ♪♪♪
[01:30.000 --> 01:40.000]  ♪♪♪
[01:46.000 --> 01:48.000]  ♪♪♪
[01:48.000 --> 01:50.000]  ♪♪♪
[01:50.000 --> 01:54.880]  *sneezes*
[01:54.880 --> 01:58.880]  *whisper*
[01:58.880 --> 02:04.880]  *whisper*
[02:04.880 --> 02:10.880]  *whisper*
[02:10.880 --> 02:14.880]  *whisper*
[02:14.880 --> 02:44.860]  All right, here we go.
[02:44.880 --> 03:00.000]  All right, here we go.

whicks1 · 2023-12-19T22:48:05Z

FWIW, my shop has processed thousands of hours of English files with a pretty wide range of speakers and lots of silence/music hallucinations. (we stick to large-v2). All our outputs are VTTs which we post-process by manipulating the output with webvtt CaptionLists. I do dozens of other corrections for casing, time, named entities, and line length that are all particular to my org's use cases (currently in a private repo, sorry I can't share the whole thing ATM), but here's a handful of the things that are most common in what we see, YMMV obviously:

# List of common/known whisper hallucination words or phrases.
# Any text cue that contains one of these ANYWHERE in the caption.text
# will be removed in its entirety.
INLINE_HALLUCINATIONS = [
    "amara.org",
    "by Sir Edward Elgar",  # might kill a few legit lines in music lectures.
    "don't forget to subscribe",
    "like and subscribe",
    "mooji",
    "otter.ai",
    "Piano music continues",
    "please post them in the comments",
    "please subscribe to my channel",
    "Pomp and Circumstance March No.",
    "rev.com",
    "Transcribed by ESO",
    "© BF-WATCH TV",
    "© transcript",
]

# Hallucinations that are common, where the entire caption.text must match
# and the entire text block gets removed.
# Note: loop over terminating punctuations to create variants.
EXACT_HALLUCINATIONS = [
    "[\"dance of the sugar plum fairy\"]",
    "[\"my country tis of thee\"]",
    "[\"pomp and circumstance\"]",
    "[\"the star spangled banner\"]",
    "[\"the wizard of oz\"]",
    "ah",
    "ahh",
    "all right.",
    "allright",
    "and",
    "beep",
    "bye",
    "ha",
    "hmm",
    "I'm",
    "my country tis of thee",
    "oh",
    "Okay",
    "piano music",
    "pomp and circumstance",
    "relaxing piano music",
    "so",
    "Thank you for watching",
    "Thanks for watching",
    "the star spangled banner",
    "the",
    "uh",
    "um",
    "umm",
    "what",
    "Whoa",
    "woo",
    "yeah",
    "you",
]
# Extend EXACT_HALLUCINATIONS with same terms
# + these trailing characters.
TERMINATING_PUNCTUATIONS = [".", "?", "!"]

# When the following phrases occur
# in the final cue of the file, the entire cue will
# be dropped. Could probably simplify with regex r'thanks?'
# this will cause legit "thank you's" to be dropped.
REMOVE_FINAL_CUE_TEXT = [
    "thank y'all",
    "thank you again",
    "thank you all so much",
    "thank you all very much",
    "thank you all",
    "thank you everyone",
    "thank you for",
    "thank you so much",
    "thank you too",
    "thank you very much",
    "thank you",
    "thank you, bye",
    "thank you, sir",
    "thank you. bye",
    "thank you. thank",
    "thanks a lot",
    "thanks so much",
    "thanks very much",
    "thanks",
    "thanks, everyone",
]
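A minimal sketch of how lists like these might be applied, using plain strings in place of webvtt captions and a small subset of the terms. Several details are assumptions rather than the actual private code: matching is case-insensitive, the final-cue rule is applied after the other filters, and the name filter_cues is hypothetical.

```python
from typing import List

# Small subsets of the lists above, for illustration only.
INLINE = ["amara.org", "like and subscribe"]
EXACT = ["you", "Thanks for watching", "piano music"]
PUNCTUATIONS = [".", "?", "!"]
FINAL_CUE = ["thank you", "thanks"]

# Each exact term, plus the same term with each trailing punctuation mark.
EXACT_SET = {term.lower() for term in EXACT} | {
    (term + mark).lower() for term in EXACT for mark in PUNCTUATIONS
}


def filter_cues(cues: List[str]) -> List[str]:
    """Drop cues that look like known hallucinations."""
    kept = []
    for text in cues:
        normalized = text.strip().lower()
        if any(phrase.lower() in normalized for phrase in INLINE):
            continue  # inline phrase found anywhere in the cue
        if normalized in EXACT_SET:
            continue  # whole cue is a known hallucination
        kept.append(text)
    # Final-cue rule: drop the last remaining cue if it contains a thank-you.
    if kept and any(phrase in kept[-1].lower() for phrase in FINAL_CUE):
        kept.pop()
    return kept
```

A thank-you in the middle of a file survives in this sketch. Only the last remaining cue is checked against the final-cue list.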

We also filter to a limited subset of Unicode characters to drop obvious non-Latin strings. Rationale: Why would Asiatic characters ever legitimately appear in an English text?

# Unicode ranges for filtering text.
# Only active character ranges are allowed in the output.
UNICODE_RANGES = [
    (0x0020, 0x007F),  # Basic Latin
    (0x00A0, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
    (0x0180, 0x024F),  # Latin Extended-B
    (0x1E00, 0x1EFF),  # Latin Extended Additional
]
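A minimal sketch of how such ranges could be used. Whether the real pipeline strips individual characters or drops whole cues is not stated, so both variants are shown as assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

# Same ranges as listed above.
ALLOWED_RANGES: List[Tuple[int, int]] = [
    (0x0020, 0x007F),  # Basic Latin
    (0x00A0, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
    (0x0180, 0x024F),  # Latin Extended-B
    (0x1E00, 0x1EFF),  # Latin Extended Additional
]


def is_allowed(char: str) -> bool:
    """True if the character's code point falls inside any allowed range."""
    code_point = ord(char)
    return any(low <= code_point <= high for low, high in ALLOWED_RANGES)


def strip_disallowed(text: str) -> str:
    """Variant 1: remove only the characters outside the allowed ranges."""
    return "".join(char for char in text if is_allowed(char))


def has_disallowed(text: str) -> bool:
    """Variant 2: flag the whole cue if any character is outside the ranges."""
    return not all(is_allowed(char) for char in text)
```

Note that these ranges also exclude control characters such as newlines and symbols such as ♪, so multi-line cues would need to be checked line by line.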

Additionally:

if a line contains only a single character, punctuation, or an ellipsis, it's dropped.
if the line is the exact same text as the previous, it's dropped. Can be problematic for music.
if a string contains a long sequence of unbroken text with no spaces, it's shortened (e.g. beeeeeeeep => beep)
if a phrase of 3-5 words repeats more than 3 times in a caption cue, it's cut to a max of 2 repeats.
remove lines that contain what appears to be this decoding error: re.compile(r'^(LSp\d|StSq|CCoSp)|^\d[A-Z][a-z]?\s\d')
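The rules above could be sketched roughly as follows. Only the decoding-error regex is taken from the post. Everything else is an assumption made for illustration: runs of three or more identical letters are collapsed to two, "2 repeats" is read as two occurrences in total, duplicates are compared against the previous kept line, and the names tidy and clean_lines are hypothetical.

```python
import re
import string
from typing import List, Optional

# Regex quoted in the post above.
DECODING_ERROR = re.compile(r'^(LSp\d|StSq|CCoSp)|^\d[A-Z][a-z]?\s\d')
# Three or more of the same letter in a row (assumed reading of "long sequence").
LONG_RUN = re.compile(r'([A-Za-z])\1{2,}')
# A phrase of 3-5 words followed by at least 3 more copies of itself.
REPEATED_PHRASE = re.compile(r'(?<!\S)((?:\S+\s+){2,4}\S+)(?:\s+\1){3,}')
PUNCTUATION_ONLY = set(string.punctuation + "…")


def tidy(text: str) -> str:
    """Shorten letter runs, then cut repeated phrases down to two copies."""
    text = LONG_RUN.sub(r'\1\1', text)
    return REPEATED_PHRASE.sub(r'\1 \1', text)


def clean_lines(lines: List[str]) -> List[str]:
    """Apply the drop rules to a list of caption lines."""
    kept: List[str] = []
    previous: Optional[str] = None
    for raw in lines:
        text = tidy(raw.strip())
        if len(text) <= 1 or all(char in PUNCTUATION_ONLY for char in text):
            continue  # single character, bare punctuation, or ellipsis
        if DECODING_ERROR.search(text):
            continue  # looks like the score-sheet style decoding error
        if text == previous:
            continue  # exact duplicate of the previous kept line
        kept.append(text)
        previous = text
    return kept
```

For example, clean_lines would drop lines such as "StSq2 2.60" and "3F 5.83 x (-1.00), FCSp4 3.20" and keep a single copy of consecutive "The End" lines.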

None of these are perfect, but they cut down a lot of the extra noise to a more reasonable-looking output in many circumstances.

1 reply

ryanheise Dec 20, 2023
Author

Hi @whicks1 , have you seen #928 (comment) ? You might want to share your work with everyone there since that discussion is building a set of text outputs rather than the audio files themselves.

FlavioFS · 2024-02-06T00:12:45Z

Now, I want to download the top XF SUPER SUS TOWER
Is it available already at the App Store?

0 replies

NielsMayer · 2024-02-28T04:59:00Z

Source: https://rumble.com/v4g1ysf-ancient-pyramids-the-lost-city-of-atlantis-and-is-the-earth-actually-headed.html

First time I've seen this hallucination "KING OF THE R" during outro music of segment.

Unless of course the outro music is called "KING OF THE R" and it is being "labelled"

[01:21:16.460 --> 01:21:18.060]  Until then, guys, have a great day.
[01:21:18.120 --> 01:21:18.840]  Bye-bye.
[01:21:37.820 --> 01:21:48.820]  KING OF THE R curious
[01:21:20.160 --> 01:21:21.720] 小心
[01:21:48.840 --> 01:22:18.820]  KING OF THE R
[01:22:18.840 --> 01:22:48.820]  KING OF THE R
[01:22:48.840 --> 01:23:18.820]  KING OF THE R
[01:23:18.840 --> 01:23:48.820]  KING OF THE R
[01:23:48.840 --> 01:24:18.540]  KING OF THE R
[01:24:18.840 --> 01:24:48.820]  KING OF THE R
[01:24:48.840 --> 01:25:18.820]  KING OF THE R
[01:25:18.840 --> 01:25:48.820]  KING OF THE R
[01:25:48.840 --> 01:26:18.820]  KING OF THE R
[01:26:18.840 --> 01:26:48.820]  KING OF THE R
[01:27:14.220 --> 01:27:18.640]  KING OF THE R
[01:27:22.060 --> 01:27:22.140]  KING OF THE R

What do I mean by "labelled"?

When a segment is announced, and then it repeats the name of the segment: "The Wooden Prince by Bela Bartók is played now by Christian Maciolaru." (source https://www.bbc.co.uk/programmes/m001whrr )

Note that after that it completely goes off the rails. Where are the hallucinations of form "StSq2 2.60" and "3F 5.83 x (-1.00), FCSp4 3.20" coming from??

3672600	3677440	The wonderfully atmospheric opening takes us right into a fairy tale world.
3677580	3685840	A prince sees a princess from afar and he's immediately smitten but the forest fairy orders
3685840	3691980	her back into her castle and thrusts the forest into darkness so that the prince can't follow her.
3692900	3699440	To further stop him the stream overflows its banks, cue a scene of watery dances.
3700600	3705540	The prince takes his wooden staff and he puts his crown on it along with his cloak and a lock
3705540	3712660	of his hair. Seeing this the princess falls for this wooden prince and she dances with it.
3712960	3719120	The real prince is heartbroken. When the princess sees him though she transfers her affections
3719120	3726280	but by now the real prince is hurt and he turns away. The princess is upset and she cuts off her
3726280	3733460	hair and the prince rushes to her side and we hope they live happily ever after.
3733960	3740460	The ballet The Wooden Prince by Bela Bartók is played now by the WDR Symphony Orchestra Cologne
3740460	3742980	conducted by Christian Maciolaru.
3792700	3793160	The Wooden Prince by Bela Bartók is played now by Christian Maciolaru.
3793460	3823440	The Wooden Prince by Bela Bartók is played now by Christian Maciolaru.
3852560	3853140	The Wooden Prince by Bela Bartók is played now by Christian Maciolaru.
3853460	3883440	The Wooden Prince by Bela Bartók is played now by Christian Maciolaru.
3883460	3913440	The Wooden Prince by Bela Bartók is played now.
3913460	3943440	The Wooden Prince by Bela Bartók is played now.
3943460	3973440	The Wooden Prince is played now.
3973460	4003440	3F 5.83 x (-1.00), FCSp4 3.20
4032680	4033120	3F 5.83 x (-1.00), FCSp4 3.20
4033460	4063440	3F 5.83 x (-1.00), FCSp4 3.20
4063460	4093440	3F 5.83 x (-1.00), FCSp4 3.20
4093460	4123440	3F 5.83 x (-1.00), FCSp4 3.20
4123460	4153440	3F 5.83 x (-1.00), LSp4 2.70
4153460	4183440	CCoSp4 3.50
4183460	4213440	StSq2 2.60
4213440	4243420	CCoSp4 3.50
4243440	4273100	StSq2 2.60
4273100	4303080	CCoSp4 3.50
4303100	4333080	StSq2 2.60
4362200	4362700	CCoSp4 3.50
4363080	4393060	StSq2 2.60
4393060	4423040	CCoSp4 3.50
4423060	4453040	StSq2 2.60
4453040	4483020	CCoSp4 3.50
4483040	4513020	StSq2 2.60
4513040	4543020	CCoSp4 3.50
4543040	4573020	StSq2 2.60
4573040	4603020	CCoSp4 3.50
4603040	4633020	StSq2 2.60
4633040	4663020	CCoSp4 3.50
4663040	4693020	StSq2 2.60
4693020	4723000	CCoSp4 3.50
4723020	4753000	StSq2 2.60
4753000	4782980	CCoSp4 3.50
4783000	4812980	StSq2 2.60
4842320	4842700	CCoSp4 3.50
4842980	4872960	StSq2 2.60
4872980	4902960	CCoSp4 3.50
4902980	4932960	StSq2 2.60
4932960	4962940	CCoSp4 3.50
4962960	4992940	StSq2 2.60
4992940	5022920	CCoSp4 3.50
5052220	5052660	StSq2 2.60
5052660	5082380	CCoSp4 3.50
5082660	5112640	StSq2 2.60
5112640	5142620	CCoSp4 3.50
5142640	5172620	StSq2 2.60
5172620	5202600	CCoSp4 3.50
5202620	5232600	StSq2 2.60
5232620	5262600	CCoSp4 3.50
5262620	5292600	StSq2 2.60
5292620	5322600	CCoSp4 3.50
5322620	5352600	StSq2 2.60
5381620	5382260	CCoSp4 3.50
5382620	5412600	StSq2 2.60
5412600	5442320	CCoSp4 3.50
5442600	5472580	StSq2 2.60
5472580	5502560	CCoSp4 3.50
5502580	5532560	StSq2 2.60
5561680	5562240	CCoSp4 3.50
5562560	5592540	StSq2 2.60
5592540	5622520	CCoSp4 3.50
5622540	5652520	StSq2 2.60
5681900	5682040	CCoSp4 3.50
5682520	5712500	StSq2 2.60
5712500	5742480	CCoSp4 3.50
5742500	5772480	StSq2 2.60
5772500	5802200	CCoSp4 3.50
5802500	5832480	StSq2 2.60
5832480	5862460	CCoSp4 3.50
5862480	5892460	StSq2 2.60
5892460	5922440	CCoSp4 3.50
5922440	5952420	StSq2 2.60
5981560	5982140	CCoSp4 3.50
5982420	6012400	StSq2 2.60
6012400	6042380	CCoSp4 3.50
6042400	6072380	StSq2 2.60
6072380	6102360	CCoSp4 3.50
6102380	6132360	StSq2 2.60
6132360	6162340	CCoSp4 3.50
6162360	6192340	StSq2 2.60
6192340	6222320	CCoSp4 3.50
6222340	6252320	StSq2 2.60
6252320	6282300	CCoSp4 3.50
6282320	6312300	StSq2 2.60
6312300	6342280	CCoSp4 3.50
6342300	6372280	StSq2 2.60
6372280	6402260	CCoSp4 3.50
6402280	6432260	StSq2 2.60
6432280	6462260	CCoSp4 3.50
6462280	6492260	StSq2 2.60
6492260	6522240	CCoSp4 3.50
6522260	6552240	StSq2 2.60
6552240	6582220	CCoSp4 3.50
6582240	6611960	StSq2 2.60
6612240	6642220	CCoSp4 3.50
6642240	6672220	StSq2 2.60
6672240	6702220	CCoSp4 3.50
6731660	6731880	StSq2 2.60
6731880	6761860	CCoSp4 3.50
6761880	6791860	StSq2 2.60
6791860	6821840	CCoSp4 3.50
6821860	6851520	StSq2 2.60
6851520	6865700	1
6865700	6881500	2
6881500	6884900	Once upon a time, two happily ever after.
6885500	6890780	The love story of the Wooden Prince told in ballet music by Bela Bartok
6890780	6897740	and performed by the WDR Symphony Orchestra Cologne conducted by Christian Macielaru.

0 replies

NielsMayer · 2024-02-28T05:10:42Z

NielsMayer
Feb 28, 2024

Another fun hallucination is "counting"

(source: https://www.bbc.co.uk/programmes/m001qklp )

1183240	1200600	BBC Radio one one
1213240	1215240	One
1215240	1217240	Two
1217240	1219240	Three
1219240	1221240	Four
1221240	1223240	Five
1223240	1225240	Six
1225240	1227240	Seven
1227240	1229240	Eight
1229240	1231240	Nine
1231240	1233240	Ten
1233240	1235240	Eleven
1235240	1237240	Twelve
1237240	1239240	Thirteen
1239240	1241240	Twelve
1241240	1243240	Four
1243240	1245240	Five
1245240	1247240	Six
1247240	1249240	Seven
1249240	1251240	Eight
1251240	1253240	Nine
1253240	1255240	Ten
1255240	1257240	Four
1257240	1259240	Five
1259240	1261240	Six
1261240	1263240	Seven
1263240	1265240	Eight
1265240	1267240	Nine
1267240	1269240	Nine
1269240	1271240	Ten
1271240	1273240	Four
1273240	1275240	Five
1275240	1277240	Six
1277240	1279240	Seven
1279240	1281240	Eight
1281240	1283240	Nine
1283240	1285240	Nine
1285240	1287240	Eight
1287240	1289240	Nine
1289240	1291240	Nine
1291240	1293240	Eight
1293240	1295240	Nine
1295240	1297240	Eight
1297240	1299240	Ten
1299240	1301240	Four
1301240	1303240	Five
1303240	1305240	Six
1305240	1307240	Seven
1307240	1309240	Eight
1309240	1311240	Nine
1311240	1313240	Ten
1313240	1315240	Four
1315240	1317240	Five
1317240	1319240	Six
1319240	1321240	Seven
1321240	1323240	Eight
1323240	1325240	Seven
1325240	1327240	Ten
1327240	1329240	Four
1329240	1331240	Five
1331240	1333240	Six
1333240	1335240	Seven
1335240	1337240	Eight
1337240	1339240	Nine
1339240	1341240	Ten
1341240	1343240	Four
1343240	1345240	Five
1345240	1347240	Six
1347240	1349240	Seven
1349240	1351240	Eight
1351240	1353240	Nine
1353240	1355240	Ten
1355240	1357240	Four
1357240	1359240	Five
1359240	1361240	Six
1361240	1363240	Seven
1363240	1365240	Eight
1365240	1367240	Nine
1367240	1369240	Ten
1369240	1371240	Four
1371240	1373240	Five
1373240	1375240	Six
1375240	1377240	Seven
1377240	1379240	Eight
1379240	1381240	Nine
1381240	1383240	Ten
1383240	1385240	Four
1385240	1387240	Five
1387240	1389240	Six
1389240	1391240	Seven
1391240	1393240	Eight
1393240	1395240	Nine
1395240	1397240	Ten
1397240	1399240	Eight
1399240	1401240	Nine
1401240	1403240	In the mix
1403240	1405240	In the mix
1405240	1407240	In the mix
1407240	1409240	In the mix
1409240	1411240	In the mix
1411240	1413240	In the mix
1413240	1415240	In the mix
1415240	1417240	In the mix
1417240	1419240	In the mix
1419240	1421240	In the mix
1421240	1423240	In the mix
1423240	1425240	In the mix

Source: https://www.bbc.co.uk/programmes/m001rws1

also "En-za-mex" == "In the mix" ??

1843940	1848940	BBC Radio 1
1873940	1878940	BBC Radio 1
1903940	1908940	BBC Radio 1
1913940	1918940	BBC Radio 1
1933940	1938940	BBC Radio 1
1953940	1958940	BBC Radio 1
1963940	1968940	BBC Radio 1
1975940	1980940	BBC Radio 1
1993940	1998940	BBC Radio 1
2003940	2008940	BBC Radio 1
2012940	2017940	BBC Radio 1
2017940	2022940	BBC Radio 2
2028940	2033940	BBC Radio 3
2040940	2045940	BBC Radio 4
2077940	2082940	BBC Radio 5
2083940	2088940	BBC Radio 6
2089940	2094940	BBC Radio 7
2095940	2100940	BBC Radio 8
2100940	2105940	BBC Radio 9
2105940	2110940	BBC Radio 9
2111940	2115940	BBC Radio 10
2116940	2121940	BBC Radio 11
2122940	2127940	BBC Radio 11
2127940	2132940	BBC Radio 12
2133940	2137940	BBC Radio 11
2138940	2143940	BBC Radio 12
2144940	2149940	BBC Radio 12
2149940	2154940	BBC Radio 12
2155940	2160940	BBC Radio 12
2161940	2166980	BBC Radio 12
2167100	2172040	BBC Radio 12
2172840	2178900	BBC Radio 12
2179940	2184900	BBC Radio 12
2184940	2191900	BBC Radio 12
2191940	2197900	BBC Radio 12
2197940	2202900	BBC Radio 12
2202900	2207860	BBC Radio 12
2207900	2212860	BBC Radio 12
2212900	2217860	BBC Radio 12
2217900	2222860	BBC Radio 12
2222900	2227860	BBC Radio 12
2227860	2232820	BBC Radio 12
2232860	2237820	BBC Radio 12
2237860	2242820	BBC Radio 12
2242860	2247820	BBC Radio 12
2247860	2252820	BBC Radio 12
2252860	2257820	BBC Radio 12
2257860	2262820	BBC Radio 12
2262860	2267820	BBC Radio 12
2267860	2272820	BBC Radio 12
2272860	2277820	BBC Radio 12
2277860	2282820	BBC Radio 12
2282820	2287780	BBC Radio 12
2287820	2292780	BBC Radio 12
2292780	2297740	Bachelor appointment
2297780	2302740	BBC Radio 12
2302780	2307740	BBC Radio 12
2307780	2312740	BBC Radio 12
2312780	2317740	BBC Radio 12
2317780	2322740	BBC Radio 12
2322780	2327740	BBC Radio 1
2327780	2332740	BBC Radio 1
2332780	2335740	Ivey
2335780	2340740	En-za-mex
2340780	2345740	En-za-mex
2345780	2350740	En-za-mex
2350740	2356700	En-za-mex
2356740	2361700	En-za-mex
2365740	2370700	En-za-mex
2370740	2375700	En-za-mex
2375740	2380700	En-za-mex
2381740	2386700	En-za-mex
2386740	2391700	En-za-mex
2391740	2397700	En-za-mex
2397740	2402700	En-za-mex
2402740	2406700	En-za-mex
2406740	2410700	En-za-mex
2410740	2415700	En-za-mex
2415740	2420700	En-za-mex
2420740	2425700	En-za-mex
2425740	2430700	En-za-mex
2430740	2435700	En-za-mex

1 reply

NielsMayer Feb 28, 2024

FYI, although I can't locate it right now, I've seen a "counting hallucination" run to nearly 100, in correct sequence.

Which is quite an emergent "intelligence" of the model. How does it know to count, and why does it count during hallucinations?
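
For anyone who wants to locate loops like the "In the mix" and "BBC Radio 12" runs above automatically, here is a minimal sketch. It assumes the transcript is in the same three-column layout as the examples in this post (start in milliseconds, end in milliseconds, text, separated by whitespace). The function names and the `min_repeats` threshold are hypothetical, chosen only for illustration. It only catches identical consecutive lines, so it will not flag the counting sequences themselves.

```python
from itertools import groupby


def parse_line(line):
    """Parse 'start_ms end_ms text' into (start_ms, end_ms, text)."""
    start, end, text = line.split(None, 2)
    return int(start), int(end), text.strip()


def repeated_runs(lines, min_repeats=3):
    """Return (start_ms, end_ms, text, count) for each run of identical consecutive text.

    min_repeats is an arbitrary illustrative threshold, not a tuned value.
    """
    segments = [parse_line(line) for line in lines if line.strip()]
    runs = []
    for text, group in groupby(segments, key=lambda seg: seg[2]):
        group = list(group)
        if len(group) >= min_repeats:
            # The run spans from the first segment's start to the last segment's end.
            runs.append((group[0][0], group[-1][1], text, len(group)))
    return runs


if __name__ == "__main__":
    sample = [
        "1399240\t1401240\tNine",
        "1401240\t1403240\tIn the mix",
        "1403240\t1405240\tIn the mix",
        "1405240\t1407240\tIn the mix",
    ]
    for run in repeated_runs(sample):
        print(run)
```

The reported time ranges can be pasted into a post here as the "timestamps where hallucinations occur".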

libTorrentUser · 2024-03-11T18:05:20Z

libTorrentUser
Mar 11, 2024

https://archive.org/details/movie-1992-musketeers.twenty.years.after

Whisper surely loves you. Especially when playing 2.mpeg and 3.mpeg

0 replies

jonathanlal · 2024-07-29T21:03:44Z

jonathanlal
Jul 29, 2024

Here's one I just got. I searched "openai CastingWords" on Google and found this thread.

23
00:01:12,000 --> 00:01:15,000
You can't be stupid and get that. He's full of shit.

24
00:01:15,000 --> 00:01:19,000
Donald Trump, on the other hand, is fucking crazy.

25
00:01:26,000 --> 00:01:29,000
Transcription by CastingWords

"Transcription by CastingWords" Wtf?

1 reply

ryanheise Jul 30, 2024
Author

As explained in the initial post, can you please include:

The audio/video file itself

(already included) Timestamps where hallucinations occur (unless they occur everywhere)

RussZhang · 2024-07-30T06:45:24Z

RussZhang
Jul 30, 2024

Hi, I encountered a hallucination problem when using Whisper-large-v3. Please see here.
#2280 (comment)

0 replies

dontnet-wuenze · 2024-08-15T07:17:10Z

dontnet-wuenze
Aug 15, 2024

Hello, I am using Whisper-large-v3 to transcribe Chinese audio and I am hitting the same hallucination problem as #2071.
I first segment the audio, but some background music is not filtered out by the VAD model and is passed to Whisper-large-v3.
Whisper-large-v3 does not recognize this background music as non-speech and still transcribes it as if it were speech.
For example, the outputs are "由 Amara.org 社群提供的字幕" ("Subtitles provided by the Amara.org community"), "中文字幕志愿者杨茜茜" ("Chinese subtitles by volunteer Yang Qianqian"), and "中文字幕——YK" ("Chinese subtitles: YK").
Is there any advice on this hallucination problem?
I would appreciate it if someone could help.

2 replies

ryanheise Aug 15, 2024
Author

Can you share your mp3 file here?

dontnet-wuenze Aug 15, 2024

Here is the mp3 and complete result, thanks!
test.mp3.zip

[
  {
    "start": 20.657,
    "end": 27.193,
    "text": "由 Amara.org 社群提供的字幕",
    "words": [],
    "speaker": null
  },
  {
    "start": 84.77,
    "end": 113.916,
    "text": "中文字幕志愿者杨茜茜",
    "words": [],
    "speaker": null
  },
  {
    "start": 159.155,
    "end": 164.497,
    "text": "中文字幕志愿者杨茜茜",
    "words": [],
    "speaker": null
  },
  {
    "start": 186.937,
    "end": 192.807,
    "text": "中文字幕——YK",
    "words": [],
    "speaker": null
  }
]
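
As a stopgap while the underlying problem is unsolved, segments like these can be dropped after transcription. The following is only a sketch under stated assumptions: the segment dictionaries use the `text` key shown in the result above, and the phrase list is taken solely from examples posted in this thread, so it is neither exhaustive nor an official blocklist. The names `KNOWN_CREDIT_PHRASES`, `is_credit_hallucination`, and `drop_credit_segments` are hypothetical.

```python
# Assumption: phrases collected from examples in this thread only; extend as needed.
KNOWN_CREDIT_PHRASES = (
    "由 Amara.org 社群提供的字幕",
    "中文字幕志愿者",
    "中文字幕——YK",
    "Transcription by CastingWords",
)


def _strip_whitespace(text):
    """Remove all whitespace so spacing differences do not affect matching."""
    return "".join(text.split())


def is_credit_hallucination(text):
    """Return True if the text contains one of the known subtitle-credit phrases."""
    normalized = _strip_whitespace(text)
    return any(_strip_whitespace(phrase) in normalized for phrase in KNOWN_CREDIT_PHRASES)


def drop_credit_segments(segments):
    """Return the segments whose 'text' does not match a known credit phrase."""
    return [seg for seg in segments if not is_credit_hallucination(seg.get("text", ""))]


if __name__ == "__main__":
    segments = [
        {"start": 20.657, "end": 27.193, "text": "由 Amara.org 社群提供的字幕"},
        {"start": 30.0, "end": 32.0, "text": "你好"},
        {"start": 84.77, "end": 113.916, "text": "中文字幕志愿者杨茜茜"},
    ]
    print(drop_credit_segments(segments))
```

Substring matching like this can also remove genuine speech that happens to contain one of the phrases, so it hides the symptom rather than addressing the cause.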

NielsMayer · 2024-09-15T06:37:22Z

NielsMayer
Sep 15, 2024

So I finally figured out where this class of hallucination comes from -- figure skating: https://www.isuresults.com/results/season2425/jgpcze2024/FSKXPAIRS---JUNIOR----QUAL000100--_JudgesDetailsperSkater.pdf

[01:09:58.320 --> 01:10:28.300] StSq2 2.60
[01:10:28.320 --> 01:10:58.300] CCoSp4 3.50
[01:10:58.320 --> 01:11:28.300] StSq2 2.60
[01:11:28.320 --> 01:11:58.300] CCoSp4 3.50
[01:11:58.320 --> 01:12:28.300] StSq2 2.60
[01:12:28.300 --> 01:12:58.280] CCoSp4 3.50
[01:12:58.300 --> 01:13:28.280] StSq2 2.60
[01:13:28.280 --> 01:13:58.260] CCoSp4 3.50
[01:13:58.280 --> 01:14:28.000] StSq2 2.60
[01:14:28.000 --> 01:14:57.980] CCoSp4 3.50
[01:14:57.980 --> 01:15:27.960] StSq2 2.60
[01:15:27.960 --> 01:15:57.940] CCoSp4 3.50
[01:15:57.960 --> 01:16:27.940] StSq2 2.60
[01:16:27.940 --> 01:16:57.920] CCoSp4 3.50
[01:16:57.940 --> 01:17:27.920] StSq2 2.60
[01:17:27.920 --> 01:17:57.900] CCoSp4 3.50
[01:17:57.900 --> 01:18:27.880] StSq2 2.60
[01:18:27.880 --> 01:18:57.860] CCoSp4 3.50
[01:18:57.880 --> 01:19:27.860] StSq2 2.60
[01:19:27.860 --> 01:19:57.840] CCoSp4 3.50
[01:19:57.860 --> 01:20:27.840] StSq2 2.60
[01:20:27.840 --> 01:20:57.820] CCoSp4 3.50
[01:21:26.840 --> 01:21:27.480] StSq2 2.60
[01:21:56.800 --> 01:21:57.180] StSq2 2.60
[01:21:57.180 --> 01:22:27.160] CCoSp4 3.50
[01:22:27.180 --> 01:22:57.160] StSq2 2.60
[01:22:57.180 --> 01:23:27.160] CCoSp4 3.50
[01:23:27.180 --> 01:23:56.860] StSq2 2.60
[01:23:56.860 --> 01:24:26.840] CCoSp4 3.50
[01:24:56.140 --> 01:24:56.600] StSq2 2.60
[01:24:56.600 --> 01:25:26.580] CCoSp4 3.50
[01:25:26.600 --> 01:25:56.580] StSq2 2.60
[01:25:56.600 --> 01:26:26.580] CCoSp4 3.50
[01:26:26.600 --> 01:26:56.580] StSq2 2.60
[01:26:56.600 --> 01:27:26.580] CCoSp4 3.50
[01:27:26.600 --> 01:27:56.580] StSq2 2.60
[01:27:56.580 --> 01:28:26.560] CCoSp4 3.50
[01:28:26.580 --> 01:28:56.560] StSq2 2.60
[01:28:56.580 --> 01:29:26.560] CCoSp4 3.50
[01:29:26.580 --> 01:29:56.560] StSq2 2.60
[01:29:59.380 --> 01:30:04.200] Anton Bruckner's Second Symphony in a performance from Geneva's Victoria Hall
[01:30:04.200 --> 01:30:11.080] in March 2024. Jonathan Lott conducting the Swiss Romande Orchestra. Back to
[01:30:11.080 --> 01:30:14.220] Mozart here on Radio 3 through the Night Center when this next piece was
[01:30:14.220 --> 01:30:18.120] published in the 1790s, Beethoven immediately wrote his piano and wind
[01:30:18.120 --> 01:30:26.540] quintet in the same key and with a similar lineup of instruments. This is
[01:30:26.540 --> 01:30:31.360] finished it. Mozart said he thought it was the best thing he'd ever composed.
[01:54:45.000 --> 01:54:51.860] The quintet for piano and winds by Mozart from the Riese Summer Festival in Norway,
[01:54:51.860 --> 01:54:58.280] the oboist Douglas Boyd, clarinetist Hans Christian Breen, horn player Hjel Erik Arnersen,
[01:54:58.440 --> 01:55:02.600] bassoonist Per Hanisdal, and pianist Andreas Steier.
[01:55:04.060 --> 01:55:07.440] A little more Bruckner to follow on here on Radio 3 through the night,
[01:55:07.480 --> 01:55:10.780] and if the words Bruckner and Little seem mutually exclusive, well,
[01:55:11.120 --> 01:55:15.560] his motets are as concise as his symphonies are expansive.

(source https://www.bbc.co.uk/programmes/m0022ctz )

Suggestion: don't train your language models on ice skating videos, or classical music will trigger transcription of figure skating notation. :-)
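
The figure skating lines above share a visible signature: a few characters of text stretched over a segment of roughly 30 seconds. A rough sketch for flagging such segments in bracketed-timestamp output follows. The thresholds `min_duration` and `max_chars_per_sec` are guesses for illustration, not tuned values, and the function names are hypothetical. Very short hallucinated segments (such as the sub-second ones in the listing) are not caught by this heuristic.

```python
import re

# Matches "[HH:MM:SS.mmm --> HH:MM:SS.mmm] text"; the hours field is optional.
LINE_RE = re.compile(
    r"\[(?:(\d+):)?(\d{2}):(\d{2}\.\d+) --> (?:(\d+):)?(\d{2}):(\d{2}\.\d+)\]\s*(.*)"
)


def to_seconds(hours, minutes, seconds):
    """Convert timestamp fields to seconds; hours may be None when omitted."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


def suspicious_segments(lines, min_duration=25.0, max_chars_per_sec=1.0):
    """Return (start_s, end_s, text) for long segments that carry very little text.

    Both thresholds are illustrative assumptions.
    """
    flagged = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not timestamped segments
        start = to_seconds(*match.group(1, 2, 3))
        end = to_seconds(*match.group(4, 5, 6))
        text = match.group(7).strip()
        duration = end - start
        if duration >= min_duration and len(text) / duration <= max_chars_per_sec:
            flagged.append((start, end, text))
    return flagged


if __name__ == "__main__":
    sample = [
        "[01:09:58.320 --> 01:10:28.300] StSq2 2.60",
        "[01:29:59.380 --> 01:30:04.200] Anton Bruckner's Second Symphony in a "
        "performance from Geneva's Victoria Hall",
    ]
    for segment in suspicious_segments(sample):
        print(segment)
```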

0 replies

Replies: 17 comments · 21 replies