The failure to break sentences correctly #45

despairTK · 2023-07-27T15:39:22Z


I have been unable to get sentences to break correctly when transcribing subtitles. I have tried many approaches and still can't solve it. Please see the screenshot.

I have summarized the possible reasons for the failure to break sentences correctly in the screenshot. Generally, it only occurs in videos over 30 minutes long in which the speaker talks frequently. For the sake of accurate subtitles I usually only use the medium.en, large-v2 and medium models. However, such videos often need a lower-level model to produce correctly broken subtitles.
I've also tried using different example sentences to improve it, but that doesn't always work.
That's why I opened this discussion, to see whether anyone out there knows how to fix this.

Purfview · 2023-07-27T15:50:37Z

Maintainer

Looks like it doesn't detect end of the sentence.
Sometimes you change some setting and it starts splitting lines.

Try different compute_type, beam_size, model, initial_prompt.

1 reply

despairTK Jul 31, 2023
Author

I tried modifying many parameters, but it was basically ineffective. Only switching to a lower model had a chance of working, but the accuracy dropped quite a bit. There is another, trickier method: segment the audio and then recognize the pieces one by one.
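As a rough illustration of the segmenting idea, here is a minimal sketch that only computes the time windows for the pieces. The function name, the 10-minute chunk length and the 2-second overlap are assumptions made for the example, not settings from this thread, and the actual cutting of the audio is left to whatever tool you prefer.

```python
from typing import List, Tuple


def chunk_windows(
    duration_s: float,
    chunk_s: float = 600.0,   # assumed chunk length (10 minutes)
    overlap_s: float = 2.0,   # assumed overlap between neighbouring chunks
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds that cover 0..duration_s."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so words at the cut are not lost.
        start = end - overlap_s
    return windows


# Windows for a 30-minute recording.
for start, end in chunk_windows(1800.0):
    print(f"{start:.0f}s -> {end:.0f}s")
```

Fixed-length cuts can land in the middle of a sentence, so cutting at pauses would probably behave better; the overlap here is only a crude guard against that.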

Purfview · 2023-08-07T04:16:43Z

Maintainer

openai/whisper#625

0 replies

albino1 · 2023-09-03T02:43:36Z


large, aka large-v2, has always had lots of various problems in my experience, see ggerganov/whisper.cpp#675 for some examples. Maybe try large-v1 and see if it still has this issue.

2 replies

wwaag76 Sep 3, 2023

Just ran WF on roughly 90 fairly short (less than 3 minutes) voice recording wav files using large-v2 and a beam size of 5. The accuracy of the transcription was great given the noisy environment of the recording and a faulty Zoom recorder. Also, some very unusual words such as "hyderized" or "Takkakaw" were transcribed correctly, to my great surprise.

However, I'd say about 20 to 25% were missing punctuation altogether--no commas, periods, or capitalization of the first word in a sentence. In such instances, I reran using the Medium model and they worked OK.
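With around 90 files, spotting the unpunctuated ones by eye gets tedious. Below is a minimal sketch of a heuristic that flags transcripts with almost no punctuation so they can be rerun with another model or setting. The function names and the threshold of 2 marks per 100 words are assumptions for illustration, not anything built into the tool.

```python
import re
from typing import Dict, List


def looks_unpunctuated(text: str, min_marks_per_100_words: float = 2.0) -> bool:
    """Heuristic: True if the text has very few punctuation marks per word."""
    words = text.split()
    if not words:
        return False  # nothing to judge
    marks = len(re.findall(r"[.,!?]", text))
    return marks * 100.0 / len(words) < min_marks_per_100_words


def files_to_rerun(transcripts: Dict[str, str]) -> List[str]:
    """Return the names of transcripts that look unpunctuated, sorted."""
    return sorted(name for name, text in transcripts.items()
                  if looks_unpunctuated(text))


# Hypothetical file names; the texts are taken from the runs in this thread.
transcripts = {
    "a.txt": "all right well before I leave Valdez I thought I would just "
             "give a little wrap up I spent the night at the Glacier Inn",
    "b.txt": "All right, well, before I leave Valdez, I thought I would "
             "just give a little wrap-up.",
}
print(files_to_rerun(transcripts))
```

The threshold would need tuning on real output; very short clips with a single sentence could be misjudged either way.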

Purfview Sep 3, 2023
Maintainer

But large-v1 has accuracy problems.

wwaag76 · 2023-09-03T03:59:00Z


Just ran a sample wav file that was about a minute and a half long with various models and beam sizes. For beam size = 5, punctuation was added only for the Base and Tiny models. However, with beam size = 1, large-v1, large-v2, and medium were also punctuated. So at least on my samples, setting beam size to 1 seems to be the key. Note that the transcription was correct for the large-v2, large-v1 and medium models. Here are the actual results.

large-v1 beam size = 5
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.060 --> 00:07.280] all right well before I leave Valdez I thought I would just give a little wrap
[00:07.280 --> 00:12.120] up I spent the night at the Glacier Inn I thought they had a restaurant here
[00:12.120 --> 00:16.500] but they didn't so anyway I went to a place called the fat mermaid which
[00:16.500 --> 00:22.680] was quite good and sat outside with my jacket on even though I had shorts
[00:22.680 --> 00:29.140] on a number of other people were the same so the port city here is
[00:29.140 --> 00:34.960] just absolutely beautiful with the mountains surrounding it and it started
[00:35.440 --> 00:40.540] to rain early this morning and apparently it's supposed to continue for
[00:40.540 --> 00:48.080] the next few days so I don't know my Glacier tour tomorrow is probably going
[00:48.080 --> 00:53.720] to be in the rain but as long as you know the winds not blowing terribly and
[00:53.720 --> 00:59.320] the seas are rough it'll be alright so anyway well I'm off to Seward I want to
[00:59.320 --> 01:06.080] stop at the Worthington Glacier if I can see it in the rain and fog you
[01:06.640 --> 01:10.620] know once we get out of the the Chugach Mountains here it'll probably
[01:10.620 --> 01:15.680] be back up into the 70s like it was yesterday so anyway that's all I got
[01:15.680 --> 01:19.000] to say I'm going to drive around a bit get gas and then head out

Transcription speed: 1.04 audio seconds/s

large-v1 beam size = 1
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.060 --> 00:07.640] All right, well, before I leave Valdez, I thought I would just give a little wrap-up.
[00:08.000 --> 00:13.160] I spent the night at the Glacier Inn. I thought they had a restaurant here, but they didn't.
[00:13.280 --> 00:17.200] So anyway, I went to a place called the Fat Mermaid, which was quite good.
[00:18.120 --> 00:23.200] Sat outside with my jacket on, even though I had shorts on.
[00:23.700 --> 00:25.920] A number of other people were the same.
[00:25.920 --> 00:33.080] The port city here is just absolutely beautiful with the mountains surrounding it.
[00:33.820 --> 00:41.400] It started to rain early this morning, and apparently it's supposed to continue for the next few days.
[00:42.140 --> 00:48.760] So I don't know. The Glacier tour tomorrow is probably going to be in the rain.
[00:48.760 --> 00:55.940] But as long as the winds are not blowing terribly and the seas are rough, it'll be all right.
[00:56.300 --> 01:05.000] So anyway, while I'm off to Seward, I want to stop at the Worthington Glacier, if I can see it in the rain and fog.
[01:06.480 --> 01:13.380] Once we get out of the Chugach Mountains here, it'll probably be back up into the 70s like it was yesterday.
[01:13.380 --> 01:18.960] So anyway, that's all I've got to say. I'm going to drive around a bit, get gas, and then head out.

Transcription speed: 1.04 audio seconds/s

large-v2 beam size = 5
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.060 --> 00:07.620] all right well before I leave Valdez I thought I would just give a little wrap
[00:07.620 --> 00:12.320] up I spent the night at the Glacier Inn I thought they had a restaurant here
[00:12.320 --> 00:16.840] but they didn't so anyway I went to a place called the Fat Mermaid which was
[00:16.840 --> 00:24.800] quite good sat outside with my jacket on even though I had shorts on a number
[00:24.800 --> 00:30.020] of other people were the same so the port city here is just absolutely
[00:30.020 --> 00:38.040] beautiful with the mountains surrounding it and it's started to rain early this
[00:38.040 --> 00:43.220] morning and apparently it's supposed to continue for the next few days so I
[00:43.220 --> 00:49.480] don't know my Glacier tour tomorrow is probably going to be in the rain but as
[00:49.480 --> 00:55.520] long as you know the winds not blowing terribly and the seas are rough it'll
[00:55.520 --> 01:01.340] be all right so anyway well I'm off to Seward I want to stop at the Worthington
[01:01.340 --> 01:08.340] Glacier if I can see it in the rain and fog you know once we get out of the
[01:08.960 --> 01:13.320] Chugach mountains here it'll probably be back up into the 70s like it was
[01:13.320 --> 01:17.840] yesterday so anyway that's all I got to say I'm going to drive around a bit get
[01:17.840 --> 01:19.320] gas and then head out

Transcription speed: 0.83 audio seconds/s

large-v2 beam size = 1
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.060 --> 00:07.960] All right, well before I leave Valdez I thought I would just give a little wrap up.
[00:08.220 --> 00:10.640] I spent the night at the Glacier Inn.
[00:10.740 --> 00:13.320] I thought they had a restaurant here but they didn't.
[00:13.380 --> 00:17.540] So anyway I went to a place called the Fat Mermaid which was quite good.
[00:18.320 --> 00:23.580] Sat outside with my jacket on even though I had shorts on.
[00:24.240 --> 00:26.280] A number of other people were the same.
[00:26.280 --> 00:32.920] So the port city here is just absolutely beautiful with the mountains surrounding
[00:32.920 --> 00:33.360] it.
[00:33.760 --> 00:40.720] And it's started to rain early this morning and apparently it's supposed to continue for
[00:40.720 --> 00:41.900] the next few days.
[00:42.540 --> 00:49.480] So I don't know, the Glacier tour tomorrow is probably going to be in the rain but as
[00:49.480 --> 00:56.300] long as the winds not blowing terribly and the seas are rough it will be all right.
[00:56.600 --> 01:03.480] So anyway while I'm off to Seward I want to stop at the Worthington Glacier if I can see
[01:03.480 --> 01:05.480] it in the rain and fog.
[01:06.600 --> 01:12.060] You know once we get out of the Chugach Mountains here it will probably be back up into the
[01:12.060 --> 01:13.820] 70's like it was yesterday.
[01:14.260 --> 01:16.160] So anyway that's all I got to say.
[01:16.160 --> 01:19.320] I want to drive around a bit, get gas and then head out.

Transcription speed: 1.04 audio seconds/s

medium beam size=5
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.060 --> 00:07.760] all right well before I leave Valdez I thought I would just give a little wrap
[00:07.760 --> 00:12.680] up I spent the night at the Glacier Inn I thought they had a restaurant here but
[00:12.680 --> 00:16.940] they didn't so anyway I went to a place called the Fat Mermaid which was
[00:16.940 --> 00:23.460] quite good and sat outside with my jacket on even though I had shorts on
[00:24.140 --> 00:29.580] a number of other people were the same so the port city here is just
[00:29.580 --> 00:37.020] absolutely beautiful with the mountains surrounding it and it's started to rain
[00:37.020 --> 00:41.420] early this morning and apparently it's supposed to continue for the next few
[00:41.420 --> 00:48.540] days so I don't know my the glacier tour tomorrow is probably going to be
[00:48.540 --> 00:54.260] in the rain but as long as you know the winds not blowing terribly and the
[00:54.260 --> 00:59.420] seas are rough it'll be alright so anyway well I'm off to Seward I want
[00:59.420 --> 01:06.900] to stop at the Worthington Glacier if I can see it in the rain and fog you
[01:06.900 --> 01:11.460] know once we get out of the the Chugak Mountains here it'll probably be back
[01:11.460 --> 01:16.380] up into the 70s like it was yesterday so anyway that's all I got to say I
[01:16.380 --> 01:19.660] want to drive around a bit get gas and then head out

Transcription speed: 1.72 audio seconds/s

medium beam size=1
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:01.220 --> 00:08.020] Alright, well, before I leave Valdez, I thought I would just give a little wrap-up.
[00:08.580 --> 00:10.700] I spent the night at the Glacier Inn.
[00:10.780 --> 00:13.360] I thought they had a restaurant here, but they didn't.
[00:13.460 --> 00:17.980] So anyway, I went to a place called the Fat Mermaid, which was quite good.
[00:18.520 --> 00:23.880] Sat outside with my jacket on, even though I had shorts on.
[00:24.240 --> 00:26.400] A number of other people were the same.
[00:26.400 --> 00:33.480] So, the port city here is just absolutely beautiful with the mountains surrounding it.
[00:34.080 --> 00:42.000] And it's started to rain early this morning, and apparently it's supposed to continue for the next few days.
[00:42.840 --> 00:49.140] So I don't know, the glacier tour tomorrow is probably going to be in the rain,
[00:49.140 --> 00:56.340] but as long as the wind's not blowing terribly and the seas are rough, it'll be alright.
[00:57.320 --> 01:05.620] So anyway, while I'm off to Seward, I want to stop at the Worthington Glacier, if I can see it, in the rain and fog.
[01:06.800 --> 01:13.980] You know, once we get out of the Chugak Mountains here, it'll probably be back up into the 70s like it was yesterday.
[01:14.740 --> 01:19.440] Anyway, that's all I got to say. I want to drive around a bit, get gas, and then head out.

Transcription speed: 1.82 audio seconds/s

base
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.600 --> 00:08.920] All right. Well, before I leave Valdez, I thought I would just give a little wrap up. I spent
[00:08.920 --> 00:13.500] the night at the Glacier Inn. I thought they had a restaurant here, but they didn't. So,
[00:13.560 --> 00:20.000] anyway, I went to a place called the Fat Murmaid, which was quite good. I sat outside with my
[00:20.000 --> 00:28.840] jacket on, even though I had shorts on. A number of other people were the same. So, the Port City
[00:28.840 --> 00:37.980] air is just absolutely beautiful with the mountains surrounding it. And it started to rain early this
[00:37.980 --> 00:46.100] morning and apparently it's supposed to continue for the next few days. So, I don't know. My Glacier
[00:46.100 --> 00:53.840] tour tomorrow is probably going to be in the rain, but as long as the winds not blowing terribly in
[00:53.840 --> 00:59.700] the seas are rough, it'll be all right. So, anyway, while I'm off to Suord, I want to stop
[00:59.700 --> 01:07.760] at the Worthington Glacier if I can see it in the rain and fog. You know, once we get out
[01:07.760 --> 01:15.040] of the Chugac Mountains here, it'll probably be back up into the 70s, like it was yesterday. So, anyway,
[01:15.240 --> 01:19.260] that's all I got to say. I'm going to drive around a bit, get gas, and then head out.

Transcription speed: 11.32 audio seconds/s

tiny
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.660 --> 00:08.140] All right, well, before I leave out these, I thought I would just give a little wrap-up
[00:08.140 --> 00:13.400] I spent the night at the glacier and I thought they had a restaurant here, but they didn't.
[00:13.560 --> 00:18.240] So anyway, I went to a place called the Fatmermade, which was quite good.
[00:18.680 --> 00:25.880] Sad outside with my jacket on, even though I had shorts on, a number of other people were
[00:25.880 --> 00:26.400] the same.
[00:26.400 --> 00:33.840] So the port city here is just absolutely beautiful with the mountains and surrounding it.
[00:33.840 --> 00:40.560] And it started to rain early this morning and apparently it's supposed to continue
[00:40.560 --> 00:42.060] for the next few days.
[00:42.920 --> 00:49.880] So I don't know, my glacier tour tomorrow is probably going to be in the rain, but as long
[00:49.880 --> 00:56.400] as the wind's not blowing terribly and the seas are rough, it'll be all right.
[00:57.320 --> 01:03.300] So anyway, while I'm off to Suhr, I want to stop at the Worthington Glacier, if I can
[01:03.300 --> 01:11.440] see it in the rain and fog, once we get out of the Chugak Mountains here, it'll probably be back
[01:11.440 --> 01:17.260] up into the 70s, like it was yesterday, so anyway, that's all I got to say, I want to drive around
[01:17.260 --> 01:19.380] a bit, get gas and then head out.
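To compare runs like the ones above without reading every line, one option is to measure how many segments end on sentence-final punctuation. This is a minimal sketch that assumes the `[mm:ss.mmm --> mm:ss.mmm] text` line format shown in these logs; the function names are made up for the example.

```python
import re
from typing import List, Tuple

# Matches lines such as "[00:00.060 --> 00:07.280] some text".
LINE_RE = re.compile(r"^\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\]\s*(.*)$")

Segment = Tuple[float, float, str]


def parse_segments(log: str) -> List[Segment]:
    """Extract (start_s, end_s, text) from a transcription log."""
    segments: List[Segment] = []
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            start = int(m.group(1)) * 60 + float(m.group(2))
            end = int(m.group(3)) * 60 + float(m.group(4))
            segments.append((start, end, m.group(5)))
    return segments


def sentence_end_ratio(segments: List[Segment]) -> float:
    """Share of segments whose text ends with '.', '!' or '?'."""
    if not segments:
        return 0.0
    ended = sum(1 for _, _, text in segments
                if text.rstrip().endswith((".", "!", "?")))
    return ended / len(segments)


beam5 = """[00:00.060 --> 00:07.620] all right well before I leave Valdez I thought I would just give a little wrap
[00:07.620 --> 00:12.320] up I spent the night at the Glacier Inn I thought they had a restaurant here"""
beam1 = """[00:08.220 --> 00:10.640] I spent the night at the Glacier Inn.
[00:10.740 --> 00:13.320] I thought they had a restaurant here but they didn't."""

print(sentence_end_ratio(parse_segments(beam5)))
print(sentence_end_ratio(parse_segments(beam1)))
```

A ratio near zero suggests the run did not detect sentence ends at all, while a high ratio suggests the segments follow sentence boundaries. Lines that do not match the pattern, such as the "Transcription speed" line, are skipped.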

1 reply

Purfview Sep 3, 2023
Maintainer

Try -m=large-v2 -prompt="Alright, well, before I leave Valdez, I thought I would just give a little wrap-up."

wwaag76 · 2023-09-03T04:22:53Z


Adding a prompt when running with beam size 5 for large-v2 does result in punctuation. However, from a practical standpoint (for me at least) adding a prompt for 90 files seems unreasonable. In any case, here are the 2 runs.

Model large-v2 beamsize = 5

Prompt: "Alright, well, before I leave Valdez, I thought I would just give a little wrap-up."
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.640 --> 00:07.940] Alright, well, before I leave Valdez, I thought I would just give a little wrap-up.
[00:08.200 --> 00:13.320] I spent the night at the Glacier Inn. I thought they had a restaurant here, but they didn't.
[00:13.380 --> 00:17.540] So anyway, I went to a place called the Fat Mermaid, which was quite good.
[00:18.460 --> 00:23.560] Sat outside with my jacket on, even though I had shorts on.
[00:24.180 --> 00:28.040] A number of other people were the same, so...
[00:28.040 --> 00:33.380] The Port City here is just absolutely beautiful with the mountains surrounding it.
[00:33.980 --> 00:41.920] And it's started to rain early this morning, and apparently it's supposed to continue for the next few days.
[00:42.780 --> 00:49.080] So I don't know. The Glacier tour tomorrow is probably going to be in the rain,
[00:49.200 --> 00:56.120] but as long as the wind's not blowing terribly and the seas are rough, it'll be alright.
[00:56.120 --> 01:05.500] So anyway, while I'm off to Seward, I want to stop at the Worthington Glacier, if I can see it in the rain and fog.
[01:06.480 --> 01:13.840] You know, once we get out of the Chugach Mountains here, it'll probably be back up into the 70s like it was yesterday.
[01:14.220 --> 01:19.280] So anyway, that's all I've got to say. I'm going to drive around a bit, get gas, and then head out.

Transcription speed: 0.84 audio seconds/s

Prompt: "Alright,"
Starting transcription on: I:\Alaska 2023\Zoom\230624-095338.WAV

[00:00.060 --> 00:07.620] All right, well, before I leave Valdez I thought I would just give a little wrap
[00:07.620 --> 00:12.600] up. I spent the night at the Glacier Inn. I thought they had a restaurant here, but
[00:12.600 --> 00:17.120] they didn't. So anyway, I went to a place called the Fat Mermaid, which was quite
[00:17.120 --> 00:25.000] good. I sat outside with my jacket on, even though I had shorts on. A number of
[00:25.000 --> 00:30.580] other people were the same. So the Port City here is just absolutely beautiful
[00:30.580 --> 00:38.440] with the mountains surrounding it. And it's started to rain early this morning
[00:38.440 --> 00:43.720] and apparently it's supposed to continue for the next few days. So I don't know.
[00:44.880 --> 00:50.160] The Glacier tour tomorrow is probably going to be in the rain, but as long as
[00:50.160 --> 00:56.300] the wind's not blowing terribly and the seas are rough, it'll be all right.
[00:56.600 --> 01:02.920] So anyway, while I'm off to Seward, I want to stop at the Worthington Glacier if I
[01:02.920 --> 01:09.500] can see it in the rain and fog. You know, once we get out of the Chugach
[01:09.500 --> 01:14.400] Mountains here, it'll probably be back up into the 70s like it was yesterday. So
[01:14.400 --> 01:18.660] anyway, that's all I got to say. I'm gonna drive around a bit, get gas, and then
[01:18.660 --> 01:19.320] head out.

Transcription speed: 0.78 audio seconds/s

9 replies

wwaag76 Sep 3, 2023

I just added a "?" to the other two and got the same results. I'll try it with some other source materials tomorrow. I'll also rerun the 90 wav files I have in this collection and see if punctuation is added on all files.

wwaag76 Sep 3, 2023

Quick question. Is there a way to output only selected formats--e.g. srt, txt, and text? At the moment, I write "all" and then delete the unwanted ones. Thanks.

Purfview Sep 3, 2023
Maintainer

There is no such way.

wwaag76 Sep 3, 2023

Just re-ran the 89 wav files in my collection. Punctuation OK on all except 5. Of interest, the five wav files started with the word "just". For example, "just saw a bear". Tried adding "just" as a prompt, but the same result.

Purfview Sep 3, 2023
Maintainer

I've tested various samples with ",.?!" prompt, no negative impact, probably I'll add it as default in v149.1.

> Punctuation OK on all except 5. Of interest, the five wav files started with the word "just".

Is transcription without prompt better on those 5 files? Try ",.?!".
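Since one generic prompt such as ",.?!" does not depend on the file, it can be applied to a whole folder in one go. The sketch below only builds the command lines and does not run anything. The `-m=` and `-prompt=` spellings are copied from the suggestion earlier in this thread; the executable name `whisper-faster.exe`, the position of the file argument and the function name are assumptions, so check them against the tool's own help output first.

```python
from pathlib import Path
from typing import List


def build_commands(
    folder: str,
    exe: str = "whisper-faster.exe",  # assumed executable name
    model: str = "large-v2",
    prompt: str = ",.?!",
) -> List[List[str]]:
    """Build one argument list per .wav file in the folder (not executed)."""
    wavs = sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".wav")
    # Flag spellings follow the "-m=... -prompt=..." form quoted in this thread.
    return [[exe, str(wav), f"-m={model}", f"-prompt={prompt}"] for wav in wavs]


# Print the commands for the current directory instead of running them.
for cmd in build_commands("."):
    print(cmd)
```

Once the flags are confirmed, each list could be handed to `subprocess.run`. Keeping the arguments as a list avoids quoting problems with paths that contain spaces, like the `I:\Alaska 2023\Zoom` folder in the logs above.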

wwaag76 · 2023-09-04T02:26:43Z


Just tried it with the extra characters. No cigar. Same as before. However, I ran the same thing on my laptop and the initial "just" was correctly capitalized. Here are the 2 runs.

Desktop i7-8750K Win10
1
00:00:00,440 --> 00:00:08,460
just stopped at the Jade store in
Jade City for a look about all kinds of

2
00:00:08,460 --> 00:00:15,540
jewelry and stuff but boy it is
very very expensive so anyway I remember

3
00:00:15,540 --> 00:00:20,740
stopping here 20 years
ago so it hasn't changed much

Laptop--Win11
1
00:00:00,440 --> 00:00:08,480
Just stopped at the Jade store in
Jade City for a look about all kinds of

2
00:00:08,480 --> 00:00:15,540
jewelry and stuff but boy it is
very very expensive so anyway I remember

3
00:00:15,540 --> 00:00:20,720
stopping here 20 years
ago so it hasn't changed much

0 replies
