Replies: 6 comments 13 replies
-
Looks like it doesn't detect the end of the sentence. Try different
-
Just ran a sample wav file, about a minute and a half long, with various models and beam sizes. With beam size = 5, punctuation was added only by the base and tiny models. With beam size = 1, however, large-v1, large-v2, and medium also produced punctuation. So at least on my samples, setting beam size to 1 seems to be the key. Note that the transcription itself was correct for the large-v2, large-v1, and medium models. Here are the actual results.

large-v1, beam size = 5
[00:00.060 --> 00:07.280] all right well before I leave Valdez I thought I would just give a little wrap
Transcription speed: 1.04 audio seconds/s

large-v1, beam size = 1
[00:00.060 --> 00:07.640] All right, well, before I leave Valdez, I thought I would just give a little wrap-up.
Transcription speed: 1.04 audio seconds/s

large-v2
[00:00.060 --> 00:07.620] all right well before I leave Valdez I thought I would just give a little wrap
Transcription speed: 0.83 audio seconds/s

large-v2, beam size = 1
[00:00.060 --> 00:07.960] All right, well before I leave Valdez I thought I would just give a little wrap up.
Transcription speed: 1.04 audio seconds/s

medium, beam size = 5
[00:00.060 --> 00:07.760] all right well before I leave Valdez I thought I would just give a little wrap
Transcription speed: 1.72 audio seconds/s

medium, beam size = 1
[00:01.220 --> 00:08.020] Alright, well, before I leave Valdez, I thought I would just give a little wrap-up.
Transcription speed: 1.82 audio seconds/s

base
[00:00.600 --> 00:08.920] All right. Well, before I leave Valdez, I thought I would just give a little wrap up. I spent
Transcription speed: 11.32 audio seconds/s

tiny
[00:00.660 --> 00:08.140] All right, well, before I leave out these, I thought I would just give a little wrap-up
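For anyone repeating this comparison over many models and beam sizes, checking each output line by eye gets tedious. Below is a minimal sketch that parses segment lines in the `[start --> end] text` format shown above and flags whether the text contains any sentence punctuation. The function names and the punctuation heuristic are my own assumptions for illustration, not part of any transcription tool.

```python
import re

# Matches lines such as "[00:00.060 --> 00:07.280] some text".
SEGMENT_RE = re.compile(r"\[(\d+:\d{2}\.\d{3}) --> (\d+:\d{2}\.\d{3})\]\s*(.*)")


def parse_segment(line):
    """Return (start, end, text) for a segment line, or None if it does not match."""
    match = SEGMENT_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def is_punctuated(text):
    """Heuristic: True if the text contains a comma, period, '?' or '!'.

    Hyphens (as in "wrap-up") and apostrophes are deliberately ignored.
    """
    return any(ch in ",.?!" for ch in text)


if __name__ == "__main__":
    sample = "[00:00.060 --> 00:07.280] all right well before I leave Valdez"
    start, end, text = parse_segment(sample)
    print(start, end, is_punctuated(text))
```

This only tells you whether punctuation is present at all; it does not judge whether the punctuation is correct.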
-
Adding a prompt when running large-v2 with beam size 5 does result in punctuation. However, from a practical standpoint (for me at least), adding a prompt for each of 90 files seems unreasonable. In any case, here are the 2 runs.

Model: large-v2, beam size = 5

Prompt: "Alright, well, before I leave Valdez, I thought I would just give a little wrap-up."
[00:00.640 --> 00:07.940] Alright, well, before I leave Valdez, I thought I would just give a little wrap-up.
Transcription speed: 0.84 audio seconds/s

Prompt: "Alright,"
[00:00.060 --> 00:07.620] All right, well, before I leave Valdez I thought I would just give a little wrap
Transcription speed: 0.78 audio seconds/s
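One way to avoid writing a prompt per file might be to reuse a single generic, fully punctuated prompt for the whole batch. Here is a rough sketch of that idea. The `transcribe` callable, the `GENERIC_PROMPT` text, and the `transcribe_all` helper are all hypothetical names of mine, and I have not verified that a generic prompt restores punctuation at beam size 5 (the short "Alright," prompt above only partly did).

```python
from pathlib import Path

# Hypothetical generic prompt; any short, well-punctuated sentence could be tried.
GENERIC_PROMPT = "Hello, and welcome. This is a transcript with full punctuation."


def transcribe_all(paths, transcribe, prompt=GENERIC_PROMPT, beam_size=5):
    """Run one shared prompt over every file.

    `transcribe` is a caller-supplied function (path, prompt, beam_size) -> text.
    It is an assumption of this sketch, not an API of any particular tool.
    """
    results = {}
    for path in paths:
        results[Path(path).name] = transcribe(str(path), prompt, beam_size)
    return results


if __name__ == "__main__":
    # Stand-in transcriber so the sketch runs without any model installed.
    def fake_transcribe(path, prompt, beam_size):
        return f"<transcript of {path} with beam_size={beam_size}>"

    for name, text in transcribe_all(["a.wav", "b.wav"], fake_transcribe).items():
        print(name, text)
```

If you happen to be using the openai-whisper Python package, the callable could wrap `model.transcribe(path, initial_prompt=prompt, beam_size=beam_size)["text"]`; other front ends expose these settings differently, so check yours.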
-
Just tried it with the extra characters. No cigar. Same as before. However, I ran the same thing on my laptop and the initial "just" was correctly capitalized. Here are the 2 runs:

Desktop, i7-8750K, Win10: [screenshots]
Laptop, Win11: [screenshots]
-
Regarding the problem of sentences not being broken correctly when transcribing subtitles: I have tried many approaches and still can't solve it. Please see the screenshot.
In the screenshot I have summarized the possible reasons why sentences fail to break correctly. Generally, it only occurs in videos longer than 30 minutes in which the speaker talks frequently. For the sake of accurate subtitles I usually use only the medium.en, large-v2, and medium models. However, for such videos a lower-level model is often needed to generate correctly broken subtitles.
I've also tried using different example sentences to improve it, but that doesn't work reliably.
That's why I wanted to open this discussion, to ask whether anyone out there knows how to fix this.