
Thank you very much for the improvements to Whisper. Could you clarify whether there is a "hallucination" problem, i.e. an issue with duplicate output in other languages? #61

Closed
isbn9877007 opened this issue Nov 20, 2023 · 2 comments

Comments

@isbn9877007
isbn9877007 commented Nov 20, 2023

Also, can you give the amount of video memory required to run the model? Does Windows require additional configuration?

@Vaibhavs10
Owner

Vaibhavs10 commented Nov 20, 2023

It still suffers from hallucination a bit. However, we have a fix for that coming up shortly.
See here: huggingface/transformers#27492

The max GPU VRAM should be around 12GB.

It should work out of the box on Windows (as long as you have a GPU) - do check out the FAQs for more info: https://github.com/Vaibhavs10/insanely-fast-whisper#frequently-asked-questions

(I'm closing this issue for now, feel free to re-open)

@152334H

152334H commented Dec 6, 2023

The code (on Whisper v2/v3, with chunk_length_s=30) still seems to produce repeated text on long transcriptions, even on the latest transformers commit (I'll try to add examples later). This behaviour doesn't occur with faster-whisper.

Adding a repetition penalty partly mitigates this, but it also degrades the general quality of the transcript. Beam search doesn't fix it.
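For reference, a minimal sketch of how the settings discussed above (chunk_length_s, a repetition penalty, and beam search) would be passed to the transformers automatic-speech-recognition pipeline. The helper name `build_asr_kwargs` and the specific penalty value are my own illustration, not part of this repo; `repetition_penalty` and `num_beams` are standard `generate()` parameters forwarded via `generate_kwargs`.

```python
def build_asr_kwargs(chunk_length_s=30, repetition_penalty=1.2, num_beams=1):
    """Assemble call kwargs for transformers' ASR pipeline.

    chunk_length_s controls the chunked long-form decoding window;
    everything in generate_kwargs is forwarded to model.generate(),
    which is where repetition_penalty and num_beams take effect.
    (Values here are illustrative, not recommendations.)
    """
    return {
        "chunk_length_s": chunk_length_s,
        "generate_kwargs": {
            "repetition_penalty": repetition_penalty,
            "num_beams": num_beams,
        },
    }


# Usage (assumes a pipeline built elsewhere, e.g.
#   pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")):
# out = pipe("audio.mp3", **build_asr_kwargs(repetition_penalty=1.2))
```

As noted above, raising the repetition penalty trades repeated text for lower overall transcript quality, so it is a workaround rather than a fix.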
