Recognizing technical terms #38
We have intentionally kept the model simple: no timestamps, no other special tokens, no special prompts, etc. However, industrial use cases are indeed one of our targets, and recognizing technical terms, domain-specific abbreviations, and the like is important in those settings. Prompting the decoder is a generic solution that we plan to pursue. Will post updates here when we have some.
That's great to hear! I will also wait for any updates on your plans, if possible, so that we can exchange ideas.
@keveman My main use case involves recognizing a small, pre-defined set of words and phrases with a low false positive rate, so I'm glad to hear this is something you're already thinking about.
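One way to get a low false positive rate for a small, pre-defined phrase set without touching the model is to spot keywords in the transcript after decoding. The sketch below uses fuzzy matching to tolerate ASR misspellings; the phrase list and similarity cutoff are illustrative, not tuned.

```python
import difflib

# Hypothetical domain phrase list -- replace with your own.
KEYWORDS = ["emergency stop", "conveyor belt", "pressure valve"]

def spot_keywords(transcript, keywords=KEYWORDS, cutoff=0.8):
    """Return the keywords that fuzzily match some window of the transcript."""
    words = transcript.lower().split()
    hits = set()
    for kw in keywords:
        n = len(kw.split())
        # Slide a window of the same word count as the keyword phrase.
        for i in range(len(words) - n + 1):
            window = " ".join(words[i:i + n])
            if difflib.SequenceMatcher(None, window, kw).ratio() >= cutoff:
                hits.add(kw)
    return sorted(hits)
```

Raising `cutoff` trades recall for fewer false positives; for very short keywords a stricter cutoff is usually needed, since short strings match spuriously more often.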
When transcribing long audio files, we need to split them into chunks under 30 seconds each. The prompt feature helps smooth out these transitions by using the previous chunk's transcription as context, which makes the output more natural, especially at the boundaries between chunks. Here is an example of transcribing a long audio file without previous text: you'll notice the text doesn't flow smoothly between chunks.
First of all, great work!
Considering that technical jargon may be used in real scenarios (especially when working in industrial settings), is there a way to improve recognition of these terms without fine-tuning?
As an example, OpenAI Whisper supports passing a prompt (openai/whisper#963 (comment)) to the decoder before the actual audio. Inference time increases a bit, but the resulting performance improves noticeably.
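For reference, the Whisper feature mentioned above is the `initial_prompt` argument of `model.transcribe()` in the openai-whisper package. A minimal sketch, where the glossary terms and the audio file name are hypothetical:

```python
def build_transcribe_kwargs(terms):
    """Build kwargs for whisper's model.transcribe(), biasing the decoder
    toward the given domain terms via initial_prompt."""
    return {"initial_prompt": "Glossary: " + ", ".join(terms)}

# Usage (requires `pip install openai-whisper` and an audio file):
# import whisper
# model = whisper.load_model("base")
# result = model.transcribe(
#     "plant_audio.wav",
#     **build_transcribe_kwargs(["PLC", "SCADA", "Modbus", "HMI"]),
# )
# print(result["text"])
```

The prompt is tokenized and placed in the decoder's context window before decoding begins, so the model is more likely to emit the listed spellings when similar-sounding speech occurs.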
Thanks