Recognizing technical terms #38

Open
cfasana opened this issue Oct 29, 2024 · 4 comments

Comments

cfasana commented Oct 29, 2024

First of all, great work!
Considering that technical jargon may be used in real scenarios (especially when working in industrial settings), is there a way to improve recognition of these terms without fine-tuning?

As an example, OpenAI Whisper supports a prompt (openai/whisper#963 (comment)) that is passed to the decoder before the actual audio. This increases inference time a bit, but the resulting performance improves noticeably.
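
For reference, a minimal sketch of the Whisper-style prompting described above, assuming the `openai-whisper` Python package and its `initial_prompt` argument to `transcribe`. The helper names `build_prompt` and `transcribe_with_glossary`, the "Glossary:" prefix, and the character budget are illustrative assumptions, not something taken from either project:

```python
from typing import Iterable


def build_prompt(terms: Iterable[str], max_chars: int = 800) -> str:
    """Join unique glossary terms into one prompt string, capped at max_chars.

    max_chars is an arbitrary budget chosen for illustration; the real limit
    depends on how many prompt tokens the model accepts.
    """
    prefix = "Glossary: "
    seen = set()
    kept = []
    length = len(prefix)
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        extra = len(term) + (2 if kept else 0)  # 2 accounts for the ", " separator
        if length + extra > max_chars:
            break  # stop once the budget would be exceeded
        seen.add(key)
        kept.append(term)
        length += extra
    return prefix + ", ".join(kept)


def transcribe_with_glossary(audio_path: str, terms: Iterable[str],
                             model_name: str = "base") -> str:
    """Transcribe one file with Whisper, biasing the decoder with a glossary."""
    # Imported lazily so build_prompt works without the package installed.
    import whisper  # assumption: pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=build_prompt(terms))
    return result["text"]
```

Something like `transcribe_with_glossary("line3.wav", ["PLC", "SCADA", "Modbus RTU"])` would then feed the glossary to the decoder ahead of the audio; the file name and terms here are made up.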

Thanks

Contributor

keveman commented Oct 29, 2024

We have intentionally kept the model simple: no timestamps, no other special tokens, no special prompts, etc. However, industrial settings are one of our target use cases, and recognizing technical terms, domain-specific abbreviations, etc. is important in those settings. Prompting the decoder is a generic solution that we plan on pursuing. We will post updates here when we have some.

Author

cfasana commented Oct 30, 2024

That's great to hear!
Indeed, the industrial setting is a very interesting field, both for technical terms and for noise.

I will wait for updates, including on your plans if possible, so that we can exchange ideas.

@curiositry

@keveman My main use case involves recognizing a small, pre-defined set of words and phrases, with a low false positive rate, so I'm glad to hear this is something you're already thinking about.


ocavue commented Dec 25, 2024

When transcribing long audio files, we need to split them into chunks under 30 seconds each. The prompt feature helps smooth out these transitions by using the previous chunk's transcription as context. This makes the output more natural, especially at the boundaries between chunks.

Here is an example of transcribing a long audio file without the previous text - you'll notice the text doesn't flow smoothly between chunks.

[Screenshot: CleanShot 2024-12-25 at 23 25 34@2x]
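
The chunk-and-carry-context approach described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `transcribe_chunk` is a hypothetical callback standing in for whatever model call accepts a text prompt, and `chunk_spans`, `transcribe_long`, and `context_chars` are names invented for the example.

```python
from typing import Callable, List, Tuple


def chunk_spans(total_seconds: float, max_chunk: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start, end) spans covering the audio, each at most max_chunk long."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_chunk, total_seconds)
        spans.append((start, end))
        start = end
    return spans


def transcribe_long(
    total_seconds: float,
    transcribe_chunk: Callable[[float, float, str], str],  # hypothetical model call
    max_chunk: float = 30.0,
    context_chars: int = 200,
) -> str:
    """Transcribe chunk by chunk, passing the previous chunk's text as the prompt."""
    pieces = []
    previous = ""  # the first chunk has no context
    for start, end in chunk_spans(total_seconds, max_chunk):
        # Only the tail of the previous text is passed, to keep the prompt short.
        text = transcribe_chunk(start, end, previous[-context_chars:]).strip()
        pieces.append(text)
        previous = text
    return " ".join(p for p in pieces if p)
```

Splitting at fixed offsets is the simplest choice; cutting at detected silences instead would avoid splitting words, at the cost of extra machinery.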
