Fix attention caching to make transcription run 30% faster #370
Conversation
All thanks to @ritheshkumar95 for finding this.

Can repro on my end as well with a 17:38 wav file.

Really elegant catch! 🙌 Learnt something about Python today 🙂

Thank you very much! This was an oversight when I was factoring out the caching part for the open source repo.

@vickianand @ritheshkumar95

Wow, this is great news, thank you to all involved. The title says 30% faster on GPU, but the speedup is much larger on CPU, at least for me: Ryzen 5 4500U (6-core laptop CPU), on a 6m30s English YouTube video (https://www.youtube.com/watch?v=GFu64hnqzVo).
For a `dict.get(a, f())` call, `f()` is run even when the dict already has `a`. This was causing the bug in the attention caching logic. So making this change fixes the caching logic, and gets a sweet (expected) speedup of close to 30%.
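A minimal sketch of the pitfall (the `expensive_default` name is just for illustration, not from the repo):

```python
def expensive_default():
    print("expensive_default() was called")
    return "computed"

d = {"a": "cached"}

# Eager: the default argument is evaluated before .get() runs,
# so expensive_default() executes even though "a" is present.
value = d.get("a", expensive_default())  # prints the message anyway

# Lazy: only compute the default on an actual cache miss.
value = d["a"] if "a" in d else expensive_default()
```

In the attention cache, the "default" is the key/value computation itself, so evaluating it eagerly on every call means the cache saves no work.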
For a close-to-5-minute test audio, on an Nvidia Quadro RTX 8000 GPU:

- `large` model took 102.8s, and after this fix it takes 68.9s
- `medium.en` model took 55.1s, and after this fix it takes 39.7s
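A rough way to reproduce this kind of timing comparison, assuming the `openai-whisper` package is installed and a local test file (`audio.wav` here is a hypothetical placeholder):

```python
import time
import whisper

# Load one of the models benchmarked above.
model = whisper.load_model("medium.en")

start = time.perf_counter()
result = model.transcribe("audio.wav")  # hypothetical local test file
elapsed = time.perf_counter() - start

print(f"Transcribed in {elapsed:.1f}s")
print(result["text"][:200])
```

Running this once on the commit before the fix and once after should show a gap in line with the numbers above.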