Powered by 🤗 Transformers & Optimum and based on Vaibhavs10/insanely-fast-whisper.
TL;DR - 🎙️ Transcribe 300 minutes (5 hours) of audio in less than 10 minutes - with OpenAI's Whisper Large v2. Blazingly fast transcription is now a reality!⚡️
✨ ASR Model: Choose from different 🤗 Hugging Face ASR models, including all sizes of openai/whisper, or use an English-only variant (available for non-large models).
🚀 Performance: Customizable optimizations for ASR processing, with options for batch size, data type, and BetterTransformer, all from the comfort of your terminal! 😎
📝 Timestamps: Get an SRT output file with accurate timestamps, allowing you to create subtitles for your audio or video content.
Coming soon.
insanely-fast-whisper --model openai/whisper-base --device cuda:0 --dtype float32 --batch-size 8 --better-transformer --chunk-length 30 your_audio_file.wav
- model: Specify the ASR model (default is "openai/whisper-base").
- device: Choose the computation device (default is "cuda:0").
- dtype: Set the data type for computation ("float32" or "float16").
- batch-size: Adjust the batch size for processing (default is 8).
- better-transformer: Use BetterTransformer for improved processing (flag).
- chunk-length: Define audio chunk length in seconds (default is 30).
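The options above can be combined in a small wrapper when several files need the same settings. The following is only a sketch: transcribe_all is a hypothetical helper, not part of the tool, and the float16 / batch size 16 values are illustrative choices rather than recommendations. Setting DRY_RUN=1 prints each command instead of running it.

```shell
# Hypothetical helper (not part of insanely-fast-whisper): transcribe every
# file passed as an argument, using only the flags documented above.
# Set DRY_RUN=1 to print the commands instead of running them.
transcribe_all() {
  runner=""
  if [ "${DRY_RUN:-0}" = "1" ]; then runner="echo"; fi

  for audio in "$@"; do
    # float16 and batch size 16 are assumptions for illustration;
    # tune them to your GPU memory.
    $runner insanely-fast-whisper \
      --model openai/whisper-base \
      --device cuda:0 \
      --dtype float16 \
      --batch-size 16 \
      --better-transformer \
      --chunk-length 30 \
      "$audio" || return 1
  done
}
```

For example, `DRY_RUN=1 transcribe_all first.wav second.wav` prints one command line per file, which makes it easy to check the flags before starting a long run.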
Transcribing an audio file with an English-only Whisper model and returning timestamps:
insanely-fast-whisper --model openai/whisper-base.en your_audio_file.wav
The tool will save an SRT transcription of your audio file in the current working directory.
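Because the SRT file lands in the current working directory, one way to control where it goes is to run the tool from inside the target directory. A minimal sketch, assuming that behavior: transcribe_into is a hypothetical helper, not part of the tool, and DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical helper (not part of insanely-fast-whisper): run the
# transcription from inside a chosen directory so the SRT is written there.
# Usage: transcribe_into OUTPUT_DIR AUDIO_FILE
transcribe_into() {
  out_dir="$1"
  audio="$2"

  # Resolve the audio path first, because a relative path breaks after cd.
  audio_dir="$(cd "$(dirname "$audio")" && pwd)" || return 1
  audio_abs="$audio_dir/$(basename "$audio")"

  runner=""
  if [ "${DRY_RUN:-0}" = "1" ]; then runner="echo"; fi

  mkdir -p "$out_dir" || return 1

  # The subshell keeps the caller's working directory unchanged.
  (cd "$out_dir" && $runner insanely-fast-whisper --model openai/whisper-base.en "$audio_abs")
}
```

For example, `transcribe_into subtitles your_audio_file.wav` creates subtitles/ if needed and runs the English-only model from there.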
This project is licensed under the MIT License.
- This tool is powered by Hugging Face's ASR models, primarily Whisper by OpenAI.
- Optimizations are developed by Vaibhavs10/insanely-fast-whisper.
- Developed by @ochen1.
Have questions or feedback? Feel free to create an issue!
🌟 Star this repository if you find it helpful!
🚀 Happy transcribing with Insanely Fast Whisper! 🚀