Whisper Audio Transcription

This project utilizes OpenAI's Whisper model to transcribe audio in near real-time. It records audio from the user's microphone, segments the audio into 5-second chunks, and feeds these chunks to Whisper for transcription. This method enables continuous audio processing and transcription.

Users can start and pause the recording using the Space key and exit the application with the Esc key. Upon exiting, the application will either display the transcribed text on the screen or save it to a file. The output includes a word-by-word breakdown of the transcription with timestamps, confidence scores, and volume information for each word, where volume is calculated using the root mean square (RMS) of the audio chunks.

The application supports audio transcription in any language supported by the Whisper model and can translate the audio from any language into English. Users select the transcription task by setting the TASK variable in the config.py file.

The audio language can be hinted by setting the LANGUAGE_CODE variable in config.py, or the application will attempt to detect the language automatically.

Features

Near real-time audio recording and transcription.
Utilizes OpenAI's Whisper model for accurate transcription.
Provides detailed transcription including timestamps, confidence scores, and volume information (only for TASK set to transcribe).
Supports multiple languages with automatic language detection.
Translation option to transcribe audio in English.

Getting Started

Prerequisites

Python 3.9 or later.
pyenv and pyenv-virtualenv for managing Python versions and virtual environments.
pip for installing Python packages.

Installation

Clone the repository:

git clone https://github.com/hypercliq/audio-transcriber.git
cd audio-transcriber

Set up a Python virtual environment using pyenv:

pyenv virtualenv 3.9.0 audio-transcriber-3.9.0
pyenv local audio-transcriber-3.9.0

Install the required dependencies:

pip install -r requirements.txt

PyAudio may require additional dependencies to be installed on your system. Please refer to the PyAudio documentation for more information.

Usage

To start the transcription, run:

python main.py

After running the command, you will be prompted to choose the microphone to use from the list of available devices and to select the sample rate from the supported rates for your chosen microphone.
Press Space to toggle recording on and off. This allows you to control when the application is actively recording audio.
Press Esc to safely exit the application. Upon exiting, any remaining audio data will be processed, and the transcription results will either be displayed on the screen or saved to a file, based on your configuration settings.

Configuration

Adjust project settings, such as the Whisper model size and recording duration, in the config.py file.

Contributing

Contributions to improve the project are welcome. Please follow these steps to contribute:

Fork the repository.
Create a new branch: git checkout -b feature/your-feature-name.
Make your changes and commit them: git commit -am 'Add some feature'.
Push to the branch: git push origin feature/your-feature-name.
Submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Thanks to OpenAI for providing the Whisper model.
Gratitude to the Python community for its excellent ecosystem of tools and libraries.

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
.github/workflows		.github/workflows
.vscode		.vscode
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Whisper Audio Transcription

Features

Getting Started

Prerequisites

Installation

Usage

Configuration

Contributing

License

Acknowledgments

About

Releases

Packages

Contributors 2

Languages

License

hypercliq/audio-transcriber

Folders and files

Latest commit

History

Repository files navigation

Whisper Audio Transcription

Features

Getting Started

Prerequisites

Installation

Usage

Configuration

Contributing

License

Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages