STT is a tool that converts speech in audio files into text. It uses whisper.cpp for transcription and ffmpeg to handle a range of audio and video formats.
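As a rough illustration of how the two tools could fit together, the sketch below converts an input file to the 16 kHz mono 16-bit WAV that whisper.cpp expects, then runs whisper.cpp on the result. It is not taken from STT's own code: the function names, the whisper.cpp binary path, and the model path are all assumptions made for this example.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def build_ffmpeg_cmd(src: str, wav: str) -> list[str]:
    """Build an ffmpeg command that converts src to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it already exists
        "-i", src,             # input: any audio/video format ffmpeg can read
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # encode as 16-bit PCM
        wav,
    ]


def build_whisper_cmd(whisper_bin: str, model: str, wav: str) -> list[str]:
    """Build a whisper.cpp command: -m selects the model, -f the input WAV."""
    return [whisper_bin, "-m", model, "-f", wav]


def transcribe(src: str, whisper_bin: str, model: str) -> str:
    """Convert src with ffmpeg, run whisper.cpp on it, and return its stdout.

    whisper_bin and model are hypothetical paths supplied by the caller.
    """
    with tempfile.TemporaryDirectory() as tmp:
        wav = str(Path(tmp) / "audio.wav")
        subprocess.run(build_ffmpeg_cmd(src, wav), check=True)
        result = subprocess.run(
            build_whisper_cmd(whisper_bin, model, wav),
            check=True,
            capture_output=True,
            text=True,
        )
    # whisper.cpp prints the transcript to standard output.
    return result.stdout
```

Keeping the command construction in separate functions makes it possible to inspect or log the exact command lines without invoking either tool.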
Follow these steps to set up STT on your local machine:
Clone the project to your local machine:
git clone https://github.com/miukyo/stt.git
cd stt
You’ll need ffmpeg and whisper.cpp for this tool to work properly.
- Install ffmpeg
- Build whisper.cpp (a prebuilt binary is available in the /lib folder; if you prefer to build it yourself, follow the instructions in the whisper.cpp repository)
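A quick way to confirm that both dependencies can be found is a small lookup helper like the sketch below. The helper name `find_tool` is made up for this example, and the whisper.cpp binary name used in the comment is an assumption; check the lib folder for the actual file name.

```python
import shutil
from typing import Optional, Sequence


def find_tool(name: str, extra_dirs: Sequence[str] = ()) -> Optional[str]:
    """Return the full path to an executable, or None if it cannot be found.

    Directories in extra_dirs (for example a bundled lib folder) are
    searched first, then the system PATH.
    """
    for directory in extra_dirs:
        found = shutil.which(name, path=directory)
        if found:
            return found
    return shutil.which(name)


# Example usage (binary names are assumptions, adjust to your setup):
#   find_tool("ffmpeg")                 -> path on PATH, or None
#   find_tool("whisper-cli", ["lib"])   -> path inside lib, or None
```

If either lookup returns `None`, install ffmpeg or build whisper.cpp before running the tool.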
Use pip to install the required Python packages.
To run the program:
pip install pywebview
To bundle the program into an executable:
pip install pyinstaller
After installing the dependencies, you can start transcribing your audio/video files.
Run the tool with the following command:
python main.py
To bundle the program, run the following command:
pyinstaller main.spec
Then copy the lib folder into the dist folder.
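The copy step can also be scripted. The following is a minimal sketch, assuming the default folder names `lib` and `dist` relative to the project root and Python 3.8 or newer (for `dirs_exist_ok`); the function name `copy_lib` is hypothetical and not part of the project.

```python
import shutil
from pathlib import Path


def copy_lib(lib_dir: str = "lib", dist_dir: str = "dist") -> Path:
    """Copy the lib folder into the dist folder and return the new location."""
    src = Path(lib_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"lib folder not found: {src}")
    dest = Path(dist_dir) / src.name
    # dirs_exist_ok=True lets the copy be repeated after a rebuild.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

Calling `copy_lib()` from the project root after `pyinstaller main.spec` finishes would place the files under `dist/lib`.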
- This is the initial release. Please report any bugs or issues you encounter.
Feel free to fork the repository, submit issues, or create pull requests to improve the tool. Contributions are welcome!
License
This project is licensed under the MIT License - see the LICENSE file for details.