GitHub - matatonic/openedai-whisper: An OpenAI API compatible speech to text server for audio transcription and translations, aka. Whisper.

OpenedAI Whisper

An OpenAI API-compatible speech-to-text server for audio transcription and translation, a.k.a. Whisper.

Compatible with the OpenAI audio/transcriptions and audio/translations API
Does not connect to the OpenAI API and does not require an OpenAI API Key
Not affiliated with OpenAI in any way

API Compatibility:

/v1/audio/transcriptions
/v1/audio/translations

Parameter Support:

Details:

CUDA or CPU support (automatically detected)
float32, float16 or bfloat16 support (automatically detected)
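
The README does not say how the automatic detection works. The sketch below shows one plausible selection policy, purely as an illustration: the function name `pick_device_and_dtype` and the policy itself are assumptions, not code taken from whisper.py.

```python
def pick_device_and_dtype(cuda_available, bf16_supported):
    """Hypothetical auto-selection policy (NOT taken from whisper.py).

    With PyTorch installed, the two inputs could be obtained as:
        cuda = torch.cuda.is_available()
        bf16 = cuda and torch.cuda.is_bf16_supported()
    """
    if not cuda_available:
        # Assumption: fall back to full precision on CPU.
        return "cpu", "float32"
    # Assumption: prefer bfloat16 on GPUs that support it, else float16.
    return "cuda", ("bfloat16" if bf16_supported else "float16")


if __name__ == "__main__":
    print(pick_device_and_dtype(cuda_available=False, bf16_supported=False))
```

If the automatic choice does not suit your hardware, the `-d` and `-t` options described under Usage set the device and data type explicitly.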

Tested whisper models:

openai/whisper-large-v2 (the default)
openai/whisper-large-v3
distil-whisper/distil-medium.en
openai/whisper-tiny.en
...

Version: 0.1.0, Last update: 2024-03-15

API Documentation

Usage

Installation instructions

To run on a GPU, install CUDA for your operating system first.

# Install the Python requirements
pip install -r requirements.txt
# Install ffmpeg (Debian/Ubuntu shown; use your platform's package manager elsewhere)
sudo apt install ffmpeg
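
A quick way to confirm that ffmpeg ended up on your PATH is a small standard-library check like the one below. The helper name `missing_tools` is illustrative and not part of this project.

```python
import shutil


def missing_tools(tools=("ffmpeg",)):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```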

Usage

Usage: whisper.py [-m <model_name>] [-d <device>] [-t <dtype>] [-P <port>] [-H <host>] [--preload]


Description:
OpenedAI Whisper API Server

Options:
-h, --help            Show this help message and exit.
-m MODEL, --model MODEL
                      The model to use for transcription.
                      Ex. distil-whisper/distil-medium.en (default: openai/whisper-large-v2)
-d DEVICE, --device DEVICE
                      Set the torch device for the model. Ex. cuda:1 (default: auto)
-t DTYPE, --dtype DTYPE
                      Set the torch data type for processing (float32, float16, bfloat16) (default: auto)
-P PORT, --port PORT  Server tcp port (default: 8000)
-H HOST, --host HOST  Host to listen on, Ex. 0.0.0.0 (default: localhost)
--preload             Preload model and exit. (default: False)
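
If you start the server from another Python program, the options above can be assembled into a command line as sketched here. The helper `build_server_command` is hypothetical, and running the script through the current Python interpreter is an assumption. Only flags listed in the help text are used.

```python
import sys


def build_server_command(model=None, device=None, dtype=None,
                         port=None, host=None, preload=False,
                         script="whisper.py"):
    """Build an argv list for whisper.py from the documented options.

    Options left as None are omitted, so the server applies its own defaults.
    """
    cmd = [sys.executable, script]
    if model is not None:
        cmd += ["--model", model]
    if device is not None:
        cmd += ["--device", device]
    if dtype is not None:
        cmd += ["--dtype", dtype]
    if port is not None:
        cmd += ["--port", str(port)]
    if host is not None:
        cmd += ["--host", host]
    if preload:
        cmd.append("--preload")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to actually start the server.
    print(build_server_command(model="distil-whisper/distil-medium.en",
                               host="0.0.0.0", port=8000))
```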

Sample API Usage

You can use it like this:

curl -s http://localhost:8000/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F model="whisper-1" -F file="@audio.mp3" -F response_format=text

Or just like this:

curl -s http://localhost:8000/v1/audio/transcriptions -F model="whisper-1" -F file="@audio.mp3"
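
The same multipart request can be built with only the Python standard library, which may help when neither curl nor the openai package is available. This is a minimal sketch: the helper names `encode_multipart` and `build_transcription_request` are illustrative, and the form fields mirror the curl examples above. File names containing double quotes are not escaped.

```python
import os
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="application/octet-stream"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f'\r\n--{boundary}--\r\n'.encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(audio_path,
                                base_url="http://localhost:8000/v1",
                                response_format="text"):
    """Build (but do not send) a POST request for /audio/transcriptions."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    body, ctype = encode_multipart(
        {"model": "whisper-1", "response_format": response_format},
        "file", os.path.basename(audio_path), audio,
    )
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
```

With the server running, passing the result of `build_transcription_request("audio.mp3")` to `urllib.request.urlopen` and decoding the response body should yield the transcript as plain text.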

Or like this example from the OpenAI Speech to text guide Quickstart:

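from openai import OpenAI

# The API key is a placeholder: the local server does not require a real OpenAI key.
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')

# Open the audio in binary mode; the context manager closes the file afterwards.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcription.text)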
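
The /v1/audio/translations endpoint can be called in the same way through the openai client's `audio.translations.create` method. The wrapper below is a sketch: the function name `translate_to_english` is illustrative, and `client` is assumed to be an `OpenAI` instance configured with the local `base_url` as in the example above.

```python
def translate_to_english(client, audio_path, model="whisper-1"):
    """Send an audio file to /v1/audio/translations and return the text.

    `client` is assumed to be an openai.OpenAI instance pointed at the
    local server, e.g.:
        from openai import OpenAI
        client = OpenAI(api_key="sk-1111", base_url="http://localhost:8000/v1")
    """
    with open(audio_path, "rb") as audio_file:
        translation = client.audio.translations.create(model=model, file=audio_file)
    return translation.text
```

Taking the client as a parameter keeps the helper free of connection details, so the same function works against this server or any other OpenAI-compatible endpoint.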

Docker support

You can run the server via docker like so:

docker compose build
docker compose up

Options can be set via whisper.env.
