matatonic/openedai-whisper

OpenedAI Whisper

An OpenAI API compatible speech-to-text server for audio transcription and translation, a.k.a. Whisper.

  • Compatible with the OpenAI audio/transcriptions and audio/translations APIs
  • Does not connect to the OpenAI API and does not require an OpenAI API Key
  • Not affiliated with OpenAI in any way

API Compatibility:

  • /v1/audio/transcriptions
  • /v1/audio/translations

Parameter Support:

  • file
  • model (only whisper-1 exists, so this is ignored)
  • language
  • prompt (not yet supported)
  • temperature
  • response_format:
    • json
    • text
    • srt
    • vtt
    • verbose_json (partial support, some fields missing)
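
As a rough illustration of how the parameters above map onto a request, here is a minimal standard-library sketch. The helper names (`build_fields`, `build_multipart`, `transcribe`) are made up for this example and are not part of the project. The base URL assumes the default host and port, and the reply handling assumes the server follows the OpenAI convention of returning JSON for json and verbose_json and plain text for text, srt and vtt.

```python
import json
import os
import uuid
import urllib.request

# Response formats listed in the parameter list above.
SUPPORTED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_fields(response_format="json", language=None, temperature=None):
    """Collect the supported form fields, rejecting unknown response formats."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # The server ignores the model field; "whisper-1" mirrors the OpenAI API.
    fields = {"model": "whisper-1", "response_format": response_format}
    if language is not None:
        fields["language"] = language
    if temperature is not None:
        fields["temperature"] = str(temperature)
    return fields


def build_multipart(fields, filename, file_bytes):
    """Encode the form fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000/v1", **options):
    """POST an audio file to /audio/transcriptions and return the decoded reply."""
    fields = build_fields(**options)
    with open(path, "rb") as f:
        body, content_type = build_multipart(fields, os.path.basename(path), f.read())
    request = urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8")
    # Assumption: JSON formats return a JSON body, the others plain text.
    return json.loads(text) if fields["response_format"].endswith("json") else text
```

With a server running, `transcribe("audio.mp3", response_format="srt", language="en")` would return the subtitles as a string.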

Details:

  • CUDA or CPU support (automatically detected)
  • float32, float16 or bfloat16 support (automatically detected)

Tested whisper models:

  • openai/whisper-large-v2 (the default)
  • openai/whisper-large-v3
  • distil-whisper/distil-medium.en
  • openai/whisper-tiny.en
  • ...

Version: 0.1.0, Last update: 2024-03-15

API Documentation

Usage

Installation instructions

You will need to install CUDA for your operating system if you want to use CUDA.

# Install the Python requirements
pip install -r requirements.txt
# Install ffmpeg (Debian/Ubuntu)
sudo apt install ffmpeg

Usage

Usage: whisper.py [-m <model_name>] [-d <device>] [-t <dtype>] [-P <port>] [-H <host>] [--preload]


Description:
OpenedAI Whisper API Server

Options:
-h, --help            Show this help message and exit.
-m MODEL, --model MODEL
                      The model to use for transcription.
                      Ex. distil-whisper/distil-medium.en (default: openai/whisper-large-v2)
-d DEVICE, --device DEVICE
                      Set the torch device for the model. Ex. cuda:1 (default: auto)
-t DTYPE, --dtype DTYPE
                      Set the torch data type for processing (float32, float16, bfloat16) (default: auto)
-P PORT, --port PORT  Server tcp port (default: 8000)
-H HOST, --host HOST  Host to listen on, Ex. 0.0.0.0 (default: localhost)
--preload             Preload model and exit. (default: False)
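
If you launch the server from Python, for instance from a test harness, the options above can be assembled into a command line. This is only a sketch: `build_command` is a hypothetical helper, not part of the project, and it assumes whisper.py is in the current working directory.

```python
import sys

# Data types named in the help text above.
VALID_DTYPES = {"auto", "float32", "float16", "bfloat16"}


def build_command(model=None, device=None, dtype=None, port=None, host=None, preload=False):
    """Return an argv list for whisper.py; unset options keep the server defaults."""
    if dtype is not None and dtype not in VALID_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    command = [sys.executable, "whisper.py"]
    options = (("-m", model), ("-d", device), ("-t", dtype), ("-P", port), ("-H", host))
    for flag, value in options:
        if value is not None:
            command += [flag, str(value)]
    if preload:
        command.append("--preload")
    return command


# Example: preload the model once, then serve it on all interfaces.
#   import subprocess
#   model = "distil-whisper/distil-medium.en"
#   subprocess.run(build_command(model=model, preload=True), check=True)
#   subprocess.run(build_command(model=model, host="0.0.0.0"), check=True)
```

Running the `--preload` step first means the model is already loaded and ready when the serving process starts.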

Sample API Usage

You can request a plain-text transcription with curl:

curl -s http://localhost:8000/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F model="whisper-1" -F file="@audio.mp3" -F response_format=text

Or more simply, letting curl set the Content-Type header and omitting response_format:

curl -s http://localhost:8000/v1/audio/transcriptions -F model="whisper-1" -F file="@audio.mp3"

Or with the OpenAI Python client, as in the Quickstart of the OpenAI Speech to text guide:

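from openai import OpenAI

# No real OpenAI key is needed; the client only requires a placeholder value.
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')

# Open the audio in binary mode and close it once the request completes.
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcription.text)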
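
The /v1/audio/translations endpoint can be called the same way. The sketch below assumes the openai Python package is installed and that the server behaves like the OpenAI endpoint, which translates speech into English text. `translate_file` is a hypothetical helper name, not something shipped with this project.

```python
def translate_file(client, path, response_format="json"):
    """Send an audio file to the translations endpoint and return the reply.

    `client` is expected to be an openai.OpenAI instance whose base_url
    points at this server.
    """
    with open(path, "rb") as audio_file:
        return client.audio.translations.create(
            model="whisper-1",  # ignored by the server, required by the client
            file=audio_file,
            response_format=response_format,
        )


# Usage (assumes the server is running on localhost:8000):
#   from openai import OpenAI
#   client = OpenAI(api_key="sk-1111", base_url="http://localhost:8000/v1")
#   print(translate_file(client, "/path/to/file/audio.mp3").text)
```

Passing the client in as an argument keeps the helper independent of any one server address.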

Docker support

You can run the server with Docker Compose like so:

docker compose build
docker compose up

Options can be set via whisper.env.
