Mimics the Hugging Face Inference API for a speech recognition model.
The example below uses a Docker image that bundles a Wav2Vec2 model for the Catalan language:
> docker run -p 8000:8000 -d ghcr.io/ccoreilly/wav2vec2-catala:0.1.0
> curl -X POST localhost:8000/recognize -F "[email protected]"
{"text":"bon vesprà a totes i tots donem començament al ple ordinari convocat per avui trenta de setembre de dos mil vint-i-u a les vuit hores en el saló de plens d'ací de l'ajuntament de massanassa"}
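The same request can be sent from Python. The following is a minimal sketch using only the standard library, not code from this repository. The endpoint URL and the "text" key in the response come from the example above; the form field name "file" and the helper names build_multipart and recognize are assumptions made for illustration, so adjust field_name if the server expects a different one.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload,
                    content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns the raw body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def recognize(audio_path,
              url="http://localhost:8000/recognize",
              field_name="file"):  # assumed field name, not confirmed by the README
    """POST an audio file to the server and return the transcribed text."""
    with open(audio_path, "rb") as fh:
        body, header = build_multipart(
            field_name, os.path.basename(audio_path), fh.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

With the container running, calling recognize with the path to a local audio file should return the transcription as a string.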
Using the ONNX model format speeds up inference on a CPU. You can convert any Wav2Vec2ForCTC model from the Hugging Face model hub with the convert_torch_to_onnx.py script:
> python3 -m venv .venv
> source .venv/bin/activate
> pip install -r requirements.txt
> python convert_torch_to_onnx.py --model ccoreilly/wav2vec2-large-xlsr-catala
You can also quantize the model to reduce its size:
> python convert_torch_to_onnx.py --model ccoreilly/wav2vec2-large-xlsr-catala --quantize
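To see why quantization shrinks a model, consider the toy sketch below. It is only an illustration of the general idea (storing weights as 8-bit integers plus a scale instead of 32-bit floats). It is not the code behind the --quantize flag, whose exact scheme this README does not describe, and the weight values and function names are made up for the example.

```python
from array import array


def quantize_int8(weights):
    """Symmetric quantization: map floats to int8 values plus one shared scale."""
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    quantized = array("b", (round(w / scale) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


# Hypothetical weights, stored as 32-bit floats.
weights = array("f", [0.12, -0.5, 0.33, 0.9, -0.07])
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

print(weights.itemsize * len(weights), "bytes as float32")
print(quantized.itemsize * len(quantized), "bytes as int8")
print("largest rounding error:",
      max(abs(w - r) for w, r in zip(weights, restored)))
```

Each weight takes a quarter of the space, at the cost of a rounding error of at most half the scale per weight. Whether that error is acceptable for a given model is best checked by comparing transcriptions before and after quantization.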