From f4f4da2dcabd9baa69badb01063f39db64ba51c1 Mon Sep 17 00:00:00 2001
From: Sihan Chen <39623753+Spycsh@users.noreply.github.com>
Date: Thu, 29 Aug 2024 22:01:45 +0800
Subject: [PATCH] add AudioQnA readme with supported model (#689)

* add readme with supported model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add explaination

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
---
 AudioQnA/README.md | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 AudioQnA/README.md

diff --git a/AudioQnA/README.md b/AudioQnA/README.md
new file mode 100644
index 0000000000..2ebf6162c3
--- /dev/null
+++ b/AudioQnA/README.md
@@ -0,0 +1,34 @@
+# AudioQnA Application
+
+AudioQnA is an example that demonstrates the integration of Generative AI (GenAI) models for performing question-answering (QnA) on audio files, with the added functionality of Text-to-Speech (TTS) for generating spoken responses. The example showcases how to convert audio input to text using Automatic Speech Recognition (ASR), generate answers to user queries using a language model, and then convert those answers back to speech using Text-to-Speech (TTS).
+
+## Deploy AudioQnA Service
+
+The AudioQnA service can be deployed on either Intel Gaudi2 or Intel XEON Scalable Processor.
+
+### Deploy AudioQnA on Gaudi
+
+Refer to the [Gaudi Guide](./docker/gaudi/README.md) for instructions on deploying AudioQnA on Gaudi.
+
+### Deploy AudioQnA on Xeon
+
+Refer to the [Xeon Guide](./docker/xeon/README.md) for instructions on deploying AudioQnA on Xeon.
+
+## Supported Models
+
+### ASR
+
+The default model is [openai/whisper-small](https://huggingface.co/openai/whisper-small). It also supports all models in the Whisper family, such as `openai/whisper-large-v3`, `openai/whisper-medium`, `openai/whisper-base`, `openai/whisper-tiny`, etc.
+
+To replace the model, please edit the `compose.yaml` and add the `command` line to pass the name of the model you want to use:
+
+```yml
+services:
+  whisper-service:
+    ...
+    command: --model_name_or_path openai/whisper-tiny
+```
+
+### TTS
+
+The default model is [microsoft/SpeechT5](https://huggingface.co/microsoft/speecht5_tts). We currently do not support replacing the model. More models under the commercial license will be added in the future.