add AudioQnA readme with supported model (#689)
* add readme with supported model
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add explaination

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 1e47444, commit f4f4da2

Showing 1 changed file with 34 additions and 0 deletions.
# AudioQnA Application

AudioQnA is an example that demonstrates how Generative AI (GenAI) models can perform question answering (QnA) on audio files, with Text-to-Speech (TTS) added to generate spoken responses. The example shows how to convert audio input to text using Automatic Speech Recognition (ASR), generate answers to user queries with a language model, and then convert those answers back to speech with TTS.

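The three stages described above can be sketched as a simple chain. This is a minimal illustration only, not the example's actual implementation: `AudioQnAPipeline` and the `fake_*` functions are hypothetical stand-ins that let the sketch run without any model or service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AudioQnAPipeline:
    """Chains three stages in order: ASR -> language model -> TTS."""

    asr: Callable[[bytes], str]  # audio in, transcript out
    llm: Callable[[str], str]    # question text in, answer text out
    tts: Callable[[str], bytes]  # answer text in, audio out

    def run(self, audio: bytes) -> bytes:
        question = self.asr(audio)   # speech to text
        answer = self.llm(question)  # text to text
        return self.tts(answer)      # text to speech


# Hypothetical stand-ins: they only move text around and call no real model.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(question: str) -> str:
    return f"Answer to: {question}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = AudioQnAPipeline(asr=fake_asr, llm=fake_llm, tts=fake_tts)
spoken = pipeline.run(b"What is AudioQnA?")
```

In a real deployment each callable would wrap a request to the corresponding service; the point of the sketch is only the order in which data flows.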
## Deploy AudioQnA Service

The AudioQnA service can be deployed on either Intel Gaudi2 or Intel Xeon Scalable processors.

### Deploy AudioQnA on Gaudi

Refer to the [Gaudi Guide](./docker/gaudi/README.md) for instructions on deploying AudioQnA on Gaudi.

### Deploy AudioQnA on Xeon

Refer to the [Xeon Guide](./docker/xeon/README.md) for instructions on deploying AudioQnA on Xeon.

## Supported Models

### ASR

The default model is [openai/whisper-small](https://huggingface.co/openai/whisper-small). The service also supports the other models in the Whisper family, such as `openai/whisper-large-v3`, `openai/whisper-medium`, `openai/whisper-base`, and `openai/whisper-tiny`.

To replace the model, edit `compose.yaml` and add a `command` line that passes the name of the model you want to use:

```yml
services:
  whisper-service:
    ...
    command: --model_name_or_path openai/whisper-tiny
```

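If the `command` value is generated by a script, a small helper along these lines could build it. This is a sketch under assumptions: `whisper_command` and `KNOWN_WHISPER_MODELS` are hypothetical names, and the list contains only the models named in this README, not every Whisper-family checkpoint.

```python
# Models named in this README; other Whisper-family checkpoints may also work.
KNOWN_WHISPER_MODELS = (
    "openai/whisper-small",  # default
    "openai/whisper-large-v3",
    "openai/whisper-medium",
    "openai/whisper-base",
    "openai/whisper-tiny",
)


def whisper_command(model_name: str, allow_unlisted: bool = False) -> str:
    """Return the string to place under `command:` for whisper-service.

    Hypothetical helper: it only formats the flag shown in the YAML above
    and guards against typos in the model name.
    """
    if model_name not in KNOWN_WHISPER_MODELS and not allow_unlisted:
        raise ValueError(f"model not listed in the README: {model_name!r}")
    return f"--model_name_or_path {model_name}"


command = whisper_command("openai/whisper-tiny")
```

Passing `allow_unlisted=True` skips the check for Whisper-family models that are not in the short list above.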
### TTS

The default model is [microsoft/SpeechT5](https://huggingface.co/microsoft/speecht5_tts). Replacing the model is not currently supported. More models under a commercial license will be added in the future.