Initiate "AvatarChatbot" (audio) example (#923)

Signed-off-by: Chun Tao <[email protected]>
Signed-off-by: rbrugaro <[email protected]>
Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: Louie Tsai <[email protected]>
Signed-off-by: chen, suyue <[email protected]>
Co-authored-by: rbrugaro <[email protected]>
Co-authored-by: ZePan110 <[email protected]>
Co-authored-by: kevinintel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: chen, suyue <[email protected]>
opea-project · Oct 23, 2024 · cfffb4c
1 parent 41955f6
commit cfffb4c
Showing 30 changed files with 1,776 additions and 0 deletions.
diff --git a/AvatarChatbot/.gitignore b/AvatarChatbot/.gitignore
@@ -0,0 +1,6 @@
+*.safetensors
+*.bin
+*.model
+*.log
+docker_compose/intel/cpu/xeon/data
+docker_compose/intel/hpu/gaudi/data
diff --git a/AvatarChatbot/Dockerfile b/AvatarChatbot/Dockerfile
@@ -0,0 +1,33 @@
+
+
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+FROM python:3.11-slim
+
+RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
+    libgl1-mesa-glx \
+    libjemalloc-dev \
+    vim \
+    git
+
+RUN useradd -m -s /bin/bash user && \
+    mkdir -p /home/user && \
+    chown -R user /home/user/
+
+WORKDIR /home/user/
+RUN git clone https://github.com/opea-project/GenAIComps.git
+WORKDIR /home/user/GenAIComps
+
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r /home/user/GenAIComps/requirements.txt
+
+COPY ./avatarchatbot.py /home/user/avatarchatbot.py
+
+ENV PYTHONPATH=$PYTHONPATH:/home/user/GenAIComps
+
+USER user
+
+WORKDIR /home/user
+
+ENTRYPOINT ["python", "avatarchatbot.py"]
diff --git a/AvatarChatbot/README.md b/AvatarChatbot/README.md
@@ -0,0 +1,105 @@
+# AvatarChatbot Application
+
+The AvatarChatbot service can be deployed on either Intel Gaudi2 AI Accelerators or Intel Xeon Scalable Processors.
+
+## AI Avatar Workflow
+
+The AI Avatar example is implemented using both megaservices and the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flowchart below shows how information flows between the megaservices and microservices in this example.
+
+```mermaid
+---
+config:
+  flowchart:
+    nodeSpacing: 100
+    rankSpacing: 100
+    curve: linear
+  themeVariables:
+    fontSize: 42px
+---
+flowchart LR
+    classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
+    classDef thistle fill:#D8BFD8,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
+    classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
+    classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
+    classDef invisible fill:transparent,stroke:transparent;
+    style AvatarChatbot-Megaservice stroke:#000000
+
+    subgraph AvatarChatbot-Megaservice["AvatarChatbot Megaservice"]
+        direction LR
+        ASR([ASR Microservice]):::blue
+        LLM([LLM Microservice]):::blue
+        TTS([TTS Microservice]):::blue
+        animation([Animation Microservice]):::blue
+    end
+    subgraph UserInterface["User Interface"]
+        direction LR
+        invis1[ ]:::invisible
+        USER1([User Audio Query]):::orchid
+        USER2([User Image/Video Query]):::orchid
+        UI([UI server<br>]):::orchid
+    end
+    GW([AvatarChatbot GateWay<br>]):::orange
+    subgraph .
+        direction LR
+        X([OPEA Microservice]):::blue
+        Y{{Open Source Service}}:::thistle
+        Z([OPEA Gateway]):::orange
+        Z1([UI]):::orchid
+    end
+
+    WHISPER{{Whisper service}}:::thistle
+    TGI{{LLM service}}:::thistle
+    T5{{Speecht5 service}}:::thistle
+    WAV2LIP{{Wav2Lip service}}:::thistle
+
+    %% Connections %%
+    direction LR
+    USER1 -->|1| UI
+    UI -->|2| GW
+    GW <==>|3| AvatarChatbot-Megaservice
+    ASR ==>|4| LLM ==>|5| TTS ==>|6| animation
+
+    direction TB
+    ASR <-.->|3'| WHISPER
+    LLM <-.->|4'| TGI
+    TTS <-.->|5'| T5
+    animation <-.->|6'| WAV2LIP
+
+    USER2 -->|1| UI
+    UI <-.->|6'| WAV2LIP
+```
+
+## Deploy AvatarChatbot Service
+
+The AvatarChatbot service can be deployed on either an Intel Gaudi2 AI Accelerator or an Intel Xeon Scalable Processor.
+
+### Deploy AvatarChatbot on Gaudi
+
+Refer to the [Gaudi Guide](./docker_compose/intel/hpu/gaudi/README.md) for instructions on deploying AvatarChatbot on Gaudi.
+
+### Deploy AvatarChatbot on Xeon
+
+Refer to the [Xeon Guide](./docker_compose/intel/cpu/xeon/README.md) for instructions on deploying AvatarChatbot on Xeon.
+
+## Supported Models
+
+### ASR
+
+The default model is [openai/whisper-small](https://huggingface.co/openai/whisper-small). It also supports all models in the Whisper family, such as `openai/whisper-large-v3`, `openai/whisper-medium`, `openai/whisper-base`, `openai/whisper-tiny`, etc.
+
+To use a different model, edit `compose.yaml` and add a `command` line that passes the name of the model you want to use:
+
+```yaml
+services:
+  whisper-service:
+    ...
+    command: --model_name_or_path openai/whisper-tiny
+```
+
+### TTS
+
+The default model is [microsoft/SpeechT5](https://huggingface.co/microsoft/speecht5_tts). Replacing the model is not currently supported. More models under a commercial license will be added in the future.
+
+### Animation
+
+The default models are [Rudrabha/Wav2Lip](https://github.com/Rudrabha/Wav2Lip) and [TencentARC/GFPGAN](https://github.com/TencentARC/GFPGAN). Replacing these models is not currently supported. More models under a commercial license, such as [OpenTalker/SadTalker](https://github.com/OpenTalker/SadTalker), will be added in the future.
diff --git a/AvatarChatbot/assets/audio/eg3_ref.wav b/AvatarChatbot/assets/audio/eg3_ref.wav
diff --git a/AvatarChatbot/assets/audio/sample_minecraft.json b/AvatarChatbot/assets/audio/sample_minecraft.json
diff --git a/AvatarChatbot/assets/audio/sample_question.json b/AvatarChatbot/assets/audio/sample_question.json
diff --git a/AvatarChatbot/assets/audio/sample_whoareyou.json b/AvatarChatbot/assets/audio/sample_whoareyou.json
diff --git a/AvatarChatbot/assets/img/avatar1.jpg b/AvatarChatbot/assets/img/avatar1.jpg
diff --git a/AvatarChatbot/assets/img/avatar2.jpg b/AvatarChatbot/assets/img/avatar2.jpg
diff --git a/AvatarChatbot/assets/img/avatar3.png b/AvatarChatbot/assets/img/avatar3.png
diff --git a/AvatarChatbot/assets/img/avatar4.png b/AvatarChatbot/assets/img/avatar4.png
diff --git a/AvatarChatbot/assets/img/avatar5.png b/AvatarChatbot/assets/img/avatar5.png
diff --git a/AvatarChatbot/assets/img/avatar6.png b/AvatarChatbot/assets/img/avatar6.png
diff --git a/AvatarChatbot/assets/img/design.png b/AvatarChatbot/assets/img/design.png
diff --git a/AvatarChatbot/assets/img/flowchart.png b/AvatarChatbot/assets/img/flowchart.png
diff --git a/AvatarChatbot/assets/img/gaudi.png b/AvatarChatbot/assets/img/gaudi.png
diff --git a/AvatarChatbot/assets/img/opea_gh_qr.png b/AvatarChatbot/assets/img/opea_gh_qr.png
diff --git a/AvatarChatbot/assets/img/opea_qr.png b/AvatarChatbot/assets/img/opea_qr.png
diff --git a/AvatarChatbot/assets/img/xeon.jpg b/AvatarChatbot/assets/img/xeon.jpg
diff --git a/AvatarChatbot/assets/outputs/result_max_tokens_1024.mp4 b/AvatarChatbot/assets/outputs/result_max_tokens_1024.mp4
diff --git a/AvatarChatbot/assets/outputs/result_max_tokens_64.mp4 b/AvatarChatbot/assets/outputs/result_max_tokens_64.mp4
diff --git a/AvatarChatbot/avatarchatbot.py b/AvatarChatbot/avatarchatbot.py
@@ -0,0 +1,93 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import asyncio
+import os
+import sys
+
+from comps import AvatarChatbotGateway, MicroService, ServiceOrchestrator, ServiceType
+
+MEGA_SERVICE_HOST_IP = os.getenv("MEGA_SERVICE_HOST_IP", "0.0.0.0")
+MEGA_SERVICE_PORT = int(os.getenv("MEGA_SERVICE_PORT", 8888))
+ASR_SERVICE_HOST_IP = os.getenv("ASR_SERVICE_HOST_IP", "0.0.0.0")
+ASR_SERVICE_PORT = int(os.getenv("ASR_SERVICE_PORT", 9099))
+LLM_SERVICE_HOST_IP = os.getenv("LLM_SERVICE_HOST_IP", "0.0.0.0")
+LLM_SERVICE_PORT = int(os.getenv("LLM_SERVICE_PORT", 9000))
+TTS_SERVICE_HOST_IP = os.getenv("TTS_SERVICE_HOST_IP", "0.0.0.0")
+TTS_SERVICE_PORT = int(os.getenv("TTS_SERVICE_PORT", 9088))
+ANIMATION_SERVICE_HOST_IP = os.getenv("ANIMATION_SERVICE_HOST_IP", "0.0.0.0")
+ANIMATION_SERVICE_PORT = int(os.getenv("ANIMATION_SERVICE_PORT", 9066))
+
+
+def check_env_vars(env_var_list):
+    for var in env_var_list:
+        if os.getenv(var) is None:
+            print(f"Error: The environment variable '{var}' is not set.")
+            sys.exit(1)  # Exit the program with a non-zero status code
+    print("All environment variables are set.")
+
+
+class AvatarChatbotService:
+    def __init__(self, host="0.0.0.0", port=8000):
+        self.host = host
+        self.port = port
+        self.megaservice = ServiceOrchestrator()
+
+    def add_remote_service(self):
+        asr = MicroService(
+            name="asr",
+            host=ASR_SERVICE_HOST_IP,
+            port=ASR_SERVICE_PORT,
+            endpoint="/v1/audio/transcriptions",
+            use_remote_service=True,
+            service_type=ServiceType.ASR,
+        )
+        llm = MicroService(
+            name="llm",
+            host=LLM_SERVICE_HOST_IP,
+            port=LLM_SERVICE_PORT,
+            endpoint="/v1/chat/completions",
+            use_remote_service=True,
+            service_type=ServiceType.LLM,
+        )
+        tts = MicroService(
+            name="tts",
+            host=TTS_SERVICE_HOST_IP,
+            port=TTS_SERVICE_PORT,
+            endpoint="/v1/audio/speech",
+            use_remote_service=True,
+            service_type=ServiceType.TTS,
+        )
+        animation = MicroService(
+            name="animation",
+            host=ANIMATION_SERVICE_HOST_IP,
+            port=ANIMATION_SERVICE_PORT,
+            endpoint="/v1/animation",
+            use_remote_service=True,
+            service_type=ServiceType.ANIMATION,
+        )
+        self.megaservice.add(asr).add(llm).add(tts).add(animation)
+        self.megaservice.flow_to(asr, llm)
+        self.megaservice.flow_to(llm, tts)
+        self.megaservice.flow_to(tts, animation)
+        self.gateway = AvatarChatbotGateway(megaservice=self.megaservice, host="0.0.0.0", port=self.port)
+
+
+if __name__ == "__main__":
+    check_env_vars(
+        [
+            "MEGA_SERVICE_HOST_IP",
+            "MEGA_SERVICE_PORT",
+            "ASR_SERVICE_HOST_IP",
+            "ASR_SERVICE_PORT",
+            "LLM_SERVICE_HOST_IP",
+            "LLM_SERVICE_PORT",
+            "TTS_SERVICE_HOST_IP",
+            "TTS_SERVICE_PORT",
+            "ANIMATION_SERVICE_HOST_IP",
+            "ANIMATION_SERVICE_PORT",
+        ]
+    )
+
+    avatarchatbot = AvatarChatbotService(host=MEGA_SERVICE_HOST_IP, port=MEGA_SERVICE_PORT)
+    avatarchatbot.add_remote_service()
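
The `flow_to` calls in `add_remote_service` chain the four microservices into a linear ASR, LLM, TTS, animation pipeline. The sketch below is a minimal, self-contained illustration of that wiring pattern only. It does not import `comps`; `LinearPipeline` and the stub stage functions are hypothetical stand-ins, and the real `ServiceOrchestrator` presumably forwards requests to the remote services instead of calling local functions.

```python
from typing import Callable, Dict, List


class LinearPipeline:
    """Hypothetical stand-in for an orchestrator: named stages plus directed edges."""

    def __init__(self) -> None:
        self.stages: Dict[str, Callable[[str], str]] = {}
        self.edges: Dict[str, str] = {}

    def add(self, name: str, fn: Callable[[str], str]) -> "LinearPipeline":
        # Return self so calls can be chained, mirroring add(asr).add(llm)...
        self.stages[name] = fn
        return self

    def flow_to(self, src: str, dst: str) -> None:
        # Record that the output of `src` feeds the input of `dst`.
        if src not in self.stages or dst not in self.stages:
            raise KeyError(f"unknown stage in edge {src!r} -> {dst!r}")
        self.edges[src] = dst

    def order(self) -> List[str]:
        # The head is the only stage that no edge points to.
        targets = set(self.edges.values())
        heads = [name for name in self.stages if name not in targets]
        if len(heads) != 1:
            raise ValueError("expected exactly one entry stage")
        ordered = [heads[0]]
        while ordered[-1] in self.edges:
            nxt = self.edges[ordered[-1]]
            if nxt in ordered:
                raise ValueError("cycle detected in pipeline")
            ordered.append(nxt)
        return ordered

    def run(self, payload: str) -> str:
        # Pass the payload through each stage in flow order.
        for name in self.order():
            payload = self.stages[name](payload)
        return payload


# Stub stages (assumptions for illustration): each wraps its input in a label.
pipeline = LinearPipeline()
pipeline.add("asr", lambda audio: f"text({audio})")
pipeline.add("llm", lambda text: f"reply({text})")
pipeline.add("tts", lambda reply: f"speech({reply})")
pipeline.add("animation", lambda speech: f"video({speech})")

pipeline.flow_to("asr", "llm")
pipeline.flow_to("llm", "tts")
pipeline.flow_to("tts", "animation")

result = pipeline.run("audio.wav")
print(pipeline.order())
print(result)
```

Keeping the edges separate from stage registration, as the commit does with `add` followed by `flow_to`, means the stages can be registered in any order while the edges alone determine execution order.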