Add compose deploy example for DocSum on AMD ROCm
Signed-off-by: astafevav <[email protected]>
Showing 3 changed files with 383 additions and 0 deletions.
@@ -0,0 +1,112 @@
## 🚀 Start Microservices and MegaService

### Required Models

The default model is `Intel/neural-chat-7b-v3-3`. Change `DOCSUM_LLM_MODEL_ID` in the settings below if you want to use another model.
If you use gated models, you also need to provide a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens) in the `DOCSUM_HUGGINGFACEHUB_API_TOKEN` environment variable.

### Setup Environment Variables

Since `compose.yaml` consumes several environment variables, you need to set them in advance, as shown below.

```bash
export DOCSUM_TGI_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
export DOCSUM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HOST_IP=${host_ip}
export DOCSUM_TGI_SERVICE_PORT="18882"
export DOCSUM_TGI_LLM_ENDPOINT="http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}"
export DOCSUM_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export DOCSUM_LLM_SERVER_PORT="8008"
export DOCSUM_BACKEND_SERVER_PORT="8888"
export DOCSUM_FRONTEND_PORT="5173"
```

Note: Please replace `host_ip` with your external IP address; do not use localhost.
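
For example, on a typical Linux host you can derive the external IP as shown below (a sketch; `hostname -I` may list several addresses, so pick the interface you actually use):

```bash
# One way to obtain the host's external IP; adjust if the first address is not the desired interface
export host_ip=$(hostname -I | awk '{print $1}')
```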

Note: To limit access to a subset of GPUs, pass each device individually using one or more `--device /dev/dri/renderD<node>` arguments, where `<node>` is the card index, starting from 128. See the [ROCm documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).

Example of setting isolation for 1 GPU:

```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
```

Example of setting isolation for 2 GPUs:

```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
- /dev/dri/card1:/dev/dri/card1
- /dev/dri/renderD129:/dev/dri/renderD129
```

Please find more information about accessing and restricting AMD GPUs in the [ROCm documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).
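
The `compose.yaml` in this example selects GPU devices through the `DOCSUM_CARD_ID` and `DOCSUM_RENDER_ID` variables, so a minimal single-GPU selection could look like the following (the node names are an assumption; verify them with `ls /dev/dri` on your host):

```bash
# Hypothetical device IDs for the first GPU; confirm the actual names with `ls /dev/dri`
export DOCSUM_CARD_ID="card0"
export DOCSUM_RENDER_ID="renderD128"
```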

### Start Microservice Docker Containers

```bash
cd GenAIExamples/DocSum/docker_compose/amd/gpu/rocm
docker compose up -d
```
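
To confirm that the stack is up, list the containers and follow the TGI log until the model finishes loading (container names are taken from the `compose.yaml` in this example):

```bash
# All four DocSum containers should show as running
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
# TGI is ready once its startup log reports that the server is connected
docker logs -f docsum-tgi-service
```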

### Validate Microservices

1. TGI Service

```bash
curl http://${host_ip}:${DOCSUM_TGI_SERVICE_PORT}/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```
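
A healthy TGI service responds with a JSON object whose `generated_text` field carries the completion, along these lines (the generated text itself will vary):

```
{"generated_text":" Deep Learning is a subset of machine learning that ..."}
```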

2. LLM Microservice

```bash
curl http://${host_ip}:${DOCSUM_LLM_SERVER_PORT}/v1/chat/docsum \
  -X POST \
  -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
  -H 'Content-Type: application/json'
```
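
The LLM microservice streams its summary as server-sent events, so expect a sequence of `data:` chunks rather than a single JSON body (the exact chunk payload and end-of-stream marker depend on the component version):

```
data: ...
data: [DONE]
```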

3. MegaService

```bash
curl http://${host_ip}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum -H "Content-Type: application/json" -d '{
  "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.","max_tokens":32, "language":"en", "stream":false
}'
```

## 🚀 Launch the Svelte UI

Open this URL `http://{host_ip}:5173` in your browser to access the frontend.

![project-screenshot](https://github.com/intel-ai-tce/GenAIExamples/assets/21761437/93b1ed4b-4b76-4875-927e-cc7818b4825b)

Here is an example of summarizing an article:

![image](https://github.com/intel-ai-tce/GenAIExamples/assets/21761437/67ecb2ec-408d-4e81-b124-6ded6b833f55)

## 🚀 Launch the React UI (Optional)

To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace the `docsum-rocm-ui-server` service with the `docsum-rocm-react-ui-server` service as per the config below:

```yaml
docsum-rocm-react-ui-server:
  image: ${REGISTRY:-opea}/docsum-react-ui:${TAG:-latest}
  container_name: docsum-rocm-react-ui-server
  depends_on:
    - docsum-rocm-backend-server
  ports:
    - "5174:80"
  environment:
    - no_proxy=${no_proxy}
    - https_proxy=${https_proxy}
    - http_proxy=${http_proxy}
    - DOC_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
```
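
After editing `compose.yaml`, recreate just the UI container so the rest of the stack keeps running (a sketch; the service name matches the snippet above):

```bash
docker compose up -d docsum-rocm-react-ui-server
```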

Open this URL `http://{host_ip}:5174` in your browser to access the frontend.

![project-screenshot](../../../../assets/img/docsum-ui-react.png)
@@ -0,0 +1,87 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
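  # TGI inference server (ROCm build) that hosts the summarization model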
  docsum-tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
    container_name: docsum-tgi-service
    ports:
      - "${DOCSUM_TGI_SERVICE_PORT}:80"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}"
      HUGGINGFACEHUB_API_TOKEN: ${DOCSUM_HUGGINGFACEHUB_API_TOKEN}
    volumes:
      - "/var/opea/docsum-service/data:/data"
    shm_size: 1g
    devices:
      - /dev/kfd:/dev/kfd
      # ROCm workloads also need the GPU render nodes; selected via DOCSUM_CARD_ID/DOCSUM_RENDER_ID (see README)
      - /dev/dri/${DOCSUM_CARD_ID}:/dev/dri/${DOCSUM_CARD_ID}
      - /dev/dri/${DOCSUM_RENDER_ID}:/dev/dri/${DOCSUM_RENDER_ID}
    cap_add:
      - SYS_PTRACE
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
    ipc: host
    command: --model-id ${DOCSUM_LLM_MODEL_ID}
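  # LLM microservice that wraps the TGI endpoint with the DocSum API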
  docsum-llm-server:
    image: ${REGISTRY:-opea}/llm-docsum-tgi:${TAG:-latest}
    container_name: docsum-llm-server
    depends_on:
      - docsum-tgi-service
    ports:
      - "${DOCSUM_LLM_SERVER_PORT}:9000"
    ipc: host
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
    cap_add:
      - SYS_PTRACE
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/${DOCSUM_CARD_ID}:/dev/dri/${DOCSUM_CARD_ID}
      - /dev/dri/${DOCSUM_RENDER_ID}:/dev/dri/${DOCSUM_RENDER_ID}
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${DOCSUM_TGI_SERVICE_PORT}"
      HUGGINGFACEHUB_API_TOKEN: ${DOCSUM_HUGGINGFACEHUB_API_TOKEN}
    restart: unless-stopped
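  # MegaService gateway that orchestrates the DocSum pipeline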
  docsum-backend-server:
    image: ${REGISTRY:-opea}/docsum:${TAG:-latest}
    container_name: docsum-backend-server
    depends_on:
      - docsum-tgi-service
      - docsum-llm-server
    ports:
      - "${DOCSUM_BACKEND_SERVER_PORT}:8888"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${HOST_IP}
      - LLM_SERVICE_HOST_IP=${HOST_IP}
    ipc: host
    restart: always
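  # Svelte UI, exposed on host port ${DOCSUM_FRONTEND_PORT}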
  docsum-ui-server:
    image: ${REGISTRY:-opea}/docsum-ui:${TAG:-latest}
    container_name: docsum-ui-server
    depends_on:
      - docsum-backend-server
    ports:
      - "${DOCSUM_FRONTEND_PORT}:5173"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - DOC_BASE_URL=http://${HOST_IP}:${DOCSUM_BACKEND_SERVER_PORT}/v1/docsum
    ipc: host
    restart: always

networks:
  default:
    driver: bridge
@@ -0,0 +1,184 @@
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

set -xe
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}

WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')

function build_docker_images() {
    cd $WORKPATH/docker_image_build
    git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../

    echo "Build all the images with --no-cache, check docker_image_build.log for details..."
    service_list="docsum docsum-ui llm-docsum-tgi"
    docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

    docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
    docker images && sleep 1s
}

function start_services() {
    cd $WORKPATH/docker_compose/amd/gpu/rocm

    export DOCSUM_TGI_IMAGE="ghcr.io/huggingface/text-generation-inference:2.3.1-rocm"
    export DOCSUM_LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
    export HOST_IP=${ip_address}
    export DOCSUM_TGI_SERVICE_PORT="8008"
    export DOCSUM_TGI_LLM_ENDPOINT="http://${HOST_IP}:8008"
    export DOCSUM_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
    export DOCSUM_LLM_SERVER_PORT="9000"
    export DOCSUM_BACKEND_SERVER_PORT="8888"
    export DOCSUM_FRONTEND_PORT="5552"
    export MEGA_SERVICE_HOST_IP=${ip_address}
    export LLM_SERVICE_HOST_IP=${ip_address}
    export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:8888/v1/docsum"

    sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env

    # Start Docker Containers
    docker compose up -d > ${LOG_PATH}/start_services_with_compose.log

    # Wait (up to ~500s) for the TGI service to finish loading the model
    n=0
    until [[ "$n" -ge 100 ]]; do
        docker logs docsum-tgi-service > ${LOG_PATH}/tgi_service_start.log
        if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
            break
        fi
        sleep 5s
        n=$((n+1))
    done
}
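
# Generic check: POST $INPUT_DATA to $URL and require HTTP 200 plus $EXPECTED_RESULT in the response body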
function validate_services() {
    local URL="$1"
    local EXPECTED_RESULT="$2"
    local SERVICE_NAME="$3"
    local DOCKER_NAME="$4"
    local INPUT_DATA="$5"

    local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
    if [ "$HTTP_STATUS" -eq 200 ]; then
        echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."

        local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)

        if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
            echo "[ $SERVICE_NAME ] Content is as expected."
        else
            echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
            docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
            exit 1
        fi
    else
        echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
        docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
        exit 1
    fi
    sleep 1s
}

function validate_microservices() {
    # Check if the microservices are running correctly.

    # tgi for llm service
    validate_services \
        "${ip_address}:8008/generate" \
        "generated_text" \
        "tgi-llm" \
        "docsum-tgi-service" \
        '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'

    # llm microservice
    validate_services \
        "${ip_address}:9000/v1/chat/docsum" \
        "data: " \
        "llm" \
        "docsum-llm-server" \
        '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
}

function validate_megaservice() {
    local SERVICE_NAME="mega-docsum"
    local DOCKER_NAME="docsum-backend-server"
    local EXPECTED_RESULT="embedding"
    local INPUT_DATA="messages=Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
    local URL="${ip_address}:8888/v1/docsum"
    local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -F "$INPUT_DATA" -H 'Content-Type: multipart/form-data' "$URL")
    if [ "$HTTP_STATUS" -eq 200 ]; then
        echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."

        local CONTENT=$(curl -s -X POST -F "$INPUT_DATA" -H 'Content-Type: multipart/form-data' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)

        if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
            echo "[ $SERVICE_NAME ] Content is as expected."
        else
            echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
            docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
            exit 1
        fi
    else
        echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
        docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
        exit 1
    fi
    sleep 1s
}

function validate_frontend() {
    cd $WORKPATH/ui/svelte
    local conda_env_name="OPEA_e2e"
    export PATH=${HOME}/miniforge3/bin/:$PATH
    if conda info --envs | grep -q "$conda_env_name"; then
        echo "$conda_env_name exists!"
    else
        conda create -n ${conda_env_name} python=3.12 -y
    fi
    source activate ${conda_env_name}

    sed -i "s/localhost/$ip_address/g" playwright.config.ts

    conda install -c conda-forge nodejs -y
    npm install && npm ci && npx playwright install --with-deps
    node -v && npm -v && pip list

    exit_status=0
    npx playwright test || exit_status=$?

    if [ $exit_status -ne 0 ]; then
        echo "[TEST INFO]: ---------frontend test failed---------"
        exit $exit_status
    else
        echo "[TEST INFO]: ---------frontend test passed---------"
    fi
}

function stop_docker() {
    cd $WORKPATH/docker_compose/amd/gpu/rocm
    docker compose stop && docker compose rm -f
}

function main() {

    stop_docker

    if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
    start_services

    validate_microservices
    validate_megaservice
    #validate_frontend

    stop_docker
    echo y | docker system prune

}

main