# Add docker_compose example for AMD ROCm deployment

## 🚀 Start Microservices and MegaService

### Required Models

The default model is `meta-llama/Meta-Llama-3-8B-Instruct`. If you want to use another model, change `FAQGEN_LLM_MODEL_ID` in the environment variable settings below.

If you use gated models, you also need to set a [Hugging Face token](https://huggingface.co/docs/hub/security-tokens) in the `FAQGEN_HUGGINGFACEHUB_API_TOKEN` environment variable.

### Setup Environment Variables

Since `compose.yaml` consumes several environment variables, set them up in advance as shown below.

```bash
export FAQGEN_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export HOST_IP=${your_no_proxy}
export FAQGEN_TGI_SERVICE_PORT=8008
export FAQGEN_LLM_SERVER_PORT=9000
export FAQGEN_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export FAQGEN_BACKEND_SERVER_PORT=8888
export FAQGEN_UI_PORT=5173
```
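
Docker Compose substitutes unset variables with empty strings and only prints a warning, so it can help to confirm the variables are present in the current shell before starting the stack. A quick check:

```bash
# Sanity check that the FaqGen variables are set in this shell
env | grep -E 'FAQGEN|HOST_IP'
```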

Note: Please replace `host_ip` with your external IP address; do not use localhost.
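
If you are unsure which address to use, one common way to list the host's IP addresses on a Linux machine is sketched below; pick the address that is reachable from your clients:

```bash
# Print the host's IP addresses (Linux); choose the externally reachable one
hostname -I
export HOST_IP=${your_external_ip}
```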

Note: To limit access to a subset of GPUs, pass each device individually using one or more `--device /dev/dri/renderD<node>` options (or the equivalent `devices:` entries in `compose.yaml`), where `<node>` is the card index, starting from 128. See the [ROCm Docker documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).

Example of device isolation for 1 GPU:

```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
```

Example of device isolation for 2 GPUs:

```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
- /dev/dri/card1:/dev/dri/card1
- /dev/dri/renderD129:/dev/dri/renderD129
```

Please find more information about accessing and restricting AMD GPUs in the [ROCm Docker documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).
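
Before editing the `devices:` entries, you can check which card and render nodes exist on the host. The commands below are a quick sketch and assume the amdgpu driver and ROCm tools are installed:

```bash
# List the DRI device nodes (cardN / renderD12x) exposed by the GPU driver
ls -l /dev/dri/
# Show the GPUs visible to ROCm
rocm-smi
```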

### Start Microservice Docker Containers

```bash
cd GenAIExamples/FaqGen/docker_compose/amd/gpu/rocm/
docker compose up -d
```
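
Once the stack is up, you can optionally confirm that all services are running and watch the model being loaded before moving on to validation:

```bash
# Show the status of the services defined in compose.yaml
docker compose ps
# Follow the TGI logs until the model has finished loading (Ctrl+C to stop)
docker compose logs -f faqgen-tgi-service
```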

### Validate Microservices

1. TGI Service

```bash
curl http://${host_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

2. LLM Microservice

```bash
curl http://${host_ip}:9000/v1/faqgen \
  -X POST \
  -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
  -H 'Content-Type: application/json'
```

3. MegaService

```bash
curl http://${host_ip}:8888/v1/faqgen -H "Content-Type: application/json" -d '{
     "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
     }'
```

Once all three requests above return valid responses, the full FaqGen mega-service pipeline is up and running.
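
If any of the checks above fails, the container logs are usually the quickest way to find the cause; note that the TGI service can take several minutes to download and load the model on first start:

```bash
# Inspect the logs of the individual services
docker compose logs faqgen-tgi-service
docker compose logs faqgen-llm-server
docker compose logs faqgen-backend-server
```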

## 🚀 Launch the UI

Open the URL `http://{host_ip}:5173` in your browser to access the frontend.

![project-screenshot](../../../../assets/img/faqgen_ui_text.png)

## 🚀 Launch the React UI (Optional)

To access the FAQGen React-based frontend, modify the UI service in the `compose.yaml` file: replace the `faqgen-ui-server` service with the `faqgen-react-ui-server` service as per the config below:

```yaml
faqgen-react-ui-server:
  image: opea/faqgen-react-ui:latest
  container_name: faqgen-react-ui-server
  environment:
    - no_proxy=${no_proxy}
    - https_proxy=${https_proxy}
    - http_proxy=${http_proxy}
  ports:
    - 5174:80
  depends_on:
    - faqgen-backend-server
  ipc: host
  restart: always
```
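
After editing `compose.yaml`, re-run Compose from the same directory so that the UI service is recreated with the new definition:

```bash
cd GenAIExamples/FaqGen/docker_compose/amd/gpu/rocm/
docker compose up -d
```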

Open the URL `http://{host_ip}:5174` in your browser to access the React-based frontend.

- Create FAQs from text input
  ![project-screenshot](../../../../assets/img/faqgen_react_ui_text.png)

- Create FAQs from text files
  ![project-screenshot](../../../../assets/img/faqgen_react_ui_text_file.png)

## compose.yaml

```yaml
services:
  faqgen-tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
    container_name: faqgen-tgi-service
    ports:
      - "${FAQGEN_TGI_SERVICE_PORT}:80"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${FAQGEN_TGI_SERVICE_PORT}"
      HUGGINGFACEHUB_API_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
      HUGGING_FACE_HUB_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
    volumes:
      - "./data:/data"
    shm_size: 1g
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/
    cap_add:
      - SYS_PTRACE
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
    ipc: host
    command: --model-id ${FAQGEN_LLM_MODEL_ID}
  faqgen-llm-server:
    image: ${REGISTRY:-opea}/llm-faqgen-tgi:${TAG:-latest}
    container_name: faqgen-llm-server
    depends_on:
      - faqgen-tgi-service
    ports:
      - "${FAQGEN_LLM_SERVER_PORT}:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${FAQGEN_TGI_SERVICE_PORT}"
      HUGGINGFACEHUB_API_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
      HUGGING_FACE_HUB_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
    restart: unless-stopped
  faqgen-backend-server:
    image: ${REGISTRY:-opea}/faqgen:${TAG:-latest}
    container_name: faqgen-backend-server
    depends_on:
      - faqgen-tgi-service
      - faqgen-llm-server
    ports:
      - "${FAQGEN_BACKEND_SERVER_PORT}:8888"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${HOST_IP}
      - LLM_SERVICE_HOST_IP=${HOST_IP}
    ipc: host
    restart: always
  faqgen-ui-server:
    image: ${REGISTRY:-opea}/faqgen-ui:${TAG:-latest}
    container_name: faqgen-ui-server
    depends_on:
      - faqgen-backend-server
    ports:
      - "${FAQGEN_UI_PORT}:5173"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - DOC_BASE_URL="http://${HOST_IP}:${FAQGEN_BACKEND_SERVER_PORT}/v1/faqgen"
    ipc: host
    restart: always
networks:
  default:
    driver: bridge
```