codegen xeon update #282

Open: wants to merge 3 commits into base: main
3 changes: 2 additions & 1 deletion examples/CodeGen/CodeGen_Guide.rst
@@ -41,5 +41,6 @@ Here are some deployment options, depending on your hardware and environment:

.. toctree::
   :maxdepth: 1

   Intel® Xeon® Scalable processor <deploy/xeon>
   Gaudi AI Accelerator <deploy/gaudi>
369 changes: 369 additions & 0 deletions examples/CodeGen/deploy/xeon.md
@@ -0,0 +1,369 @@
# Single node on-prem deployment with TGI on Xeon

This section covers single-node on-prem deployment of the CodeGen
example using OPEA components and the TGI service. It shows how
to build an end-to-end CodeGen solution with the Qwen2.5-Coder-7B-Instruct model,
deployed on Intel® Xeon® Scalable processors. To learn about OPEA in about 5 minutes
and set up the required hardware and software, follow the instructions in the
[Getting Started](https://opea-project.github.io/latest/getting-started/README.html) section.

## Overview

The CodeGen use case uses a single microservice called LLM. This tutorial
walks through how to enable it from OPEA GenAIComps and deploy it in
a single-node TGI megaservice solution.

The solution shows how to use the Qwen2.5-Coder-7B-Instruct model on Intel®
Xeon® Scalable processors. We will go through how to set up the Docker containers
that start the microservice and megaservice. The solution then takes text input as the
prompt and generates code accordingly. It is deployed with a UI that has 2 modes to
choose from:

1. Basic UI (Svelte)
2. React-Based UI

The React-based UI is optional; this example supports it if you want to use it.

Below is the list of content we will be covering in this tutorial:

1. Prerequisites
2. Prepare (Building / Pulling) Docker images
3. Use case setup
4. Deploy the use case
5. Interacting with CodeGen deployment

## Prerequisites

The first step is to clone the GenAIExamples and GenAIComps repositories.
GenAIComps provides the fundamental components used to build the examples in
GenAIExamples and deploy them as microservices.

```bash
git clone https://github.com/opea-project/GenAIComps.git
git clone https://github.com/opea-project/GenAIExamples.git
export TAG=1.2
```

The examples use model weights from HuggingFace and LangChain.

Set up your [HuggingFace](https://huggingface.co/) account and generate a
[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).

Set the HuggingFace token:
```
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```

The example requires you to set `host_ip` so that the microservices are deployed on
endpoints with their ports exposed. Set the `host_ip` environment variable:
```
export host_ip=$(hostname -I | awk '{print $1}')
```

Make sure to set up proxies if you are behind a firewall:
```
export no_proxy=${your_no_proxy},$host_ip
export http_proxy=${your_http_proxy}
export https_proxy=${your_https_proxy}
```
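
Before moving on, it can help to confirm that the variables exported above are actually set in the current shell. The helper below is a minimal sketch, not part of OPEA: `check_env` is a hypothetical name, and it only tests that each variable is non-empty, not that the token or IP address is valid.

```bash
# Hypothetical helper (not part of OPEA): report which of the named
# environment variables are unset or empty.
# Prints one name per line and returns non-zero if any are missing.
check_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup written so it also works in plain POSIX sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the variables exported earlier in this section:
# check_env HUGGINGFACEHUB_API_TOKEN host_ip TAG || echo "set the variables listed above"
```

The proxy variables are left out of the usage line on purpose, since they are only needed behind a firewall.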

## Prepare (Building / Pulling) Docker images

This step involves building or pulling the relevant Docker images,
with a sanity check at the end. For CodeGen, the following Docker
image is needed: LLM with TGI.
Additionally, you will need to build Docker images for the
CodeGen megaservice and the UI (the React UI is optional). In total,
there are **3 required docker images** and an optional docker image.

### Build/Pull Microservice image

::::::{tab-set}

:::::{tab-item} Pull
:sync: Pull

If you decide to pull the docker containers and not build them locally,
you can proceed to the next step where all the necessary containers will
be pulled in from dockerhub.

:::::
:::::{tab-item} Build
:sync: Build

From within the `GenAIComps` folder, check out the release tag.
```
cd GenAIComps
git checkout tags/v${TAG}
```

#### Build LLM Image

```bash
docker build -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
```

### Build MegaService Image

The Megaservice is a pipeline that channels data through different
microservices, each performing varied tasks. The LLM microservice and
flow of data are defined in the `codegen.py` file. You can also add or
remove microservices and customize the megaservice to suit your needs.

Build the megaservice image for this use case:

```bash
cd ..
cd GenAIExamples
git checkout tags/v${TAG}
cd CodeGen
```

```bash
docker build -t opea/codegen:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
cd ../..
```

### Build the UI Image

You can build either of 2 UI modes:

*Svelte UI*

```bash
cd GenAIExamples/CodeGen/ui/
docker build -t opea/codegen-ui:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
cd ../../..
```

*React UI (Optional)*

Build this image if you want a React-based frontend.

```bash
cd GenAIExamples/CodeGen/ui/
docker build --no-cache -t opea/codegen-react-ui:${TAG} --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
cd ../../..
```

### Sanity Check
Check if you have the following set of docker images by running the command `docker images` before moving on to the next step:

* `opea/llm-textgen:latest`
* `opea/codegen:${TAG}`
* `opea/codegen-ui:${TAG}`
* `opea/codegen-react-ui:${TAG}` (optional)
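
The same check can be scripted instead of read off the `docker images` table by eye. This is a sketch under assumptions: `missing_images` is a hypothetical helper, and it compares exact `repository:tag` strings, so an image built under a different tag is reported as missing.

```bash
# Hypothetical helper: print every expected image (repository:tag) that is
# absent from the list read on stdin. Returns non-zero if any are missing.
# stdin: one "repository:tag" per line, for example from
#   docker images --format '{{.Repository}}:{{.Tag}}'
missing_images() {
    available=$(cat)
    status=0
    for image in "$@"; do
        # -F: fixed string, -x: whole-line match, -q: quiet
        if ! printf '%s\n' "$available" | grep -Fxq -- "$image"; then
            echo "missing: $image"
            status=1
        fi
    done
    return "$status"
}

# Usage (add the LLM image under whatever tag you built it with):
# docker images --format '{{.Repository}}:{{.Tag}}' |
#     missing_images "opea/codegen:${TAG}" "opea/codegen-ui:${TAG}"
```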

:::::
::::::

## Use Case Setup

The use case uses the following combination of GenAIComps and tools:

| Use Case Components | Tools | Model                          | Service Type      |
|---------------------|-------|--------------------------------|-------------------|
| LLM                 | TGI   | Qwen/Qwen2.5-Coder-7B-Instruct | OPEA Microservice |
| UI                  |       | NA                             | Gateway Service   |

Tools and models mentioned in the table are configurable through either
environment variables or the `compose.yaml` file.

Set the necessary environment variables for the use case by running the `set_env.sh` script.
This is where the environment variable `LLM_MODEL_ID` is set; you can change it to another model
by specifying the HuggingFace model card ID.

```bash
cd GenAIExamples/CodeGen/docker_compose/
source ./set_env.sh
cd ../../..
```

## Deploy the Use Case

In this tutorial, we deploy via docker compose with the provided
YAML file. The docker compose instructions start all the
above-mentioned services as containers.

```bash
cd GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon
docker compose up -d
```


### Checks to Ensure the Services are Running
#### Check Startup and Env Variables
Check the startup log by running `docker compose logs` to ensure there are no errors.
Docker Compose prints a warning for each variable that is **NOT** set.

Here are some sample messages if proxy environment variables are not set:

    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.

#### Check the Container Status

Check that all the containers launched via docker compose have started.

The CodeGen example starts 4 docker containers. Check that these docker
containers are all running, i.e., the `STATUS` of every container is `Up`.
You can do this with the `docker ps -a` command.

```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bbd235074c3d opea/codegen-ui:latest "docker-entrypoint.s…" About a minute ago Up About a minute 0.0.0.0:5173->5173/tcp, :::5173->5173/tcp codegen-xeon-ui-server
8d3872ca66fa opea/codegen:latest "python codegen.py" About a minute ago Up About a minute 0.0.0.0:7778->7778/tcp, :::7778->7778/tcp codegen-xeon-backend-server
b9fc39f51cdb opea/llm-tgi:latest "bash entrypoint.sh" About a minute ago Up About a minute 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp llm-tgi-xeon-server
39994e007f15 ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu "text-generation-lau…" About a minute ago Up About a minute 0.0.0.0:8028->80/tcp, :::8028->80/tcp tgi-server
```
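
The `STATUS` check can also be automated. The sketch below assumes a hypothetical helper named `not_up` that reads name and status pairs separated by a tab; it treats any status beginning with `Up` as running, so a container reported as `Up ... (unhealthy)` would still pass.

```bash
# Hypothetical helper: print the names of containers whose status does not
# start with "Up". Returns non-zero if any such container is found.
# stdin: "<name><TAB><status>" per line, for example from
#   docker ps -a --format '{{.Names}}\t{{.Status}}'
not_up() {
    awk -F '\t' '$2 !~ /^Up/ { print $1; bad = 1 } END { exit bad }'
}

# Usage:
# docker ps -a --format '{{.Names}}\t{{.Status}}' | not_up \
#     && echo "all containers are Up"
```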

## Interacting with CodeGen Deployment

This section walks you through the different ways to interact with
the deployed microservices. After a couple of minutes, rerun `docker ps -a`
to ensure all the docker containers are still up and running. Then proceed
to validate each microservice and megaservice.
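
Rather than waiting a fixed amount of time, you can poll a service until it responds. The `retry` function below is a generic sketch and a hypothetical helper, not something shipped with OPEA. The usage line assumes that the TGI container answers on its `/health` route at the host port 8028 used in this guide.

```bash
# Hypothetical helper: run a command up to <attempts> times (at least once),
# sleeping <delay> seconds between tries.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        "$@" && return 0
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Usage: poll TGI every 10 seconds, for up to 30 attempts.
# retry 30 10 curl -sf -o /dev/null "http://${host_ip}:8028/health"
```

The first start can take a while because the model weights have to be downloaded, so a generous attempt count is a reasonable default.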

### TGI Service

```bash
curl http://${host_ip}:8028/generate \
-X POST \
-d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' \
-H 'Content-Type: application/json'
```

Here is the output:

```
{"generated_text":"Start with a user story. We will add story tests later. In this case, we'll choose a story about adding a TODO:\n ```ruby\n as a user,\n i want to add a todo,\n so that i can get a todo list.\n\n conformance:\n - a new todo is added to the list\n - if the todo text is empty, raise an exception\n ```\n\n1. Write the first test:\n ```ruby\n feature Testing the addition of a todo to the list\n\n given a todo list empty list\n when a user adds a todo\n the todo should be added to the list\n\n inputs:\n when_values: [[\"A\"]]\n\n output validations:\n - todo_list contains { text:\"A\" }\n ```\n\n1. Write the first step implementation in any programming language you like. In this case, we will choose Ruby:\n ```ruby\n def add_"}
```

### LLM Microservice

```bash
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
```

The output is streamed in small pieces. It is too long to show
here, but the last item will be:
```
data: [DONE]
```

### MegaService

```bash
curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{
"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."
}'
```

The output is streamed in small pieces. It is too long to show
here, but the last item will be:
```
data: [DONE]
```
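
To make a streamed response easier to read, the `data:` framing can be stripped as the lines arrive. This is a minimal sketch: `sse_payloads` is a hypothetical helper, it assumes each event is a single `data: ...` line as in the terminator shown above, and it does not decode whatever payload format the service places after the prefix.

```bash
# Hypothetical helper: strip the "data: " prefix from streamed lines and
# stop at the "data: [DONE]" terminator. Other lines (such as the blank
# separators between events) are skipped.
sse_payloads() {
    while IFS= read -r line; do
        case "$line" in
            "data: [DONE]"*) break ;;
            "data: "*) printf '%s\n' "${line#data: }" ;;
        esac
    done
}

# Usage (-N turns off curl's output buffering so chunks appear as they arrive;
# the prompt is only an example):
# curl -sN "http://${host_ip}:7778/v1/codegen" -H "Content-Type: application/json" \
#     -d '{"messages": "Write a function that reverses a string."}' | sse_payloads
```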

## Launch UI
### Svelte UI
To access the frontend, open the following URL in your browser: http://{host_ip}:5173. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
  codegen-xeon-ui-server:
    image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest}
    ...
    ports:
      - "5173:5173"
```

### React-Based UI (Optional)
To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace `codegen-xeon-ui-server` service with the codegen-xeon-react-ui-server service as per the config below:
```yaml
  codegen-xeon-react-ui-server:
    image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
    container_name: codegen-xeon-react-ui-server
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - APP_CODE_GEN_URL=${BACKEND_SERVICE_ENDPOINT}
    depends_on:
      - codegen-xeon-backend-server
    ports:
      - "5174:80"
    ipc: host
    restart: always
```
Once the services are up, open the following URL in your browser: http://{host_ip}:5174. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:
```yaml
  codegen-xeon-react-ui-server:
    image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
    ...
    ports:
      - "80:80"
```

## Check Docker Container Logs

You can check the log of a container by running this command:

```bash
docker logs <CONTAINER ID> -t
```

You can also check the overall logs with the following command, where
`compose.yaml` is the megaservice docker-compose configuration file.

Assuming you are still in the directory `GenAIExamples/CodeGen/docker_compose/intel/cpu/xeon`,
run the following command to check the logs:
```bash
docker compose -f compose.yaml logs
```
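
The combined log can be long, so filtering it for likely failures is often quicker than reading it top to bottom. The sketch below uses a hypothetical helper, `log_errors`, and its keyword list is an assumption chosen for illustration; adjust it to the messages you actually see.

```bash
# Hypothetical helper: keep only log lines that look like failures
# (case-insensitive). Always returns 0, even when nothing matches.
# stdin: log text, for example from
#   docker compose -f compose.yaml logs --no-color
log_errors() {
    grep -iE 'error|exception|traceback|failed' || true
}

# Usage:
# docker compose -f compose.yaml logs --no-color | log_errors
```

The proxy warnings shown earlier do not match these keywords, so they are not reported as failures.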

View the docker input parameters in `./CodeGen/docker_compose/intel/cpu/xeon/compose.yaml`:

```yaml
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
    container_name: tgi-server
    ports:
      - "8028:80"
    volumes:
      - "./data:/data"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    cap_add:
      - SYS_NICE
    ipc: host
    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
```

The `--model-id` input is `${LLM_MODEL_ID}`. Ensure the environment variable `LLM_MODEL_ID`
is set correctly, including its spelling. Whenever it is changed, restart the containers to use
the newly selected model.


## Stop the services

Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below:
```
docker compose down
```