Commit e48f2ab

[DOC] update ChatQnA README (#201)
Signed-off-by: Wang, Xigui <[email protected]>
xiguiw authored May 29, 2024
1 parent eadaacc commit e48f2ab
Showing 2 changed files with 10 additions and 10 deletions.
ChatQnA/README.md (6 changes: 3 additions & 3 deletions)
@@ -1,14 +1,14 @@
# ChatQnA Application

-Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLM). The retrieval augmented generation (RAG) architecture is quickly becoming the industry standard for developing chatbots because it combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge.
+Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval augmented generation (RAG) architecture is quickly becoming the industry standard for chatbot development. It combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge.

-RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that responses generated remain factual and current. At the heart of this architecture are vector databases, instrumental in enabling efficient and semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.
+RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that generated responses remain factual and current. At the core of this architecture are vector databases, which are instrumental in enabling efficient, semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.

The ChatQnA architecture is shown below:

![architecture](https://i.imgur.com/lLOnQio.png)

-This ChatQnA use case performs RAG using LangChain, Redis vectordb and Text Generation Inference on Intel Gaudi2 or Intel XEON Scalable Processors. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Please visit [Habana AI products](https://habana.ai/products) for more details.
+This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Generation Inference on Intel Gaudi2 or Intel Xeon Scalable Processors. The Intel Gaudi2 accelerator supports both training and inference for deep learning models, in particular LLMs. Please visit [Habana AI products](https://habana.ai/products) for more details.

# Deploy ChatQnA Service

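The paragraphs above describe retrieval by semantic similarity over a vector store. As a minimal illustration (not part of this commit), the following Python sketch ranks toy document vectors against a query vector by cosine similarity; the 768-dimension size matches the mock embedding used later in this diff, and the documents and vectors are made up for the example. A real deployment delegates this step to a vector database such as Redis.

```python
# Toy semantic retrieval: rank documents by cosine similarity of their
# embedding vectors to a query embedding. Illustrative only; a vector
# database performs this lookup in the actual ChatQnA pipeline.
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

random.seed(0)
# Hypothetical corpus: each document gets a 768-dim embedding (random here;
# a real system would use an embedding model).
docs = {f"doc-{i}": [random.uniform(-1, 1) for _ in range(768)] for i in range(5)}
query = [random.uniform(-1, 1) for _ in range(768)]

# Most semantically similar documents first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[:3])
```
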
ChatQnA/docker-composer/gaudi/README.md (14 changes: 7 additions & 7 deletions)
@@ -72,7 +72,7 @@ cd GenAIExamples/ChatQnA/ui/
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
```

-Then run the command `docker images`, you will have the following 7 Docker Images:
+Then run the command `docker images`; you should see the following 8 Docker images:

1. `opea/embedding-tei:latest`
2. `opea/retriever-redis:latest`
@@ -109,7 +109,7 @@ export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
```

-Note: Please replace with `host_ip` with you external IP address, do not use localhost.
+Note: Please replace `host_ip` with your external IP address; do **NOT** use localhost.

### Start all the Services' Docker Containers

@@ -132,7 +132,7 @@ curl ${host_ip}:8090/embed \
2. Embedding Microservice

```bash
-curl http://${host_ip}:6000/v1/embeddings\
+curl http://${host_ip}:6000/v1/embeddings \
 -X POST \
 -d '{"text":"hello"}' \
 -H 'Content-Type: application/json'
```
@@ -149,10 +149,10 @@ embedding = [random.uniform(-1, 1) for _ in range(768)]
print(embedding)
```

-Then substitute your mock embedding vector for the `${your_embedding}` in the following cURL command:
+Then substitute your mock embedding vector for `${your_embedding}` in the following `curl` command:

```bash
-curl http://${host_ip}:7000/v1/retrieval\
+curl http://${host_ip}:7000/v1/retrieval \
 -X POST \
 -d '{"text":"test", "embedding":${your_embedding}}' \
 -H 'Content-Type: application/json'
```
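
For convenience, the two steps above (generating the mock embedding and calling the retriever) can also be done in one Python snippet. This is a sketch using the same endpoint and payload shape as the `curl` command; it assumes the `requests` package is installed and that the `host_ip` variable exported earlier is in the environment.

```python
# Generate a mock 768-dim embedding and POST it to the retrieval
# microservice in one step (same endpoint and payload as the curl above).
import os
import random
import requests

host_ip = os.environ["host_ip"]  # the same variable exported earlier
embedding = [random.uniform(-1, 1) for _ in range(768)]

resp = requests.post(
    f"http://{host_ip}:7000/v1/retrieval",
    json={"text": "test", "embedding": embedding},
    headers={"Content-Type": "application/json"},
    timeout=30,
)
print(resp.status_code)
print(resp.text)
```
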
@@ -170,7 +170,7 @@ curl http://${host_ip}:8808/rerank \
5. Reranking Microservice

```bash
-curl http://${host_ip}:8000/v1/reranking\
+curl http://${host_ip}:8000/v1/reranking \
 -X POST \
 -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
 -H 'Content-Type: application/json'
```
@@ -188,7 +188,7 @@ curl http://${host_ip}:8008/generate \
7. LLM Microservice

```bash
-curl http://${host_ip}:9000/v1/chat/completions\
+curl http://${host_ip}:9000/v1/chat/completions \
 -X POST \
 -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
 -H 'Content-Type: application/json'
```
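
Once the individual microservices respond, the composed backend exported earlier as `BACKEND_SERVICE_ENDPOINT` (`http://${host_ip}:8888/v1/chatqna`) can be smoke-tested end to end. The exact request schema lives in the collapsed part of this README; the sketch below assumes a JSON body with a `messages` field, which should be verified against the current docs before relying on it.

```python
# Hypothetical end-to-end smoke test for the ChatQnA gateway.
# ASSUMPTION: the gateway accepts {"messages": "<question>"}; check the
# README for the authoritative request schema.
import os
import requests

endpoint = os.environ.get("BACKEND_SERVICE_ENDPOINT")
if not endpoint:
    endpoint = f"http://{os.environ['host_ip']}:8888/v1/chatqna"

resp = requests.post(endpoint, json={"messages": "What is Deep Learning?"}, timeout=120)
print(resp.status_code)
print(resp.text[:500])  # responses can be long; show a prefix
```
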
