Simplify the table
Signed-off-by: srinarayan-srikanthan <[email protected]>
Signed-off-by: Chun Tao <[email protected]>
srinarayan-srikanthan authored and ctao456 committed Sep 6, 2024
1 parent 83340e0 commit b5cbec9
Showing 1 changed file with 13 additions and 116 deletions.
129 changes: 13 additions & 116 deletions ChatQnA/README.md
@@ -97,122 +97,19 @@ flowchart LR

This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Generation Inference on Intel Gaudi2 or Intel Xeon Scalable processors. The Intel Gaudi2 accelerator supports both training and inference for deep learning models, in particular LLMs. Visit [Habana AI products](https://habana.ai/products) for more details.

For the full list of framework, model, serving, and hardware choices available for each microservice component in the ChatQnA architecture, refer to the table below:

<table>
<tbody>
<tr>
<th>Microservice</th>
<th>Port</th>
<th>Endpoint</th>
<th>Framework</th>
<th>Model</th>
<th>Serving</th>
<th>Hardware</th>
<th>Description</th>
</tr>
<tr>
<td rowspan="2"><a href="https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings">Embedding</a></td>
<td rowspan="2">6000</td>
<td rowspan="2">/v1/embeddings</td>
<td rowspan="2"><a href="https://www.langchain.com">LangChain</a>/<a href="https://www.llamaindex.ai">LlamaIndex</a></td>
<td rowspan="2"><a href="https://huggingface.co/BAAI/bge-large-en-v1.5">BAAI/bge-large-en-v1.5</a></td>
<td><a href="https://github.com/huggingface/tei-gaudi">TEI-Gaudi</a></td>
<td>Gaudi2</td>
<td>Embedding on Gaudi2</td>
</tr>
<tr>
<td><a href="https://github.com/huggingface/text-embeddings-inference">TEI</a></td>
<td>Xeon</td>
<td>Embedding on Xeon CPU</td>
</tr>
<tr>
<td><a href="https://github.com/opea-project/GenAIComps/tree/main/comps/retrievers">Retriever</a></td>
<td>7000</td>
<td>/v1/retrieval</td>
<td><a href="https://www.langchain.com">LangChain</a>/<a href="https://www.llamaindex.ai">LlamaIndex</a></td>
<td><a href="https://huggingface.co/BAAI/bge-base-en-v1.5">BAAI/bge-base-en-v1.5</a></td>
<td><a href="https://github.com/huggingface/text-embeddings-inference">TEI</a></td>
<td>Xeon</td>
<td>Retriever on Xeon CPU</td>
</tr>
<tr>
<td rowspan="2"><a href="https://github.com/opea-project/GenAIComps/tree/main/comps/reranks">Reranking</a></td>
<td rowspan="2">8000</td>
<td rowspan="2">/v1/reranking</td>
<td rowspan="2"><a href="https://www.langchain.com">LangChain</a>/<a href="https://www.llamaindex.ai">LlamaIndex</a></td>
<td><a href="https://huggingface.co/BAAI/bge-reranker-large">BAAI/bge-reranker-large</a></td>
<td><a href="https://github.com/huggingface/tei-gaudi">TEI-Gaudi</a></td>
<td>Gaudi2</td>
<td>Reranking on Gaudi2</td>
</tr>
<tr>
<td><a href="https://huggingface.co/BAAI/bge-reranker-base">BBAAI/bge-reranker-base</a></td>
<td><a href="https://github.com/huggingface/text-embeddings-inference">TEI</a></td>
<td>Xeon</td>
<td>Reranking on Xeon CPU</td>
</tr>
<tr>
<td rowspan="6"><a href="https://github.com/opea-project/GenAIComps/blob/main/comps/llms">LLM</a></td>
<td rowspan="6">9000</td>
<td rowspan="6">/v1/chat/completions</td>
<td rowspan="6"><a href="https://www.langchain.com">LangChain</a>/<a href="https://www.llamaindex.ai">LlamaIndex</a></td>
<td rowspan="2"><a href="https://huggingface.co/Intel/neural-chat-7b-v3-3">Intel/neural-chat-7b-v3-3</a></td>
<td><a href="https://github.com/huggingface/tgi-gaudi">TGI Gaudi</a></td>
<td>Gaudi2</td>
<td>LLM on Gaudi2</td>
</tr>
<tr>
<td><a href="https://github.com/huggingface/text-generation-inference">TGI</a></td>
<td>Xeon</td>
<td>LLM on Xeon CPU</td>
</tr>
<tr>
<td rowspan="2"><a href="https://huggingface.co/Intel/neural-chat-7b-v3-3">Intel/neural-chat-7b-v3-3</a></td>
<td rowspan="2"><a href="https://github.com/ray-project/ray">Ray Serve</a></td>
<td>Gaudi2</td>
<td>LLM on Gaudi2</td>
</tr>
<tr>
<td>Xeon</td>
<td>LLM on Xeon CPU</td>
</tr>
<tr>
<td rowspan="2"><a href="https://huggingface.co/Intel/neural-chat-7b-v3-3">Intel/neural-chat-7b-v3-3</a></td>
<td rowspan="2"><a href="https://github.com/vllm-project/vllm/">vLLM</a></td>
<td>Gaudi2</td>
<td>LLM on Gaudi2</td>
</tr>
<tr>
<td>Xeon</td>
<td>LLM on Xeon CPU</td>
</tr>
<tr>
<td rowspan="4"><a href="https://github.com/opea-project/GenAIComps/blob/main/comps/dataprep">Dataprep</a></td>
<td rowspan="4">6007</td>
<td rowspan="4">/v1/dataprep</td>
<td rowspan="2"><a href="https://qdrant.tech/">Qdrant</td>
<td rowspan="2"><a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2">sentence-transformers/all-MiniLM-L6-v2</a></td>
<td rowspan="4">NA</td>
<td>Gaudi2</td>
<td>Dataprep on Gaudi2</td>
</tr>
<tr>
<td>Xeon</td>
<td>Dataprep on Xeon CPU</td>
</tr>
<tr>
<td rowspan="2"><a href="https://redis.io/">Redis</td>
<td rowspan="2"><a href="https://huggingface.co/BAAI/bge-base-en-v1.5">BAAI/bge-base-en-v1.5</a></td>
<td>Gaudi2</td>
<td>Dataprep on Gaudi2</td>
</tr>
<tr>
<td>Xeon</td>
<td>Dataprep on Xeon CPU</td>
</tr>
</tbody>
</table>
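
Once the services are up, each endpoint in the table above can be exercised directly over HTTP. Below is a minimal smoke test of the embedding microservice on port 6000; the host and the request schema (a JSON body with a `text` field) are assumptions based on typical component usage, so check the embedding component's own README for the exact request format.

```bash
# Hypothetical smoke test for the embedding microservice (port 6000).
# Host and payload schema are assumptions; adjust to your deployment.
curl http://localhost:6000/v1/embeddings \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"text": "What is Deep Learning?"}'
```
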
The table below describes, for each microservice component in the ChatQnA architecture, the default configuration: open source project, hardware, port, and endpoint.
<details>
<summary> Gaudi default compose.yaml </summary>

| Microservice | Open Source Project | HW | Port | Endpoint |
|--------------|---------------------|------|------|---------------------|
| Embedding | LangChain | Gaudi| 6000 | /v1/embeddings |
| Retriever | LangChain | Xeon | 7000 | /v1/retrieval |
| Reranking | LangChain | Xeon | 8000 | /v1/reranking |
| LLM | LangChain | Gaudi| 9000 | /v1/chat/completions |
| Dataprep | Redis | Xeon | 6007 | /v1/dataprep |

</details>
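
As a quick sanity check against the default deployment, the LLM microservice on port 9000 can be queried the same way. This is a sketch, assuming the service accepts a `query` field and a `max_tokens` parameter; the exact payload may differ between releases, so consult the LLM component README before relying on it.

```bash
# Hypothetical request to the LLM microservice (port 9000).
# Payload fields ("query", "max_tokens") are assumptions; verify
# against the deployed component's API.
curl http://localhost:9000/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Deep Learning?", "max_tokens": 32}'
```
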

## Deploy ChatQnA Service

