doc: initial ChatQnA example and placeholders #87

Merged 1 commit on Sep 4, 2024
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
_build
.vscode
184 changes: 144 additions & 40 deletions examples/ChatQnA/ChatQnA_Guide.rst
@@ -3,89 +3,193 @@
ChatQnA Sample Guide
####################

.. note:: This guide is in its early development and is a work-in-progress with
placeholder content.

Introduction/Purpose
********************

TODO: Tom to provide.

Overview/Intro
==============

Chatbots are a widely adopted use case for leveraging the powerful chat and
reasoning capabilities of large language models (LLMs). The ChatQnA example
provides the starting point for developers to begin working in the GenAI space.
Consider it the “hello world” of GenAI applications; it can be leveraged for
solutions across a wide range of enterprise verticals, both internally and
externally.

Purpose
=======

The ChatQnA example uses retrieval augmented generation (RAG) architecture,
which is quickly becoming the industry standard for chatbot development. It
combines the benefits of a knowledge base (via a vector store) and generative
models to reduce hallucinations, maintain up-to-date information, and leverage
domain-specific knowledge.

RAG bridges the knowledge gap by dynamically fetching relevant information from
external sources, ensuring that responses generated remain factual and current.
At the core of this architecture are vector databases, which are instrumental in
enabling efficient and semantic retrieval of information. These databases store
data as vectors, allowing RAG to swiftly access the most pertinent documents or
data points based on semantic similarity.

Central to the RAG architecture is the use of a generative model, which is
responsible for generating responses to user queries. The generative model is
trained on a large corpus of customized and relevant text data and is capable of
generating human-like responses. Developers can easily swap out the generative
model or vector database with their own custom models or databases. This allows
developers to build chatbots that are tailored to their specific use cases and
requirements. By combining the generative model with the vector database, RAG
can provide accurate and contextually relevant responses specific to your users'
queries.

The ChatQnA example is designed to be a simple, yet powerful, demonstration of
the RAG architecture. It is a great starting point for developers looking to
build chatbots that can provide accurate and up-to-date information to users.

GMC is the GenAI Microservices Connector. GMC facilitates sharing of services
across GenAI applications and pipelines, and dynamic switching between the
models used in any stage of a GenAI pipeline. In the ChatQnA pipeline, for
example, it supports changing the model used in the embedder, re-ranker,
and/or the LLM.

You can use upstream vanilla Kubernetes or RHOCP, either with or without GMC;
as noted, GMC provides additional features.

The ChatQnA example provides several deployment options, including single-node
deployments on-premise or in a cloud environment using hardware such as Xeon
Scalable Processors, Gaudi servers, NVIDIA GPUs, and even AI PCs. It also
supports Kubernetes deployments with and without the GenAI Microservices
Connector (GMC), as well as cloud-native deployments using Red Hat OpenShift
Container Platform (RHOCP).


Preview
=======

To get a preview of the ChatQnA example, visit the
`AI Explore site <https://aiexplorer.intel.com/explore>`_. The **ChatQnA Solution**
provides a basic chatbot, while the **ChatQnA with Augmented Context** solution
allows you to upload your own files to quickly experiment with a RAG solution
and see how a developer-supplied corpus can provide relevant and up-to-date
responses.

Key Implementation Details
==========================

Tech Overview
*************

Embedding:
   The process of transforming user queries into numerical representations
   called embeddings.
Vector Database:
   The storage and retrieval of relevant data points using vector databases.
RAG Architecture:
   The use of the RAG architecture to combine knowledge bases and generative
   models for the development of chatbots with relevant and up-to-date query
   responses.
Large Language Models (LLMs):
   The training and utilization of LLMs for generating responses.
Deployment Options:
   Production-ready deployment options for the ChatQnA example, including
   single-node deployments and Kubernetes deployments.
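
For illustration, the small, self-contained Python example below shows what the
embedding and vector-retrieval terms above mean in practice. The
three-dimensional vectors are toy values chosen for readability; a real
embedding model produces vectors with hundreds or thousands of dimensions, and
a vector database performs this ranking at scale with approximate
nearest-neighbor search.

.. code-block:: python

   from math import sqrt

   def cosine_similarity(a: list[float], b: list[float]) -> float:
       """Semantic closeness of two embedding vectors (1.0 = same direction)."""
       dot = sum(x * y for x, y in zip(a, b))
       norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
       return dot / norm if norm else 0.0

   # Toy "embeddings" for a user query and two stored documents.
   query_vec = [0.9, 0.1, 0.3]
   doc_vectors = {
       "doc_about_chatbots": [0.8, 0.2, 0.4],
       "doc_about_cooking": [0.1, 0.9, 0.2],
   }

   # Retrieval: pick the document whose vector is most similar to the query.
   best = max(doc_vectors, key=lambda name: cosine_similarity(query_vec, doc_vectors[name]))
   print(best)  # doc_about_chatbots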

How It Works
============

The ChatQnA example follows a basic flow of information in the chatbot system,
starting from the user input and going through the retrieve, re-rank, and
generate components, ultimately resulting in the bot's output.

.. figure:: /GenAIExamples/ChatQnA/assets/img/chatqna_architecture.png
   :alt: ChatQnA Architecture Diagram

   This diagram illustrates the flow of information in the chatbot system,
   starting from the user input and going through the retrieve, re-rank, and
   generate components, ultimately resulting in the bot's output.

The architecture follows a series of steps to process user queries and generate responses:

1. **Embedding**: The user query is first transformed into a numerical
   representation called an embedding. This embedding captures the semantic
   meaning of the query and allows for efficient comparison with other
   embeddings.
#. **Vector Database**: The embedding is then used to search a vector database,
   which stores relevant data points as vectors. The vector database enables
   efficient and semantic retrieval of information based on the similarity
   between the query embedding and the stored vectors. The retrieved data
   points can include documents, articles, or any other relevant information
   that can help generate accurate responses.
#. **Re-ranker**: A re-ranking model then scores the retrieved data points by
   their saliency to the query, so that only the most relevant context is
   passed on to the next stage.
#. **LLM**: The retrieved and re-ranked data points are then passed to a large
   language model (LLM) for further processing. LLMs are powerful generative
   models that have been trained on a large corpus of text data. They can
   generate human-like responses based on the input data.
#. **Generate Response**: The LLM generates a response based on the input data
   and the user query. This response is then returned to the user as the
   chatbot's answer.
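
The minimal Python sketch below mirrors these five steps end to end. It is
illustrative only: the embedder, retriever, re-ranker, and LLM are stand-in
stubs defined here for the example, not the OPEA microservices, which run as
separate services in a real ChatQnA deployment.

.. code-block:: python

   def embed(text: str) -> list[float]:
       """Step 1 (stub): a real embedding model returns a dense vector."""
       return [float(len(word)) for word in text.lower().split()][:3] or [0.0]

   def retrieve(query_vector: list[float], k: int = 2) -> list[str]:
       """Step 2 (stub): a vector database returns the k nearest documents."""
       knowledge_base = [
           "ChatQnA is built on a RAG architecture.",
           "OPEA examples are composed of microservices.",
           "Gaudi and Xeon are supported hardware targets.",
       ]
       return knowledge_base[:k]

   def rerank(query: str, documents: list[str]) -> list[str]:
       """Step 3 (stub): a re-ranking model orders documents by saliency."""
       words = query.lower().split()
       return sorted(documents, key=lambda d: -sum(w in d.lower() for w in words))

   def generate(query: str, context: list[str]) -> str:
       """Steps 4-5 (stub): an LLM writes the answer grounded in the context."""
       return f"Answer to '{query}', grounded in {len(context)} retrieved passages."

   query = "What architecture does ChatQnA use?"
   print(generate(query, rerank(query, retrieve(embed(query)))))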

Expected Output
===============

Validation Matrix and Prerequisites
***********************************

See :doc:`/GenAIExamples/supported_examples`

Architecture
************

TODO: Includes microservice level graphics.

TODO: Need to include the architecture with microservices, like the ones
Xigui/Chun made, and explain in a paragraph or two the highlights of the
architecture, including the Gateway, UI, megaservice, how models are deployed,
and how the microservices use the deployment service. The architecture can be
laid out as generally as possible, maybe calling out "for example" on variable
pieces. It will also be good to include a line or two on what the overall use
case is. For example: this ChatQnA is set up to assist in answering questions
about OPEA; the microservices are set up with a RAG and LLM pipeline to query
OPEA PDF documents.

Microservice Outline and Diagram
================================

Deployment
**********

+-------------------------------------------------------------+
| Single Node                                                 |
+==============================+==============================+
| Xeon Scalable Processors     | Gaudi Servers                |
+------------------------------+------------------------------+
| NVIDIA GPUs                  | AI PC                        |
+------------------------------+------------------------------+

+-------------------------------------------------------------+
| Kubernetes                                                   |
+==============================+==============================+
| Xeon & Gaudi with GMC        | Xeon & Gaudi without GMC     |
+------------------------------+------------------------------+
| Using Helm Charts            |                              |
+------------------------------+------------------------------+

+-------------------------------------------------------------+
| Cloud Native                                                 |
+==============================+==============================+
| Red Hat OpenShift Container  |                              |
| Platform (RHOCP)             |                              |
+------------------------------+------------------------------+

Single Node
===========

.. toctree::
   :maxdepth: 1

   deploy/xeon
   deploy/gaudi
   deploy/nvidia
   deploy/AIPC

Kubernetes
==========

* Xeon & Gaudi with GMC
* Xeon & Gaudi without GMC
* Using Helm Charts

Cloud Native
============

* Red Hat OpenShift Container Platform (RHOCP)

Troubleshooting
***************

Monitoring
**********

TODO: Evaluate performance and accuracy.

Summary and Next Steps
**********************
7 changes: 7 additions & 0 deletions examples/ChatQnA/deploy/AIPC.rst
@@ -0,0 +1,7 @@
.. _ChatQnA_deploy_aiPC:


Single Node On-Prem Deployment: AI PC
#####################################

TODO
7 changes: 7 additions & 0 deletions examples/ChatQnA/deploy/gaudi.rst
@@ -0,0 +1,7 @@
.. _ChatQnA_deploy_gaudi:


Single Node On-Prem Deployment: Gaudi Servers
#############################################

TODO
7 changes: 7 additions & 0 deletions examples/ChatQnA/deploy/nvidia.rst
@@ -0,0 +1,7 @@
.. _ChatQnA_deploy_nvidia:


Single Node On-Prem Deployment: NVIDIA GPUs
###########################################

TODO
81 changes: 56 additions & 25 deletions examples/ChatQnA/deploy/xeon.rst
@@ -1,42 +1,73 @@
.. _ChatQnA_deploy_xeon:


Single Node On-Prem Deployment: Xeon Scalable Processors
########################################################

TODO: Provide context for selecting between vLLM and TGI.

.. tabs::

   .. tab:: Deploy with Docker compose with vLLM

      TODO: This section must cover how the architecture described above can
      be implemented with vLLM mode, or the serving model chosen. Show a basic
      end-to-end use case set up with one type of DB, for example Redis, based
      on what is already covered in the ChatQnA example (others can be called
      out or referenced accordingly). Show how to use one SOTA model, such as
      Llama 3, and others with a sample configuration. The outcome must
      demonstrate a real use case showing both productivity and performance.
      For consistency, let's use the OPEA documentation for RAG use cases.

      Sample titles:

      1. Overview

         Talk a few lines about what is expected in this tutorial. For
         example, a Redis DB is used and a Llama 3 model is run to showcase an
         end-to-end use case using OPEA and vLLM.

      #. Pre-requisites

         Includes cloning the repos, pulling the necessary containers if
         available (UI, pipeline, etc.), setting the environment variables
         such as proxies, getting access to model weights, getting tokens on
         HF, LG, etc., and sanity checks if needed.

      #. Prepare (Building / Pulling) Docker images

         a) This step will involve building/pulling (maybe in the future)
            relevant Docker images with a step-by-step process, along with a
            sanity check at the end.
         #) If customization is needed, we show one case of how to do it.

      #. Use case setup

         This section will include how to get the data and other dependencies
         needed, followed by getting all the microservice environments ready.
         Use this section to also talk about how to set other models if
         needed, how to use other DBs, etc.

      #. Deploy the ChatQnA use case based on the docker_compose

         This should cover the steps involved in starting the microservices
         and megaservices, also explaining some key highlights of what is
         covered in the Docker compose file. Include sanity checks as needed.
         Each microservice/megaservice start command, along with what it does
         and its expected output, will be good to add.

      #. Interacting with the ChatQnA deployment (or navigating the ChatQnA
         workflow)

         This section covers how to use a different machine to interact with
         and validate the microservices, and walks through how to navigate
         each service; for example, uploading a local document for data prep
         and how to get answers. Customers will be interested in getting the
         output for a query, and at the same time measuring the quality of the
         model and the performance metrics (health and statistics should also
         be covered). Please check whether these details can also be curled
         from the endpoints; a first sketch of such a query appears after this
         outline. Is uploading templates available now? A custom template is
         available today.

         Show all the customization available and features.

      #. Additional Capabilities (optional)

         Use case specific features to call out.

      #. Launch the UI service

         Show the steps for how to launch the UI and a sample screenshot of a
         query and its output.
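
      As a placeholder until this walkthrough is written, the snippet below
      shows the general shape of querying a running ChatQnA megaservice from
      another machine. The host, port ``8888``, and ``/v1/chatqna`` path are
      assumptions based on typical ChatQnA docker compose defaults and should
      be verified against the compose file used in this deployment.

      .. code-block:: python

         import requests

         # Assumed defaults; replace with the host and port of your deployment.
         base_url = "http://chatqna-host:8888"

         response = requests.post(
             f"{base_url}/v1/chatqna",  # assumed megaservice endpoint path
             json={"messages": "What is OPEA?"},
             timeout=120,
         )
         response.raise_for_status()
         print(response.text)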


   .. tab:: Deploy with Docker compose with TGI