Feature: Router #346

Closed · wants to merge 21 commits

146 changes: 146 additions & 0 deletions router/README.md
@@ -0,0 +1,146 @@
<a href="https://sambanova.ai/">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="../images/SambaNova-light-logo-1.png" height="60">
<img alt="SambaNova logo" src="../images/SambaNova-dark-logo-1.png" height="60">
</picture>
</a>

Router
======================

Questions? Just <a href="https://discord.gg/54bNAqRw" target="_blank">message us</a> on Discord <a href="https://discord.gg/54bNAqRw" target="_blank"><img src="https://github.com/sambanova/ai-starter-kit/assets/150964187/aef53b52-1dc0-4cbf-a3be-55048675f583" alt="Discord" width="22"/></a> or <a href="https://github.com/sambanova/ai-starter-kit/issues/new/choose" target="_blank">create an issue</a> in GitHub. We're happy to help live!

Table of Contents:
<!-- TOC -->
- [Router](#router)
- [Overview](#overview)
- [Before you begin](#before-you-begin)
- [Clone this repository](#clone-this-repository)
- [Set up the models, environment variables and config file](#set-up-the-models-environment-variables-and-config-file)
- [Set up the generative model](#set-up-the-generative-model)
- [Set up the embedding model](#set-up-the-embedding-model)
- [Install dependencies](#install-dependencies)
  - [Windows requirements](#windows-requirements)
- [Use the starter kit](#use-the-starter-kit)
- [Customizing the starter kit](#customizing-the-starter-kit)
- [Third-party tools and data sources](#third-party-tools-and-data-sources)

<!-- /TOC -->

# Overview
This AI Starter Kit is an example of routing user queries to different RAG pipelines or LLMs based on keywords from the data source.

The Kit includes:
- An implementation of a keyword extractor to extract keywords from documents
- An implementation of a workflow to route user queries to different pipelines (a minimal sketch follows this list)
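
To make the idea concrete, here is a minimal, self-contained sketch of keyword-based routing: score each pipeline by keyword overlap with the query and dispatch to the best match. All names here (`route_query`, the pipeline callables) are illustrative only, not the kit's actual API.

```python
# Illustrative sketch of keyword-based routing; not the kit's actual API.
from typing import Callable, Dict, Set

def route_query(
    query: str,
    keyword_index: Dict[str, Set[str]],
    pipelines: Dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    """Dispatch the query to the pipeline whose keyword set it overlaps most."""
    tokens = set(query.lower().split())
    best_name, best_overlap = None, 0
    for name, keywords in keyword_index.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    if best_name is None:
        return fallback(query)  # no keyword hit: fall back to the plain LLM
    return pipelines[best_name](query)
```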

# Before you begin

You have to set up your environment before you can run or customize the starter kit.

## Clone this repository

Clone the starter kit repo.
```bash
git clone https://github.com/sambanova/ai-starter-kit.git
```

## Set up the models, environment variables and config file

### Set up the generative model

The next step is to set up your environment variables to use one of the inference models available from SambaNova. You can obtain a free API key through SambaNova Cloud. Alternatively, if you are a current SambaNova customer, you can deploy your models using SambaStudio.

- **SambaNova Cloud (Option 1)**: Follow the instructions [here](../README.md#use-sambanova-cloud-option-1) to set up your environment variables.
Then, in the [config file](./config.yaml), set the llm `api` variable to `"sncloud"` and set the `select_expert` config depending on the model you want to use.

- **SambaStudio (Option 2)**: Follow the instructions [here](../README.md#use-sambastudio-option-2) to set up your endpoint and environment variables.
Then, in the [config file](./config.yaml), set the llm `api` variable to `"sambastudio"`, and set the `CoE` and `select_expert` configs if you are using a CoE endpoint.

### Set up the embedding model

You have the following options to set up your embedding model:

* **CPU embedding model (Option 1)**: In the [config file](./config.yaml), set the variable `type` in `embedding_model` to `"cpu"`.

* **SambaStudio embedding model (Option 2)**: To increase inference speed, you can use a SambaStudio embedding model endpoint instead of the default (CPU) Hugging Face embedding. Follow the instructions [here](../README.md#use-sambastudio-embedding-option-2) to set up your endpoint and environment variables. Then, in the [config file](./config.yaml), set the variable `type` in `embedding_model` to `"sambastudio"`, and set the configs `batch_size`, `coe` and `select_expert` according to your SambaStudio endpoint (a loading sketch follows this list).
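
For orientation, the selection might look like the sketch below when done by hand. This is a hedged sketch assuming the `langchain_community` package for the CPU path and the default `e5-large-v2` model; the kit's actual loader lives in [api_gateway.py](../utils/model_wrappers/api_gateway.py) and may differ.

```python
# Rough sketch, not the kit's actual loader (see
# utils/model_wrappers/api_gateway.py for the real implementation).
import yaml
from langchain_community.embeddings import HuggingFaceEmbeddings

with open("config.yaml") as f:
    config = yaml.safe_load(f)

emb_cfg = config["embedding_model"]
if emb_cfg["type"] == "cpu":
    # Default CPU path: a local Hugging Face embedding model.
    embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-large-v2")
else:
    # "sambastudio": the kit builds a client for a SambaStudio embeddings
    # endpoint from batch_size, coe, and select_expert; not sketched here.
    raise NotImplementedError("handled by the kit's api_gateway")
```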

## Install dependencies

We recommend that you run the starter kit in a virtual environment.

NOTE: Python 3.9 or higher is required to use this kit.

Install the python dependencies in your project environment.

```bash
cd ai-starter-kit/router
python3 -m venv router_env
source router_env/bin/activate
pip install -r requirements.txt
```

## Windows requirements

- If you are using Windows, make sure your system has the Microsoft Visual C++ Redistributable installed. You can install it from [Microsoft Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/); during installation, select all of the C++ options. Compatible versions: 2015, 2017, 2019, and 2022.


# Use the starter kit

After you've set up the environment, you can use the starter kit. Follow these steps:

1. Put your documents under the [data](./data/) folder.

2. Update the `keyword_path` under `router` in the [config file](./config.yaml).

3. Use the example in [notebook/RAG_with_router.ipynb](./notebook/RAG_with_router.ipynb) to call the router and connect it with a RAG pipeline. A sketch of inspecting the saved keyword file follows below.
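
As a quick sanity check after step 2, you can inspect the keyword file that the extractor saves at `keyword_path`. This is a hedged sketch: the internal structure of the pickle is defined by the kit and is assumed here, not documented on this page.

```python
# Sketch only: the pickled keyword index's structure is an assumption.
import pickle

# Path from config.yaml: router/keywords/keywords_3.pkl
with open("keywords/keywords_3.pkl", "rb") as f:
    keyword_index = pickle.load(f)

# Assumed shape: a mapping from documents (or clusters) to keyword lists.
print(type(keyword_index))
print(keyword_index)
```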

# Customizing the starter kit
You can further customize the starter kit based on the use case.

## Customize the keyword extractor method

The [keyword extractor](./src/keyword_extractor.py) provides two methods to extract keywords:

* Use the [KeyBert](https://github.com/MaartenGr/KeyBERT) library. It uses BERT embeddings and cosine similarity to find the sub-phrases in a document that are most similar to the document itself (a minimal sketch follows this list).

* Use a generative language model. Prompt engineering guides the LLM to find keywords in the documents.

* Keywords can be extracted more efficiently by finding similarities between documents. We assume that highly similar documents share the same keywords, so we extract keywords from only one document in each cluster and assign those keywords to all documents in the same cluster. To enable this feature, set `use_clusters` to `True` under `router` in the [config file](./config.yaml).
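
For reference, the KeyBERT path looks roughly like this minimal, self-contained sketch using KeyBERT's public API directly; the kit's wrapper in [keyword_extractor.py](./src/keyword_extractor.py) may choose a different model or parameters.

```python
# Minimal KeyBERT usage; the kit's wrapper may configure this differently.
from keybert import KeyBERT

doc = (
    "SambaTune compiles the application, runs it on the RDU, and "
    "collects latency, throughput, and utilization statistics."
)

kw_model = KeyBERT(model="all-MiniLM-L6-v2")  # any sentence-transformers model
keywords = kw_model.extract_keywords(
    doc,
    keyphrase_ngram_range=(1, 2),  # allow one- and two-word phrases
    stop_words="english",
    top_n=5,
)
print(keywords)  # list of (phrase, cosine similarity) pairs
```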

## Customize the embedding model

By default, keywords are extracted using a BERT-based embedding model. To change the embedding model, do the following:

* If using CPU embedding (i.e., `type` in `embedding_model` is set to `"cpu"` in the [config file](./config.yaml)), the [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) model from Hugging Face is used by default. If you want to use another model, you will need to manually modify the `EMBEDDING_MODEL` variable and the `load_embedding_model()` function in [api_gateway.py](../utils/model_wrappers/api_gateway.py).
* If using SambaStudio embedding (i.e., `type` in `embedding_model` is set to `"sambastudio"` in the [config file](./config.yaml)), you will need to change the SambaStudio endpoint and/or the configs `batch_size`, `coe` and `select_expert` in the config file.

## Customize the LLM model and/or use it to extract keywords

To change the LLM or modify the parameters used to call it, edit the `router` section in the [config file](./config.yaml).

The prompt for the model can be customized in [prompts/rag_routing_prompt.yaml](./prompts/rag_routing_prompt.yaml).

You can also use your own YAML file by placing it under the [prompts](./prompts) folder and updating the `router_prompt` path in the [config file](./config.yaml).

The LLM can also be used to extract keywords by setting `use_llm: True` and `use_bert: False` under `router` in the [config file](./config.yaml).

The prompt for keyword extraction can be customized in [prompts/keyword_extractor_prompt.yaml](./prompts/keyword_extractor_prompt.yaml).

## Customize the keyphrase extraction

The keyword extractor uses [KeyphraseVectorizers](https://github.com/TimSchopf/KeyphraseVectorizers) to extract keyphrases from documents. You can choose other keyphrase extraction methods by changing the `vectorizer` in the `extract_keywords` function in [keyword_extractor.py](./src/keyword_extractor.py).

```python
# Inside the extract_keywords function in src/keyword_extractor.py:
from keyphrase_vectorizers import KeyphraseTfidfVectorizer

if use_vectorizer:
    vectorizer = KeyphraseTfidfVectorizer()
    keyphrase_ngram_range = None  # the vectorizer determines candidate phrases
```
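
For example, a vectorizer can be handed straight to KeyBERT, which then ignores `keyphrase_ngram_range`. This is a sketch against the two libraries' public APIs; the kit's wiring may differ.

```python
# Sketch: swap in a part-of-speech-based vectorizer for candidate selection.
from keybert import KeyBERT
from keyphrase_vectorizers import KeyphraseCountVectorizer

kw_model = KeyBERT()
keywords = kw_model.extract_keywords(
    "Routing sends each user query to the best-matching RAG pipeline.",
    vectorizer=KeyphraseCountVectorizer(),  # POS-pattern candidates, not n-grams
)
print(keywords)
```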

## Customize the RAG pipeline

The RAG pipeline uses functions in [document_retrieval.py](../enterprise_knowledge_retriever/src/document_retrieval.py). Please refer to [enterprise_knowledge_retriever](../enterprise_knowledge_retriever/README.md) for how to customize the RAG.

# Third-party tools and data sources

All the packages/tools are listed in the `requirements.txt` file in the project directory.
41 changes: 41 additions & 0 deletions router/config.yaml
@@ -0,0 +1,41 @@
api: "sncloud" # set either sambastudio or sncloud

embedding_model:
"type": "sambastudio" # set either sambastudio or cpu
"batch_size": 1 #set depending of your endpoint configuration (1 if CoE embedding expert)
"coe": False #set true if using Sambastudio embeddings in a CoE endpoint
"select_expert": "e5-mistral-7b-instruct" #set if using SambaStudio CoE embedding expert

router:
"type": "sncloud" # set either sambastudio or sncloud
"temperature": 0.0
"do_sample": False
"max_tokens_to_generate": 1200
"coe": True #set as true if using Sambastudio CoE endpoint
"select_expert": "llama3-8b" #set if using sncloud, or SambaStudio CoE llm expert
"document_folder": "router/data" # path of documents
"keyword_path": "router/keywords/keywords_3.pkl" # path to save keywords
"use_clusters": False # set True if extract keywords from only one document in each cluster
"use_bert": True # set True if use embedding and cosine similarity to extract keywords
"use_llm": False # set True if use llm to extract keywords

llm:
"temperature": 0.0
"do_sample": False
"max_tokens_to_generate": 1200
"coe": True #set as true if using Sambastudio CoE endpoint
"select_expert": "llama3-8b" #set if using sncloud, or SambaStudio CoE llm expert
#sncloud CoE expert name -> "llama3-8b"

retrieval:
"k_retrieved_documents": 15 #set if rerank enabled
"score_threshold": 0.2
"rerank": False # set if you want to rerank retriever results
"reranker": 'BAAI/bge-reranker-large' # set if you rerank enabled
"final_k_retrieved_documents": 5

prompts:
"router_prompt": "router/prompts/rag_routing_prompt.yaml"
"qa_prompt": "enterprise_knowledge_retriever/prompts/qa_prompt.yaml"
"kw_etr_prompt": "router/prompts/keyword_extractor_prompt.yaml"

183 changes: 183 additions & 0 deletions router/data/sambatune_run-sambatune.txt
@@ -0,0 +1,183 @@
# Run SambaTune and examine reports

After installation, you can run SambaTune from the command line.

**Note:** You have to run SambaTune with the application YAML file as input before you can see the results of the performance analysis.

## Overview

Running SambaTune involves two commands: `sambatune` and `sambatune_ui`.

1. First you run `sambatune` and pass in a YAML file for your model. See Run the sample application.

2. Then you can run the `sambatune_ui` command. See Run the SambaTune GUI.

3. Finally, you can [explore with the SambaTune GUI](gui-index.html) and [examine SambaTune reports](reports-index.html).

## Run the sample application

A sample application, `linear_net.py`, is included with your installation at
`/opt/sambaflow/apps/micros/linear_net.py`. The application requires that the
`sambaflow-apps-micros` package is installed.

To run the `linear_net.py` sample application:

1. Log in to the Linux console of a host that is attached to the DataScale hardware.

2. Run the application. You have several options:

* Run the application in benchmarking mode (the default):

$ sambatune linear_net.yaml

where `linear_net.yaml` is a user-specified configuration file that is included in
the `sambaflow-apps-micros` package. Here's an example:


app: /path/to/linear.py
model-args: -b 128 -mb 64 --in-features 512 --out-features 128
compile-args: compile --plot
run-args: -n 10000

* Run the application in instrument-only mode. The space after `--` is required.

$ sambatune --modes instrument -- /opt/sambaflow/sambatune/configs/linear_net.yaml

* Run in all modes. The space after `--` is required.

$ sambatune --modes benchmark instrument run -- /opt/sambaflow/sambatune/configs/linear_net.yaml

Run `sambatune --help` for a list of all options. See SambaTune input
arguments for details on configuration options.

## Understand how SambaTune collects data

When you run the sample application:

1. SambaTune compiles the application with the user-specified `model-args`, `compile-args`, and SambaFlow-supported instrumentation flags.

2. After successful compile, SambaTune:

1. Runs the application on the RDU and collects performance data.

2. Runs the application in benchmark mode with user-specified `run-args` to collect latency, throughput, and hardware utilization statistics.

3. At the end of a successful run, SambaTune:

1. Collates compile-time and run-time statistics.

2. Generates performance reports. See [Explore SambaTune Reports](reports-index.html).

3. Displays the reports in the SambaTune GUI to help you identify potential hotspots. See [Explore with the SambaTune GUI](gui-index.html).

## SambaTune input arguments

You can customize your SambaTune run with the following input arguments:

Table 1. SambaTune input arguments

| Option | Description | Dependencies | Type |
|---|---|---|---|
| app | Name of the application. | | string |
| compile-args | Arguments to pass to the SambaFlow compiler. `compile-args` + `model-args` are used for compilation (generating the PEF file). | app | string |
| model-args | Arguments to pass for running a specific model, like batch size. `compile-args` + `model-args` are used for compilation (generating the PEF file). | | string |
| run-args | Arguments to pass when running the app that are used in addition to `model-args`, for example, learning rate. The `run-args` and `model-args` are both used when you run the model (represented by the PEF file). | | string |
| env | Runtime environment variables (optional). See Table 2. | | dict |

For subprocesses that are created by SambaTune, you can configure the
following environment variables:

Table 2. Environment variables for SambaTune subprocesses

| Option | Description | Type |
|---|---|---|
| SF_RNT_FSM_POLL_BUSY_WAIT | 1 to enable Graph completion busy wait | int |
| SF_RNT_DMA_POLL_BUSY_WAIT | 1 to enable DMA completion busy wait | int |
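
These variables are supplied through the `env` entry of the application YAML file (Table 1). As a sketch, and assuming `env` takes the same YAML-mapping shape implied by its dict type, the `linear_net` example above might be extended like this:


    app: /path/to/linear.py
    model-args: -b 128 -mb 64 --in-features 512 --out-features 128
    compile-args: compile --plot
    run-args: -n 10000
    env:
      SF_RNT_FSM_POLL_BUSY_WAIT: 1
      SF_RNT_DMA_POLL_BUSY_WAIT: 1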

## Run the SambaTune GUI

The SambaTune GUI allows you to read the reports that are generated by one or
more SambaTune runs in a web browser.

**Note:** You install the SambaTune GUI on the **client** system where the web browser runs. Unlike the **host** system, the client does not have direct access to the RDU.

For release 1.16 of SambaTune, contact SambaNova customer support through the
SambaNova support portal at <https://support.sambanova.ai> for client install
instructions.

1. On the machine where you installed the SambaTune GUI package, call `sambatune_ui`.

You can specify some arguments to this command. Run `sambatune_ui --help` to
see the list of arguments.

2. When the `sambatune_ui` command completes, you see a URL, username, and password for accessing the GUI. Note down the password, which changes each time you call the `sambatune_ui` command.

You can now examine the results of the SambaTune run in the SambaTune GUI. See
[Explore with the SambaTune GUI](gui-index.html).

## Troubleshooting

This section has troubleshooting information.

**Symptom**

SambaTune encountered an error during the run.

**Explanation**

A SambaTune run may encounter errors for many reasons, ranging from an
incorrect input configuration to compile, run, or post-processing errors.

All run-related information is saved to the output directory
(`$DUMP_ROOT/artifact_root/sambatune_gen/<input_config_name>_<timestamp>`).
The status of the run can be checked in `run.log` or `status_summary.log`. The
details of a failed step can be checked in `status_debug.log`. For assistance,
contact Customer Support and provide the compressed output directory for
further diagnosis.


