Feature: Router #346

Closed · wants to merge 21 commits

146 changes: 146 additions & 0 deletions router/README.md
@@ -0,0 +1,146 @@
<a href="https://sambanova.ai/">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="../images/SambaNova-light-logo-1.png" height="60">
<img alt="SambaNova logo" src="../images/SambaNova-dark-logo-1.png" height="60">
</picture>
</a>

Router
======================

Questions? Just <a href="https://discord.gg/54bNAqRw" target="_blank">message us</a> on Discord <a href="https://discord.gg/54bNAqRw" target="_blank"><img src="https://github.com/sambanova/ai-starter-kit/assets/150964187/aef53b52-1dc0-4cbf-a3be-55048675f583" alt="Discord" width="22"/></a> or <a href="https://github.com/sambanova/ai-starter-kit/issues/new/choose" target="_blank">create an issue</a> in GitHub. We're happy to help live!

Table of Contents:
<!-- TOC -->
- [Router](#router)
- [Overview](#overview)
- [Before you begin](#before-you-begin)
- [Clone this repository](#clone-this-repository)
- [Set up the models, environment variables and config file](#set-up-the-models-environment-variables-and-config-file)
- [Set up the generative model](#set-up-the-generative-model)
- [Set up the embedding model](#set-up-the-embedding-model)
- [Install dependencies](#install-dependencies)
  - [Windows requirements](#windows-requirements)
- [Use the starter kit](#use-the-starter-kit)
- [Customizing the starter kit](#customizing-the-starter-kit)
- [Third-party tools and data sources](#third-party-tools-and-data-sources)

<!-- /TOC -->

# Overview
This AI Starter Kit is an example of routing user queries to different RAG pipelines or LLMs based on keywords from the data source.

The Kit includes:
- An implementation of a keyword extractor to extract keywords from documents
- An implementation of a workflow to route user queries to different pipelines (a minimal sketch follows this list)
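
To make the idea concrete, here is a minimal, self-contained sketch of keyword-based routing: score each pipeline by keyword overlap with the query and dispatch to the best match. All names here (`route_query`, the pipeline callables) are illustrative only, not the kit's actual API.

```python
# Illustrative sketch of keyword-based routing; not the kit's actual API.
from typing import Callable, Dict, Set

def route_query(
    query: str,
    keyword_index: Dict[str, Set[str]],
    pipelines: Dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
) -> str:
    """Dispatch the query to the pipeline whose keyword set it overlaps most."""
    tokens = set(query.lower().split())
    best_name, best_overlap = None, 0
    for name, keywords in keyword_index.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    if best_name is None:
        return fallback(query)  # no keyword hit: fall back to the plain LLM
    return pipelines[best_name](query)
```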

# Before you begin

You have to set up your environment before you can run or customize the starter kit.

## Clone this repository

Clone the starter kit repo.
```bash
git clone https://github.com/sambanova/ai-starter-kit.git
```

## Set up the models, environment variables and config file

### Set up the generative model

The next step is to set up your environment variables to use one of the inference models available from SambaNova. You can obtain a free API key through SambaNova Cloud. Alternatively, if you are a current SambaNova customer, you can deploy your models using SambaStudio.

- **SambaNova Cloud (Option 1)**: Follow the instructions [here](../README.md#use-sambanova-cloud-option-1) to set up your environment variables.
Then, in the [config file](./config.yaml), set the llm `api` variable to `"sncloud"` and set the `select_expert` config depending on the model you want to use.

- **SambaStudio (Option 2)**: Follow the instructions [here](../README.md#use-sambastudio-option-2) to set up your endpoint and environment variables.
Then, in the [config file](./config.yaml), set the llm `api` variable to `"sambastudio"`, and set the `CoE` and `select_expert` configs if you are using a CoE endpoint.

### Set up the embedding model

You have the following options to set up your embedding model:

* **CPU embedding model (Option 1)**: In the [config file](./config.yaml), set the variable `type` in `embedding_model` to `"cpu"`.

* **SambaStudio embedding model (Option 2)**: To increase inference speed, you can use a SambaStudio embedding model endpoint instead of the default (CPU) Hugging Face embedding. Follow the instructions [here](../README.md#use-sambastudio-embedding-option-2) to set up your endpoint and environment variables. Then, in the [config file](./config.yaml), set the variable `type` in `embedding_model` to `"sambastudio"`, and set the configs `batch_size`, `coe` and `select_expert` according to your SambaStudio endpoint (a loading sketch follows this list).
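
For orientation, the selection might look like the sketch below when done by hand. This is a hedged sketch assuming the `langchain_community` package for the CPU path and the default `e5-large-v2` model; the kit's actual loader lives in [api_gateway.py](../utils/model_wrappers/api_gateway.py) and may differ.

```python
# Rough sketch, not the kit's actual loader (see
# utils/model_wrappers/api_gateway.py for the real implementation).
import yaml
from langchain_community.embeddings import HuggingFaceEmbeddings

with open("config.yaml") as f:
    config = yaml.safe_load(f)

emb_cfg = config["embedding_model"]
if emb_cfg["type"] == "cpu":
    # Default CPU path: a local Hugging Face embedding model.
    embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-large-v2")
else:
    # "sambastudio": the kit builds a client for a SambaStudio embeddings
    # endpoint from batch_size, coe, and select_expert; not sketched here.
    raise NotImplementedError("handled by the kit's api_gateway")
```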

## Install dependencies

We recommend that you run the starter kit in a virtual environment.

NOTE: Python 3.9 or higher is required to use this kit.

Install the python dependencies in your project environment.

```bash
cd ai-starter-kit/router
python3 -m venv router_env
source router_env/bin/activate
pip install -r requirements.txt
```

## Windows requirements

- If you are using Windows, make sure your system has the Microsoft Visual C++ Redistributable installed. You can install it from [Microsoft Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/); during installation, select all of the C++ options. Compatible versions: 2015, 2017, 2019, and 2022.


# Use the starter kit

After you've set up the environment, you can use the starter kit. Follow these steps:

1. Put your documents under the [data](./data/) folder.

2. Update the `keyword_path` under `router` in the [config file](./config.yaml).

3. Use the example in [notebook/RAG_with_router.ipynb](./notebook/RAG_with_router.ipynb) to call the router and connect it with a RAG pipeline. A sketch of inspecting the saved keyword file follows below.
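
As a quick sanity check after step 2, you can inspect the keyword file that the extractor saves at `keyword_path`. This is a hedged sketch: the internal structure of the pickle is defined by the kit and is assumed here, not documented on this page.

```python
# Sketch only: the pickled keyword index's structure is an assumption.
import pickle

# Path from config.yaml: router/keywords/keywords_3.pkl
with open("keywords/keywords_3.pkl", "rb") as f:
    keyword_index = pickle.load(f)

# Assumed shape: a mapping from documents (or clusters) to keyword lists.
print(type(keyword_index))
print(keyword_index)
```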

# Customizing the starter kit
You can further customize the starter kit based on the use case.

## Customize the keyword extractor method

The [keyword extractor](./src/keyword_extractor.py) provides two methods to extract keywords:

* Use the [KeyBert](https://github.com/MaartenGr/KeyBERT) library. It uses BERT embeddings and cosine similarity to find the sub-phrases in a document that are most similar to the document itself (a minimal sketch follows this list).

* Use a generative language model. Prompt engineering guides the LLM to find keywords in the documents.

* Keywords can be extracted more efficiently by finding similarities between documents. We assume that highly similar documents share the same keywords, so we extract keywords from only one document in each cluster and assign those keywords to all documents in the same cluster. To enable this feature, set `use_clusters` to `True` under `router` in the [config file](./config.yaml).
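
For reference, the KeyBERT path looks roughly like this minimal, self-contained sketch using KeyBERT's public API directly; the kit's wrapper in [keyword_extractor.py](./src/keyword_extractor.py) may choose a different model or parameters.

```python
# Minimal KeyBERT usage; the kit's wrapper may configure this differently.
from keybert import KeyBERT

doc = (
    "SambaTune compiles the application, runs it on the RDU, and "
    "collects latency, throughput, and utilization statistics."
)

kw_model = KeyBERT(model="all-MiniLM-L6-v2")  # any sentence-transformers model
keywords = kw_model.extract_keywords(
    doc,
    keyphrase_ngram_range=(1, 2),  # allow one- and two-word phrases
    stop_words="english",
    top_n=5,
)
print(keywords)  # list of (phrase, cosine similarity) pairs
```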

## Customize the embedding model

By default, keywords are extracted using a BERT-based embedding model. To change the embedding model, do the following:

* If using CPU embedding (i.e., `type` in `embedding_model` is set to `"cpu"` in the [config file](./config.yaml)), the [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) model from Hugging Face is used by default. If you want to use another model, you will need to manually modify the `EMBEDDING_MODEL` variable and the `load_embedding_model()` function in [api_gateway.py](../utils/model_wrappers/api_gateway.py).
* If using SambaStudio embedding (i.e., `type` in `embedding_model` is set to `"sambastudio"` in the [config file](./config.yaml)), you will need to change the SambaStudio endpoint and/or the configs `batch_size`, `coe` and `select_expert` in the config file.

## Customize the LLM model and/or use it to extract keywords

To change the LLM or modify the parameters used to call it, edit the `router` section in the [config file](./config.yaml).

The prompt for the model can be customized in [prompts/rag_routing_prompt.yaml](./prompts/rag_routing_prompt.yaml).

You can also use your own YAML file by placing it under the [prompts](./prompts) folder and updating the `router_prompt` path in the [config file](./config.yaml).

The LLM can also be used to extract keywords by setting `use_llm: True` and `use_bert: False` under `router` in the [config file](./config.yaml).

The prompt for keyword extraction can be customized in [prompts/keyword_extractor_prompt.yaml](./prompts/keyword_extractor_prompt.yaml).

## Customize the keyphrase extraction

The keyword extractor uses [KeyphraseVectorizers](https://github.com/TimSchopf/KeyphraseVectorizers) to extract keyphrases from documents. You can choose other keyphrase extraction methods by changing the `vectorizer` in the `extract_keywords` function in [keyword_extractor.py](./src/keyword_extractor.py).

```python
# Inside the extract_keywords function in src/keyword_extractor.py:
from keyphrase_vectorizers import KeyphraseTfidfVectorizer

if use_vectorizer:
    vectorizer = KeyphraseTfidfVectorizer()
    keyphrase_ngram_range = None  # the vectorizer determines candidate phrases
```
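
For example, a vectorizer can be handed straight to KeyBERT, which then ignores `keyphrase_ngram_range`. This is a sketch against the two libraries' public APIs; the kit's wiring may differ.

```python
# Sketch: swap in a part-of-speech-based vectorizer for candidate selection.
from keybert import KeyBERT
from keyphrase_vectorizers import KeyphraseCountVectorizer

kw_model = KeyBERT()
keywords = kw_model.extract_keywords(
    "Routing sends each user query to the best-matching RAG pipeline.",
    vectorizer=KeyphraseCountVectorizer(),  # POS-pattern candidates, not n-grams
)
print(keywords)
```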

## Customize the RAG pipeline

The RAG pipeline uses functions in [document_retrieval.py](../enterprise_knowledge_retriever/src/document_retrieval.py). Please refer to [enterprise_knowledge_retriever](../enterprise_knowledge_retriever/README.md) for how to customize the RAG.

# Third-party tools and data sources

All the packages/tools are listed in the `requirements.txt` file in the project directory.
41 changes: 41 additions & 0 deletions router/config.yaml
@@ -0,0 +1,41 @@
api: "sncloud" # set either sambastudio or sncloud

embedding_model:
"type": "sambastudio" # set either sambastudio or cpu
"batch_size": 1 #set depending of your endpoint configuration (1 if CoE embedding expert)
"coe": False #set true if using Sambastudio embeddings in a CoE endpoint
"select_expert": "e5-mistral-7b-instruct" #set if using SambaStudio CoE embedding expert

router:
"type": "sncloud" # set either sambastudio or sncloud
"temperature": 0.0
"do_sample": False
"max_tokens_to_generate": 1200
"coe": True #set as true if using Sambastudio CoE endpoint
"select_expert": "llama3-8b" #set if using sncloud, or SambaStudio CoE llm expert
"document_folder": "router/data" # path of documents
"keyword_path": "router/keywords/keywords_3.pkl" # path to save keywords
"use_clusters": False # set True if extract keywords from only one document in each cluster
"use_bert": True # set True if use embedding and cosine similarity to extract keywords
"use_llm": False # set True if use llm to extract keywords

llm:
"temperature": 0.0
"do_sample": False
"max_tokens_to_generate": 1200
"coe": True #set as true if using Sambastudio CoE endpoint
"select_expert": "llama3-8b" #set if using sncloud, or SambaStudio CoE llm expert
#sncloud CoE expert name -> "llama3-8b"

retrieval:
"k_retrieved_documents": 15 #set if rerank enabled
"score_threshold": 0.2
"rerank": False # set if you want to rerank retriever results
"reranker": 'BAAI/bge-reranker-large' # set if you rerank enabled
"final_k_retrieved_documents": 5

prompts:
"router_prompt": "router/prompts/rag_routing_prompt.yaml"
"qa_prompt": "enterprise_knowledge_retriever/prompts/qa_prompt.yaml"
"kw_etr_prompt": "router/prompts/keyword_extractor_prompt.yaml"

183 changes: 183 additions & 0 deletions router/data/sambatune_run-sambatune.txt
@@ -0,0 +1,183 @@
# Run SambaTune and examine reports

After installation, you can run SambaTune from the command line.

**Note:** You have to run SambaTune with the application YAML file as input before you can see the results of the performance analysis.

## Overview

Running SambaTune involves two commands: `sambatune` and `sambatune_ui`.

1. First you run `sambatune` and pass in a YAML file for your model. See Run the sample application.

2. Then you can run the `sambatune_ui` command. See Run the SambaTune GUI.

3. Finally, you can [explore with the SambaTune GUI](gui-index.html) and [examine SambaTune reports](reports-index.html).

## Run the sample application

A sample application, `linear_net.py`, is included with your installation at
`/opt/sambaflow/apps/micros/linear_net.py`. The application requires that the
`sambaflow-apps-micros` package is installed.

To run the `linear_net.py` sample application:

1. Log in to the Linux console of a host that is attached to the DataScale hardware.

2. Run the application. You have several options:

* Run the application in benchmarking mode (the default):

$ sambatune linear_net.yaml

where `linear_net.yaml` is a user-specified configuration file that is included in
the `sambaflow-apps-micros` package. Here's an example:


app: /path/to/linear.py
model-args: -b 128 -mb 64 --in-features 512 --out-features 128
compile-args: compile --plot
run-args: -n 10000

* Run the application in instrument-only mode. The space after `--` is required.

$ sambatune --modes instrument -- /opt/sambaflow/sambatune/configs/linear_net.yaml

* Run in all modes. The space after `--` is required.

$ sambatune --modes benchmark instrument run -- /opt/sambaflow/sambatune/configs/linear_net.yaml

Run `sambatune --help` for a list of all options. See SambaTune input
arguments for details on configuration options.

## Understand how SambaTune collects data

When you run the sample application:

1. SambaTune compiles the application with the user-specified `model-args`, `compile-args`, and SambaFlow-supported instrumentation flags.

2. After successful compile, SambaTune:

1. Runs the application on the RDU and collects performance data.

2. Runs the application in benchmark mode with user-specified `run-args` to collect latency, throughput, and hardware utilization statistics.

3. At the end of a successful run, SambaTune:

1. Collates compile-time and run-time statistics.

2. Generates performance reports. See [Explore SambaTune Reports](reports-index.html).

3. Displays the reports in the SambaTune GUI to help you identify potential hotspots. See [Explore with the SambaTune GUI](gui-index.html).

## SambaTune input arguments

You can customize your SambaTune run with the following input arguments:

Table 1. SambaTune input arguments

| Option | Description | Dependencies | Type |
|---|---|---|---|
| app | Name of the application. | | string |
| compile-args | Arguments to pass to the SambaFlow compiler. `compile-args` + `model-args` are used for compilation (generating the PEF file). | app | string |
| model-args | Arguments to pass for running a specific model, like batch size. `compile-args` + `model-args` are used for compilation (generating the PEF file). | | string |
| run-args | Arguments to pass when running the app that are used in addition to `model-args`, for example, learning rate. The `run-args` and `model-args` are both used when you run the model (represented by the PEF file). | | string |
| env | Runtime environment variables (optional). See Table 2. | | dict |

For subprocesses that are created by SambaTune, you can configure the
following environment variables:

Table 2. Environment variables for SambaTune subprocesses

| Option | Description | Type |
|---|---|---|
| SF_RNT_FSM_POLL_BUSY_WAIT | 1 to enable Graph completion busy wait | int |
| SF_RNT_DMA_POLL_BUSY_WAIT | 1 to enable DMA completion busy wait | int |
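
These variables are supplied through the `env` entry of the application YAML file (Table 1). As a sketch, and assuming `env` takes the same YAML-mapping shape implied by its dict type, the `linear_net` example above might be extended like this:


    app: /path/to/linear.py
    model-args: -b 128 -mb 64 --in-features 512 --out-features 128
    compile-args: compile --plot
    run-args: -n 10000
    env:
      SF_RNT_FSM_POLL_BUSY_WAIT: 1
      SF_RNT_DMA_POLL_BUSY_WAIT: 1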

## Run the SambaTune GUI

The SambaTune GUI allows you to read the reports that are generated by one or
more SambaTune runs in a web browser.

**Note:** You install the SambaTune GUI on the **client** system where the web browser runs. Unlike the **host** system, the client does not have direct access to the RDU.

For release 1.16 of SambaTune, contact SambaNova customer support through the
SambaNova support portal at <https://support.sambanova.ai> for client install
instructions.

1. On the machine where you installed the SambaTune GUI package, call `sambatune_ui`.

You can specify some arguments to this command. Run `sambatune_ui --help` to
see the list of arguments.

2. When the `sambatune_ui` command completes, you see a URL, username, and password for accessing the GUI. Note down the password, which changes each time you call the `sambatune_ui` command.

You can now examine the results of the SambaTune run in the SambaTune GUI. See
[Explore with the SambaTune GUI](gui-index.html).

## Troubleshooting

This section has troubleshooting information.

**Symptom**

SambaTune encountered an error during the run.

**Explanation**

A SambaTune run may encounter errors for many reasons, ranging from an
incorrect input configuration to compile, run, or post-processing errors.

All run-related information is saved to the output directory
(`$DUMP_ROOT/artifact_root/sambatune_gen/<input_config_name>_<timestamp>`).
The status of the run can be checked in `run.log` or `status_summary.log`. The
details of a failed step can be checked in `status_debug.log`. For assistance,
contact Customer Support and provide the compressed output directory for
further diagnosis.


