
Retrieval augmented generation with quantized LLM

Retrieval augmented generation (RAG) demos with DeepSeek, Qwen, Aya-Expanse, Mistral, Gemma, Llama, Phi

The demos use quantized models and run on CPU with acceptable inference time. They can run offline without Internet access, thus allowing deployment in an air-gapped environment.
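
For example, a quantized GGUF model can be loaded for CPU inference with llama-cpp-python; a minimal sketch, with the model path and parameters purely illustrative (the app's actual loader may differ):

from llama_cpp import Llama

# Load a quantized GGUF model for CPU inference. n_ctx sets the context
# window; n_threads controls how many CPU cores llama.cpp uses.
llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # illustrative path
    n_ctx=4096,
    n_threads=8,
)
output = llm("Q: What is retrieval augmented generation?\nA:", max_tokens=64)
print(output["choices"][0]["text"])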

The demos also allow the user to:

  • apply a propositionizer to document chunks
  • perform reranking upon retrieval (sketched below)
  • perform hypothetical document embedding (HyDE)
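
As an illustration of the reranking step, a cross-encoder can re-score retrieved chunks before they are passed to the LLM. A minimal sketch using sentence-transformers (model name illustrative; the repo's reranker may differ):

from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, chunk) pair jointly, which is slower
# but more accurate than the bi-encoder used for first-stage retrieval.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is HyDE?"
chunks = [
    "HyDE embeds a hypothetical answer instead of the raw query.",
    "Unrelated boilerplate text.",
]

scores = reranker.predict([(query, chunk) for chunk in chunks])
ranked = [chunk for _, chunk in sorted(zip(scores, chunks), reverse=True)]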

🔧 Getting Started

You will need conda to set up your development environment; install it first if it is not already available.

conda create --name rag python=3.11
conda activate rag
pip install -r requirements.txt

We shall use unstructured to process PDFs. Refer to Installation Instructions for Local Development.

You will also need to download punkt_tab and averaged_perceptron_tagger_eng from nltk.

import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

Note that we only use strategy="fast" in this demo. Extraction of tables from PDFs is a work in progress.
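
For reference, partitioning a PDF with the fast strategy looks like this (file name illustrative):

from unstructured.partition.pdf import partition_pdf

# strategy="fast" extracts embedded text directly, without OCR or
# layout-detection models, which keeps preprocessing quick on CPU.
elements = partition_pdf(filename="sample.pdf", strategy="fast")
chunks = [el.text for el in elements if el.text]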

Activate the environment.

conda activate rag

🧠 Use different LLMs

Download and save the models in ./models and update config.yaml. The models used in this demo include DeepSeek, Qwen, Aya-Expanse, Mistral, Gemma, Llama, and Phi.

The LLMs can be loaded directly in the app, or they can be first deployed with Ollama.
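
If you take the Ollama route, pull the model first and make sure the Ollama server is running (model tag illustrative):

ollama pull mistral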

Add prompt format

Since each model type has its own prompt format, include the format in ./src/prompt_templates.py.
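
As an illustration, an entry for a Mistral-style instruct model might look like the following (the actual structure of ./src/prompt_templates.py may differ):

# Illustrative template for Mistral-instruct models; {system} and
# {user} are filled in at query time.
MISTRAL_TEMPLATE = "<s>[INST] {system}\n\n{user} [/INST]"

prompt = MISTRAL_TEMPLATE.format(
    system="Answer using only the provided context.",
    user="What is HyDE?",
)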

🤖 Tracing

We shall use Phoenix for LLM tracing. Phoenix is an open-source observability library designed for experimentation, evaluation, and troubleshooting. Before running the app, start a Phoenix server:

python3 -m phoenix.server.main serve

The traces can be viewed at http://localhost:6006.
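
The server can also be launched from Python; a minimal sketch, assuming the arize-phoenix package (API details may vary by version):

import phoenix as px

# Launches a local Phoenix server in the background; by default the
# UI is served at http://localhost:6006.
session = px.launch_app()
print(session.url)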

💻 App

We use Streamlit as the interface for the demos. There are three demos:

  • Conversational Retrieval QA
streamlit run app_conv.py
  • Retrieval QA
streamlit run app_qa.py
  • Conversational Retrieval QA using ReAct

Create the vectorstore first and update config.yaml:

python -m vectorize --filepaths <your-filepath>

Then run the app:

streamlit run app_react.py
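
Under the hood, building a vectorstore boils down to chunking, embedding, and indexing. A generic sketch with sentence-transformers and FAISS, not this repo's exact code:

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model
chunks = ["first document chunk ...", "second document chunk ..."]

# Embed the chunks and add them to a flat inner-product index; with
# normalized embeddings, inner product equals cosine similarity.
vectors = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))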

🔍 Usage

To get started, upload a PDF and click on Build VectorDB. Building the vector DB may take a while.

