Retrieval augmented generation (RAG) demos with DeepSeek, Qwen, Aya-Expanse, Mistral, Gemma, Llama, Phi
The demos use quantized models and run on CPU with acceptable inference time. They can run offline without Internet access, thus allowing deployment in an air-gapped environment.
The demos also allow users to:
- apply a propositionizer to document chunks
- perform reranking upon retrieval
- perform hypothetical document embedding (HyDE), sketched below
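As an illustration, the HyDE step boils down to asking the LLM for a hypothetical answer and embedding that instead of the raw query. A minimal sketch, assuming llama-cpp-python and sentence-transformers (the model path and names below are placeholders; the demo's actual implementation may differ):

```python
from llama_cpp import Llama
from sentence_transformers import SentenceTransformer

# Placeholder model paths/names; substitute whatever is configured in config.yaml
llm = Llama(model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf", n_ctx=2048)
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def hyde_embedding(query: str):
    # 1. Ask the LLM to write a hypothetical passage that answers the query
    out = llm.create_chat_completion(
        messages=[{"role": "user",
                   "content": f"Write a short passage that answers: {query}"}],
        max_tokens=256,
    )
    hypothetical_doc = out["choices"][0]["message"]["content"]
    # 2. Embed the hypothetical passage instead of the raw query; this vector
    #    is then used for similarity search against the vectorstore
    return embedder.encode(hypothetical_doc)
```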
You will need to set up your development environment using conda.
conda create --name rag python=3.11
conda activate rag
pip install -r requirements.txt
We shall use unstructured to process PDFs. Refer to its Installation Instructions for Local Development.
You would also need to download punkt_tab and averaged_perceptron_tagger_eng from nltk:
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
Note that we shall only use strategy="fast" in this demo. Extraction of tables from PDFs is a work in progress.
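For reference, parsing a PDF with unstructured's fast strategy looks roughly like this (the file name is a placeholder):

```python
from unstructured.partition.pdf import partition_pdf

# "fast" pulls text straight from the PDF without OCR or layout-detection models
elements = partition_pdf(filename="sample.pdf", strategy="fast")
for el in elements[:5]:
    print(type(el).__name__, "->", el.text[:80])
```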
Activate the environment.
conda activate rag
Download and save the models in ./models and update config.yaml. The models used in this demo are:
- Embeddings
- Rerankers
- Propositionizer
- LLMs:
  - unsloth/DeepSeek-R1-Distill-Qwen-1.5B-GGUF
  - Qwen/Qwen2.5-3B-Instruct-GGUF
  - bartowski/aya-expanse-8b-GGUF
  - bartowski/Llama-3.2-3B-Instruct-GGUF
  - allenai/OLMoE-1B-7B-0924-Instruct-GGUF
  - bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
  - microsoft/Phi-3-mini-4k-instruct-gguf
  - QuantFactory/Meta-Llama-3-8B-Instruct-GGUF
  - lmstudio-ai/gemma-2b-it-GGUF
  - TheBloke/zephyr-7B-beta-GGUF
  - TheBloke/Mistral-7B-Instruct-v0.2-GGUF
  - TheBloke/Llama-2-7B-Chat-GGUF
The LLMs can be loaded directly in the app, or they can be first deployed with Ollama.
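If you take the Ollama route, the app talks to the local Ollama server instead of loading the GGUF itself. A minimal sketch with the ollama Python client, assuming the server is running and a model has already been pulled (the model name is a placeholder):

```python
import ollama

# Assumes `ollama serve` is running and `ollama pull llama3.2` was done beforehand
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "What does a reranker do in a RAG pipeline?"}],
)
print(response["message"]["content"])
```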
Since each model type has its own prompt format, include the format in ./src/prompt_templates.py.
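For example, Mistral-Instruct and Llama-3-Instruct expect different chat markup, so each model family needs its own entry. A rough illustration (the actual template strings and variable names in ./src/prompt_templates.py may differ):

```python
# Rough illustration of per-family prompt formats; the {system} and {question}
# placeholders are filled in at query time.
MISTRAL_TEMPLATE = "<s>[INST] {system}\n\n{question} [/INST]"

LLAMA3_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n{question}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```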
We shall use Phoenix for LLM tracing. Phoenix is an open-source observability library designed for experimentation, evaluation, and troubleshooting. Before running the app, start a Phoenix server:
python3 -m phoenix.server.main serve
The traces can be viewed at http://localhost:6006.
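Alternatively, the Phoenix server can be launched from Python, which is handy inside notebooks; a minimal sketch:

```python
import phoenix as px

# Launches a local Phoenix instance and prints the URL where traces will appear
session = px.launch_app()
print(session.url)
```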
We use Streamlit as the interface for the demos. There are three demos:
- Conversational Retrieval QA
streamlit run app_conv.py
- Retrieval QA
streamlit run app_qa.py
- Conversational Retrieval QA using ReAct
Create the vectorstore first and update config.yaml:
python -m vectorize --filepaths <your-filepath>
Then run the app:
streamlit run app_react.py
To get started, upload a PDF and click on Build VectorDB. Creating the vector DB will take a while.
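Under the hood, building the vector DB (whether via Build VectorDB in the UI or python -m vectorize) amounts to chunking the documents, embedding the chunks, and persisting a vector index. A rough sketch of that kind of pipeline using LangChain with FAISS; the actual store, embedding model, and chunking settings used by the demo may differ:

```python
from unstructured.partition.pdf import partition_pdf
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Placeholder file and model names; the demo reads its settings from config.yaml
elements = partition_pdf(filename="sample.pdf", strategy="fast")
text = "\n\n".join(el.text for el in elements if el.text)

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(text)

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = FAISS.from_texts(chunks, embeddings)
vectordb.save_local("./vectorstore")
```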