This repository contains a full Q&A pipeline using LangChain framework, Qdrant as vector database and Azure Function with HttpTrigger. The data used are research papers that can be loaded into the vector database, and the Azure Functions processes the request using the retrieval and generation logic. Therefore it can use any other research paper from Arxiv.
The main steps taken to build the RAG pipeline can be summarized as follows:
-
Data Ingestion: load data from https://arxiv.org
-
Indexing: RecursiveCharacterTextSplitter for indexing in chunks
-
Vector Store: Qdrant inserting metadata
-
QA Chain Retrieval: RetrievalQA
-
Azure Function: Process the request with HttpTrigger
Feel free to ⭐ and clone this repo 😉
The project has been structured with the following files:
.env_sample
: sample environmental variablesMakefile
: install requirements, formating, linting, and clean uppyproject.toml
: linting and formatting using ruffrequirements.txt:
project requirementscreate_vector_store.py:
script to create the collection in Qdrantfunction_app.py:
Azure function apprag_app.py:
RAG Logichost.json:
metadata file with configuration options for the function app for deploymentlocal.settings.json
: metadata file with configuration options for the function app locallydeploy.sh:
script to create and publish the function
The Python version used for this project is Python 3.10. You can follow along the medium article.
-
Clone the repo (or download it as a zip file):
git clone https://github.com/benitomartin/rag-azure-http.git
-
Create the virtual environment named
main-env
using Conda with Python version 3.10:conda create -n main-env python=3.10 conda activate main-env
-
Execute the
Makefile
script and install the project dependencies included in the requirements.txt:pip install -r requirements.txt or make install
-
Create Azure Account and install Azure CLI and Functions Core Tools.
-
Test the function locally
func start
curl -X POST "http://localhost:7071/api/req" \
-H "Content-Type: application/json" \
-d '{"query": "positional encoding"}'
-
Create and publish the App: Make sure the
.env
file is complete and run thedeploy.sh
scriptchmod +x deploy.sh ./deploy.sh
curl -X POST https://app-rag-function.azurewebsites.net/api/req \ -H "Content-Type: application/json" \ -d '{"query": "Positional Encoding"}'