These scripts help you generate an example FAISS vector database for use in your Flask application.
We have provided scripts to convert an Arm learning path into a series of chunk YAML files for use in our RAG application.
It is recommended to use a virtual environment to manage dependencies.
To create a new conda environment, use the following command:
conda create --name vectorstore python=3.11
Once setup is complete, activate the new environment:
conda activate vectorstore
Install the required packages:
conda install --file vectorstore-requirements.txt
To generate chunks, use the following command:
python chunk_a_learning_path.py --url <LEARNING_PATH_URL>
Replace `<LEARNING_PATH_URL>` with the URL of the learning path you want to process. If no URL is provided, the script defaults to a known learning path URL.
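Falling back to a default when `--url` is omitted is the standard `argparse` pattern. The sketch below is hypothetical: the real argument handling lives in `chunk_a_learning_path.py` and may differ, and `DEFAULT_LEARNING_PATH_URL` is a placeholder, not the script's actual default.

```python
import argparse

# Hypothetical placeholder: the real default URL is defined in chunk_a_learning_path.py.
DEFAULT_LEARNING_PATH_URL = "https://example.com/learning-paths/placeholder/"


def parse_args(argv=None):
    """Parse --url, falling back to DEFAULT_LEARNING_PATH_URL when it is omitted."""
    parser = argparse.ArgumentParser(
        description="Chunk an Arm learning path into YAML files."
    )
    parser.add_argument(
        "--url",
        default=DEFAULT_LEARNING_PATH_URL,
        help="URL of the learning path to process",
    )
    # argv=None makes argparse read sys.argv; passing a list is handy for testing.
    return parser.parse_args(argv)
```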
The script processes the specified learning path and saves the chunks as YAML files in a `./chunks/` directory.
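Before building the index, you may want a quick check that chunking actually produced output. This is a minimal sketch, assuming the chunk files use a `.yaml` or `.yml` extension; `find_chunk_files` is a hypothetical helper, not part of the provided scripts.

```python
from pathlib import Path


def find_chunk_files(chunks_dir="chunks"):
    """Return the sorted YAML chunk files in chunks_dir, or raise if there are none."""
    directory = Path(chunks_dir)
    if not directory.is_dir():
        raise FileNotFoundError(
            f"{directory} does not exist; run chunk_a_learning_path.py first"
        )
    # Keep only YAML files; anything else in the folder is ignored.
    files = sorted(p for p in directory.iterdir() if p.suffix in {".yaml", ".yml"})
    if not files:
        raise FileNotFoundError(f"No .yaml files found in {directory}")
    return files
```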
Once you have a `./chunks/` directory full of YAML files, the next step is to use FAISS to create the vector database.
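Ensure that `AZURE_OPENAI_KEY` and `AZURE_OPENAI_ENDPOINT` are set in your local environment. If you do not have these values yet, the steps below walk through creating an Azure OpenAI resource and retrieving them.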
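To confirm both variables are visible to Python before running any scripts, a small check like the following can help. `missing_azure_vars` is a hypothetical helper; it treats an empty value the same as an unset one.

```python
import os

REQUIRED_VARS = ("AZURE_OPENAI_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_azure_vars(environ=None):
    """Return the names of required Azure OpenAI variables that are unset or empty."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_azure_vars()` with no arguments inspects `os.environ`; an empty list means both variables are set.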
- Create an OpenAI resource:
  - Go to the Azure Portal.
  - Click "Create a resource".
  - Search for "OpenAI" and select "Azure OpenAI Service".
  - Click "Create".
- Configure the OpenAI resource:
  - Fill in the required details, such as Subscription, Resource Group, Region, and Name.
  - Click "Review + create", then "Create" to deploy the resource.
- Get the API key and endpoint:
  - Once the resource is created, navigate to the resource page.
  - Under "Resource Management > Keys and Endpoint", you will find the key and endpoint values.
  - Copy these values and set them in your local environment:
    export AZURE_OPENAI_KEY="<your_openai_key>"
    export AZURE_OPENAI_ENDPOINT="https://<your_openai_endpoint>.openai.azure.com/"
  You now have the keys needed to use Azure OpenAI in your application.
- Deploy the text-embedding-ada-002 model:
  - Open Azure AI Foundry for your new resource.
  - Under "Deployments", ensure you have a deployment of "text-embedding-ada-002".
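Once the deployment exists, embeddings are requested from the Azure OpenAI REST endpoint at `{endpoint}/openai/deployments/{deployment}/embeddings`. The standard-library sketch below builds that request. It assumes the deployment is named `text-embedding-ada-002` (deployment names are chosen by you and can differ from the model name) and that API version `2024-02-01` is available for your resource; it is not necessarily how `local_vectorstore_creation.py` calls the service.

```python
import json
import os
import urllib.request

# Assumptions: the deployment is named after the model, and this API version
# is enabled for your resource. Adjust both to match your Azure setup.
DEPLOYMENT_NAME = "text-embedding-ada-002"
API_VERSION = "2024-02-01"


def build_embedding_request(texts, endpoint, api_key,
                            deployment=DEPLOYMENT_NAME, api_version=API_VERSION):
    """Build (but do not send) a POST request to the Azure OpenAI embeddings endpoint."""
    url = (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
           f"/embeddings?api-version={api_version}")
    body = json.dumps({"input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def embed(texts):
    """Send the request using the environment variables; return one vector per text."""
    request = build_embedding_request(
        texts, os.environ["AZURE_OPENAI_ENDPOINT"], os.environ["AZURE_OPENAI_KEY"]
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each item in "data" carries an "index" matching its input position.
    ordered = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

`embed(["some chunk text"])` sends a live request and returns one vector per input. `text-embedding-ada-002` produces 1536-dimensional vectors, so any FAISS index built from them must use that dimension.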
Run the Python script to create the FAISS index `.bin` and `.json` files.
Note: this assumes the chunk files are in a `chunks` subfolder, which is where `chunk_a_learning_path.py` saves them.
python local_vectorstore_creation.py
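A flat FAISS index answers a query by comparing it against every stored vector. The sketch below illustrates that exact-search idea in plain Python, using squared L2 distance as FAISS's `IndexFlatL2` does; it is not FAISS itself. It also assumes, hypothetically, that the `.json` file holds per-chunk metadata in the same order as the vectors in the `.bin` index, so a returned row index maps straight back to a chunk.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(query, vectors, metadata, k=2):
    """Return the k (metadata, distance) pairs whose vectors are closest to query.

    vectors[i] and metadata[i] are assumed to describe the same chunk.
    """
    if len(vectors) != len(metadata):
        raise ValueError("vectors and metadata must have the same length")
    # Brute force: score every stored vector, then keep the k smallest distances.
    scored = sorted((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [(metadata[i], dist) for dist, i in scored[:k]]
```

Which index type `local_vectorstore_creation.py` actually builds is not stated here; FAISS also offers approximate indexes that trade exactness for speed on large collections.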
Copy the generated `.bin` and `.json` files to the root directory of your Flask application deployment.
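If you prefer to script the copy step, a sketch like the following works, assuming the generated files are the only `.bin` and `.json` files in the directory where you ran `local_vectorstore_creation.py`. `copy_index_files` and both directory arguments are hypothetical; narrow the suffix filter if other JSON files live alongside the index.

```python
import shutil
from pathlib import Path


def copy_index_files(source_dir, flask_root):
    """Copy every .bin and .json file from source_dir into flask_root; return copied paths."""
    source, target = Path(source_dir), Path(flask_root)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.suffix in {".bin", ".json"}:
            # copy2 preserves file metadata such as modification time.
            copied.append(Path(shutil.copy2(path, target / path.name)))
    return copied
```

For example, `copy_index_files(".", "../my-flask-app")` copies the index files from the current directory into a sibling Flask project.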