OllamaPDFInsight is a small RAG (Retrieval-Augmented Generation) system that leverages Ollama to create embeddings from a PDF file, stores these embeddings in a Weaviate vector store, and uses Ollama to answer questions regarding the PDF content.
- Use LangChain document loader to turn the PDF into a set of documents.
- Create a collection from these documents in the Weaviate vector store.
- Use Ollama to generate embeddings from the documents, and store the embeddings in the collection.
- Query and retrieve information from the PDF using Ollama.
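At its core, the retrieval step in a RAG system ranks stored chunks by the similarity of their embeddings to the query embedding. A minimal, dependency-free sketch of that idea (toy hand-made vectors stand in for real Ollama embeddings; all names here are illustrative, not the repo's actual code):

```python
# Rank stored chunks by cosine similarity between their embeddings and the
# query embedding. The vectors below are toy stand-ins for Ollama embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend vector store: chunk text -> embedding
store = {
    "The PDF discusses vector databases.": [0.9, 0.1, 0.0],
    "Weaviate stores embeddings for search.": [0.4, 0.6, 0.2],
    "Unrelated cooking tips.": [0.0, 0.2, 0.9],
}

query_vec = [0.85, 0.2, 0.05]  # pretend embedding of the user question
best = max(store, key=lambda text: cosine(store[text], query_vec))
print(best)  # the chunk most relevant to the question
```

A real run replaces the toy vectors with embeddings generated by Ollama and lets Weaviate do this nearest-neighbor search at scale.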
- Create a virtual environment:
pyenv install 3.8.10
pyenv virtualenv 3.8.10 ollama-pdf-insight-env
pyenv activate ollama-pdf-insight-env
- Once the virtual environment is activated, install the requirements from the
requirements.txt
file.
pip install -r requirements.txt
- Clone the repository:
git clone https://github.com/yourusername/OllamaPDFInsight.git
cd OllamaPDFInsight
- Prepare the data and the Weaviate vector store:

  ```shell
  python load_data.py
  ```
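Before documents can be embedded, the extracted PDF text is split into overlapping chunks. A hypothetical sketch of that splitting step (the repo most likely uses a LangChain text splitter; the chunk sizes and function name here are assumptions for illustration):

```python
# Hypothetical stand-in for the document-splitting step in load_data.py:
# fixed-size character windows with overlap, so context is not cut mid-idea.
def split_text(text: str, chunk_size: int = 200, overlap: int = 50):
    """Return overlapping character chunks of the extracted PDF text."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

pages = "some extracted PDF text " * 40  # stand-in for the loader's output
chunks = split_text(pages)
print(len(chunks), len(chunks[0]))
```

Each chunk would then be embedded via Ollama and stored as an object in the Weaviate collection.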
- Prepare the prompt template and the LLM, then run the prompt:

  ```shell
  python retrieve_context.py
  ```
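A hedged sketch of what this step looks like: fill a prompt template with the retrieved chunks, then send it to a locally running Ollama server. The template wording, model name, and helper names below are assumptions, not the repo's actual code; only the `/api/generate` endpoint and payload shape come from Ollama's documented REST API.

```python
# Hypothetical sketch of retrieve_context.py: stuff retrieved context into a
# prompt template and query a local Ollama server over its REST API.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_prompt(question: str, context_chunks: list) -> str:
    """Fill a simple RAG template with the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_ollama(prompt: str, model: str = "llama3") -> str:
    """POST the prompt to Ollama and return the generated text."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

prompt = build_prompt("What is the PDF about?", ["chunk one", "chunk two"])
# ask_ollama(prompt) would return the answer; it requires a running Ollama server.
```

In the actual script, the context chunks would come from a Weaviate similarity search rather than being hard-coded.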
I essentially followed the steps in this article and added my own touches.