This is a template project demonstrating how to use Docker Compose to build a Retrieval-Augmented Generation (RAG) application with LangChain, Chainlit, Qdrant, and Ollama over a specified knowledge base.
- Install the Python packages used for the project:

  ```shell
  pip install -r requirements.txt
  ```
- Start the Qdrant vector search database via Docker:

  ```shell
  docker run -p 6333:6333 -p 6334:6334 \
    -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
    qdrant/qdrant
  ```
- Start Ollama and download the required large language model, waiting for the download to complete:

  ```shell
  ollama run deepseek-r1:1.5b
  ```
- Ingest data into the Qdrant database:

  ```shell
  python utils/ingest.py
  ```
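The contents of `utils/ingest.py` are not shown here; the following is a minimal sketch of what such an ingestion script might look like, assuming `qdrant-client` and `langchain-huggingface` are installed. The `chunk_text` helper and the collection wiring are illustrative, not the project's actual code:

```python
import os


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, a common step before embedding."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


if __name__ == "__main__":
    # Hypothetical wiring; the real utils/ingest.py may differ.
    from langchain_huggingface import HuggingFaceEmbeddings
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    url = os.getenv("QDRANT_DATABASE_URL", "http://localhost:6333")
    collection = os.getenv("QDRANT_COLLECTION_NAME", "template")
    model_id = os.getenv("HUGGING_FACE_EMBED_MODEL_ID",
                         "sentence-transformers/all-MiniLM-L6-v2")

    # Read and chunk the source document.
    with open(os.environ["DATA_INGESTION_LOCATION"]) as f:
        chunks = chunk_text(f.read())

    # Embed the chunks locally with the Hugging Face model.
    embeddings = HuggingFaceEmbeddings(model_name=model_id)
    vectors = embeddings.embed_documents(chunks)

    # Create the collection and upsert one point per chunk.
    client = QdrantClient(url=url)
    client.create_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=len(vectors[0]), distance=Distance.COSINE),
    )
    client.upsert(
        collection_name=collection,
        points=[PointStruct(id=i, vector=v, payload={"text": c})
                for i, (v, c) in enumerate(zip(vectors, chunks))],
    )
```

The service calls sit under the `__main__` guard so the chunking logic can be reused and tested without a running Qdrant instance.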
- Confirm the Qdrant collection has been created and data ingested via the Web UI at http://localhost:6333/dashboard
- Start the Chainlit application:

  ```shell
  chainlit run main.py
  ```
The following environment variables are used by this project.

| Environment Variable | Description | Default Value |
|---|---|---|
| QDRANT_DATABASE_URL | The Qdrant database URL | http://localhost:6333 |
| QDRANT_COLLECTION_NAME | The name of the Qdrant collection | template |
| OLLAMA_URL | The Ollama host URL | http://localhost:11434 |
| OLLAMA_LLM_MODEL | The Ollama model to use | deepseek-r1:1.5b |
| DATA_INGESTION_LOCATION | The file path for data to be ingested | |
| HUGGING_FACE_EMBED_MODEL_ID | The Hugging Face embeddings model name | sentence-transformers/all-MiniLM-L6-v2 |
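The variables above can be read with their defaults using Python's `os.getenv`; a minimal sketch (the `load_settings` helper is illustrative, not part of the project's code):

```python
import os


def load_settings() -> dict:
    """Read the project's environment variables, falling back to the
    defaults listed in the table above."""
    return {
        "QDRANT_DATABASE_URL": os.getenv("QDRANT_DATABASE_URL", "http://localhost:6333"),
        "QDRANT_COLLECTION_NAME": os.getenv("QDRANT_COLLECTION_NAME", "template"),
        "OLLAMA_URL": os.getenv("OLLAMA_URL", "http://localhost:11434"),
        "OLLAMA_LLM_MODEL": os.getenv("OLLAMA_LLM_MODEL", "deepseek-r1:1.5b"),
        # No default in the table: ingestion should fail fast if this is unset.
        "DATA_INGESTION_LOCATION": os.getenv("DATA_INGESTION_LOCATION", ""),
        "HUGGING_FACE_EMBED_MODEL_ID": os.getenv(
            "HUGGING_FACE_EMBED_MODEL_ID",
            "sentence-transformers/all-MiniLM-L6-v2"),
    }
```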
An alternative way of running the stack is Docker Compose: the docker-compose.yaml file defines the services needed to run this project, including Chainlit, Qdrant, and Ollama.
- From the root directory, start all the services:

  ```shell
  docker compose up -d
  ```
- Access the services at the following endpoints in your browser: Chainlit (http://localhost:8000/) and Qdrant (http://localhost:6333/dashboard).
- Optionally, enable GPU support via Docker Compose for better performance with large language models (LLMs). Uncomment the following lines under the Ollama service in the YAML file:
  ```yaml
  ...
      # Enable GPU support using the host machine
      # https://docs.docker.com/compose/how-tos/gpu-support/
      deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                count: all
                capabilities: [ gpu ]
  ```
Credit: *100% Local RAG Using LangChain, DeepSeek, Ollama, Qdrant, Docling, Huggingface & Chainlit* by Data Science Basics