This example demonstrates how to work with multimodal data. It showcases multimodal parsing of documents, including images, tables, and text, through multimodal LLM APIs hosted in the NVIDIA API Catalog. The example generates image descriptions using VLMs, as shown in the diagram below. The example works with PDF, PPTX, and PNG files. The chain server extracts information from the files, such as graphs and plots, as well as text and tables.
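As an illustration of the image-to-text step, the sketch below sends a PNG to the Neva-22B endpoint in the NVIDIA API Catalog and prints the generated description. The file name `chart.png`, the endpoint URL, and the payload shape follow the pattern on the model card but are assumptions here, not the chain server's actual code; verify them against the API Catalog reference before relying on this.

```python
import base64
import os

import requests

# Hypothetical sketch: describe an image with Neva-22B via the NVIDIA API Catalog.
# Small images only; larger files require the asset-upload flow described on the model card.
invoke_url = "https://ai.api.nvidia.com/v1/vlm/nvidia/neva-22b"

with open("chart.png", "rb") as f:  # assumed local file
    image_b64 = base64.b64encode(f.read()).decode()

headers = {
    "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
    "Accept": "application/json",
}
payload = {
    "messages": [
        {
            "role": "user",
            "content": f'Describe the contents of this image. <img src="data:image/png;base64,{image_b64}" />',
        }
    ],
    "max_tokens": 512,
    "temperature": 0.2,
}

response = requests.post(invoke_url, headers=headers, json=payload)
response.raise_for_status()
# Response shape assumed to follow the OpenAI-style chat format used by the catalog.
print(response.json()["choices"][0]["message"]["content"])
```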
| Model | Embedding | Framework | Vector Database | File Types |
|---|---|---|---|---|
| meta/llama3-8b-instruct for response generation, google/Deplot for graph-to-text conversion, and Neva-22B for image-to-text conversion | nvidia/nv-embedqa-e5-v5 | LangChain | Milvus | PDF, PPTX, PNG |
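To show roughly how the components in the table fit together, the following sketch embeds extracted text (including VLM-generated descriptions of images and plots) with nvidia/nv-embedqa-e5-v5 through the `langchain-nvidia-ai-endpoints` package and indexes it in Milvus with LangChain. The sample texts and collection name are hypothetical, and the Milvus host/port assume the docker compose defaults; the actual chain server code differs.

```python
from langchain_community.vectorstores import Milvus
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

# Requires NVIDIA_API_KEY to be exported (see the prerequisites below).
embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5")

texts = [
    "Deplot description: the bar chart shows revenue growing each quarter.",  # hypothetical sample
    "The slide lists the supported file types: PDF, PPTX, and PNG.",          # hypothetical sample
]

# Store the embeddings in the local Milvus instance started by docker compose.
vector_store = Milvus.from_texts(
    texts=texts,
    embedding=embedder,
    collection_name="multimodal_demo",                        # hypothetical collection name
    connection_args={"host": "localhost", "port": "19530"},   # docker compose default
)

print(vector_store.similarity_search("Which file types are supported?", k=1))
```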
- Complete the common prerequisites.
- Export your NVIDIA API key as an environment variable (a quick check that the key is accepted is sketched after this list):

      export NVIDIA_API_KEY="nvapi-<...>"
- Start the containers:

      cd RAG/examples/advanced_rag/multimodal_rag/
      docker compose up -d --build

  Example Output

      ✔ Network nvidia-rag           Created
      ✔ Container rag-playground     Started
      ✔ Container milvus-minio       Started
      ✔ Container chain-server       Started
      ✔ Container milvus-etcd        Started
      ✔ Container milvus-standalone  Started
- Confirm the containers are running:

      docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

  Example Output

      CONTAINER ID   NAMES               STATUS
      39a8524829da   rag-playground      Up 2 minutes
      bfbd0193dbd2   chain-server        Up 2 minutes
      ec02ff3cc58b   milvus-standalone   Up 3 minutes
      6969cf5b4342   milvus-minio        Up 3 minutes (healthy)
      57a068d62fbb   milvus-etcd         Up 3 minutes (healthy)
- Open a web browser and access http://localhost:8090 to use the RAG Playground.
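The API key check mentioned in the first step can look like the following minimal sketch. It assumes the `openai` Python package is installed and uses the API Catalog's OpenAI-compatible endpoint with the meta/llama3-8b-instruct model from the table above.

```python
import os

from openai import OpenAI

# Quick sanity check that the exported key is accepted by the NVIDIA API Catalog.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    max_tokens=8,
)
print(completion.choices[0].message.content)
```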
Refer to Using the Sample Web Application for information about uploading documents and using the web interface.
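If you prefer to script ingestion instead of using the web interface, the chain server exposes an HTTP API. The sketch below assumes a `POST /documents` upload endpoint on port 8081; both the port and the endpoint path are assumptions, so confirm them against the chain server's OpenAPI docs (for example at http://localhost:8081/docs) before using it.

```python
import requests

# Hypothetical sketch: upload a PDF to the chain server for ingestion.
# The port (8081) and endpoint path (/documents) are assumptions; check the
# chain server's OpenAPI schema for the actual interface.
chain_server = "http://localhost:8081"

with open("report.pdf", "rb") as f:  # assumed local file
    resp = requests.post(f"{chain_server}/documents", files={"file": f})

resp.raise_for_status()
print(resp.status_code, resp.text)
```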
- Vector Database Customizations
- Stop the containers by running `docker compose down`.
- Use the RAG Application: Multimodal Chatbot Helm chart to deploy this example in Kubernetes.