LangChain
LangChain is a framework designed to streamline the development of applications powered by large language models (LLMs). It provides tools and abstractions for building applications for tasks such as natural language processing, question answering, information retrieval, and generative AI. LangChain emphasizes modularity, which makes it easier to combine components such as models, prompts, tools, and memory into robust applications.
Further installation instructions can be found on the LangChain
Create a requirements.txt file to list all project dependencies
- Start by creating a requirements.txt file that specifies the libraries the project requires
- Include dependencies such as LangChain, FAISS, and Hugging Face, along with any other required packages
- Save the following content in a file named requirements.txt in your project directory
streamlit
jupyter
langchain
langchain-core
langchain-community
langchain-huggingface
sentence-transformers
langchain-text-splitters
langchain-mistralai
faiss-cpu
mistralai
pymilvus
pydantic==2.5.2
yake
pandas
numpy
Copy the requirements
- Copy the requirements.txt into the Docker container
- To make the requirements.txt file accessible within the Docker container, include it using the following instruction in your Dockerfile
COPY requirements.txt /app/requirements.txt
Install Python dependencies listed in the requirements.txt
- After adding the requirements.txt file to the container, run the following command to install the specified dependencies
RUN mamba install --yes --file requirements.txt && mamba clean --all -f -y
LangChain installation using pip
- Install LangChain using pip
pip install langchain
- Install additional dependencies using pip
pip install transformers
pip install torch
import os
from dotenv import load_dotenv
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.schema import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_milvus import Milvus
from langchain_community.document_loaders import WebBaseLoader, RecursiveUrlLoader
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
load_dotenv()
MISTRAL_API_KEY = os.environ.get("MISTRAL_API_KEY")
MILVUS_URI = "/app/milvus/milvus_vector.db"
MODEL_NAME = "sentence-transformers/all-MiniLM-L12-v2"
MAX_TEXT_LENGTH = 5000
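Because MODEL_NAME fixes the size of every embedding, it can help to check vector lengths before anything is inserted into Milvus. The following is a minimal sketch, assuming the 384-dimensional output documented for all-MiniLM-L12-v2. EXPECTED_DIM and check_dimensions are hypothetical names, and plain lists stand in for real embeddings.

```python
EXPECTED_DIM = 384  # assumption: output size documented for all-MiniLM-L12-v2


def check_dimensions(vectors, expected_dim=EXPECTED_DIM):
    """Return the indices of vectors whose length differs from expected_dim."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dim]


# Plain lists stand in for embeddings produced by the model.
vectors = [[0.0] * 384, [0.0] * 768, [0.0] * 384]
bad = check_dimensions(vectors)
if bad:
    print(f"Vectors with unexpected dimension at positions: {bad}")
```

If the embedding model is changed later, the expected dimension and the Milvus collection schema would need to change with it.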
create_stuff_documents_chain combines the retrieved documents into a single context that is passed to the chat model to generate the response
document_chain = create_stuff_documents_chain(chat_model, prompt)
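To make the "stuff" strategy concrete, the sketch below mimics it in plain Python: the text of every retrieved document is joined and placed into one prompt. It illustrates the idea and is not LangChain's implementation. The Doc class, stuff_documents, and the separator are stand-ins chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def stuff_documents(docs, question, separator="\n\n"):
    """Join all document texts and fill them into a single prompt string."""
    context = separator.join(doc.page_content for doc in docs)
    return f"<question>{question}</question>\n\n<context>{context}</context>"


docs = [Doc("Milvus stores vectors."), Doc("FAISS is a similarity search library.")]
print(stuff_documents(docs, "What does Milvus store?"))
```

Since every document ends up in one prompt, this approach only works while the combined text fits within the model's context window.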
LangChain's RecursiveCharacterTextSplitter splits large documents into smaller chunks for processing
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=200,
    is_separator_regex=False
)
docs = text_splitter.split_documents(documents)
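The effect of chunk_size and chunk_overlap can be seen with a simplified sliding-window splitter. This sketch is not the algorithm used by RecursiveCharacterTextSplitter, which prefers to break on separators such as paragraphs and newlines. sliding_chunks is a hypothetical helper that only shows how consecutive chunks share characters.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "abcdefghij"
print(sliding_chunks(text, chunk_size=4, chunk_overlap=2))
```

In this simplified model, chunk_size=500 with chunk_overlap=200 would advance the window by 300 characters at a time, so a sentence cut at one chunk boundary is likely to appear whole in the neighbouring chunk.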
def create_prompt():
    """
    Create a prompt template for the RAG model

    Returns:
        ChatPromptTemplate: The prompt template for the RAG model
    """
    # Define the system prompt that restricts answers to the supplied context
    PROMPT_TEMPLATE = """
    You are an AI assistant that provides answers strictly based on the provided context. Adhere to these guidelines:
    - Only answer questions based on the content within the <context> tags.
    - If the <context> does not contain information related to the question, respond only with: "I don't have enough information to answer this question."
    - For unclear questions or questions that lack specific context, request clarification from the user.
    - Provide specific, concise answers. Where relevant information includes statistics or numbers, include them in the response.
    - Avoid adding any information, assumption, or external knowledge. Answer accurately within the scope of the given context and do not guess.
    - If information is missing, respond only with: "I don't have enough information to answer this question."
    """
    # The human message carries the user question and the retrieved context
    prompt = ChatPromptTemplate.from_messages([
        ("system", PROMPT_TEMPLATE),
        ("human", "<question>{input}</question>\n\n<context>{context}</context>"),
    ])
    print("Prompt Created")
    return prompt
Documents are preprocessed with LangChain's RecursiveCharacterTextSplitter so that each chunk is small enough to embed and retrieve
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=200)
docs = text_splitter.split_documents(documents)
A custom retriever (ScoreThresholdRetriever) is built on LangChain's BaseRetriever and extends document retrieval with filtering by similarity score
class ScoreThresholdRetriever(BaseRetriever):
    ...  # class body not shown in this excerpt
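The body of ScoreThresholdRetriever is not shown here. As a rough illustration of score-threshold retrieval in general, the plain-Python sketch below keeps only candidates whose cosine similarity to the query meets a cutoff. The function names, the 0.5 threshold, and the toy vectors are assumptions made for the example, not the project's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_score(query_vec, candidates, threshold=0.5):
    """Return (text, score) pairs with score >= threshold, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = [
    ("about vectors", [1.0, 0.0]),
    ("unrelated", [0.0, 1.0]),
    ("partly related", [1.0, 1.0]),
]
for text, score in filter_by_score([1.0, 0.0], candidates):
    print(f"{score:.2f}  {text}")
```

A threshold trades recall for precision: raising it sends fewer, more relevant chunks to the model, at the risk of returning nothing for loosely phrased questions.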
Retrieved documents are passed into a document chain to generate context-aware responses
response = document_chain.invoke({
    "input": query,
    "context": retrieved_documents
})
LangChain's RecursiveCharacterTextSplitter handles document splitting through its split_documents method
split_docs = text_splitter.split_documents(documents)
Use the ScoreThresholdRetriever to retrieve documents based on similarity scores
retrieved_docs = retriever.get_related_documents(query_embedding, collection)
- Schema Mismatch: Make sure that the vector dimensionality in the Milvus collection matches the vectors being inserted.
- MistralAI API Key Errors: Verify that the MistralAI API key is set correctly and has the necessary permissions.
- Document loading issues:
  - The document loading and embedding process may fail if the file path is incorrect or inaccessible, such as with document_path = "data/textbook".
  - Troubleshoot by verifying that the data/textbook directory exists and is accessible, and that the files are in a supported format for loading and embedding.
import os
document_path = "data/textbook"
print(os.listdir(document_path))
- Environment Variables Not Loaded Correctly:
  - The Mistral API key may not load correctly if the .env file is not properly configured or cannot be found. In that case os.getenv("MISTRAL_API_KEY") returns None, which leads to a ValueError.
  - To troubleshoot, verify that the .env file exists and contains the correct API key, confirm its location relative to the script's working directory, ensure proper file permissions, and print the API key for debugging.
print(f"Loaded API key: {os.getenv('MISTRAL_API_KEY')}")
- General Error Logging: To handle any unforeseen errors in the workflow, wrap all critical sections with a try-except block.
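One way to apply this is to log the failure with its traceback and then re-raise, so errors are recorded without being silently swallowed. The following is a minimal sketch using the standard logging module. require_env and run_step are hypothetical helpers, and DEMO_API_KEY is a made-up variable name used so the example does not touch real credentials.

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rag_pipeline")  # hypothetical logger name


def require_env(name):
    """Return the value of an environment variable or raise ValueError if unset."""
    value = os.getenv(name)
    if not value:
        raise ValueError(f"Environment variable {name} is not set")
    return value


def run_step(label, func, *args):
    """Run one pipeline step, logging any exception before re-raising it."""
    try:
        return func(*args)
    except Exception:
        logger.exception("Step %r failed", label)
        raise


os.environ["DEMO_API_KEY"] = "secret-value"  # set only for this demonstration
key = run_step("load api key", require_env, "DEMO_API_KEY")
print(f"Key loaded: {len(key)} characters")  # avoid printing the key itself
```

Re-raising after logging keeps the original failure visible to the caller, which can then decide whether to retry, fall back, or stop.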