g-hano/ChatLM

This is a RAG app that can run on multiple GPUs simultaneously. I used the LlamaIndex framework for storing and retrieving documents, and the vLLM library for multi-GPU support.


This project implements a Retrieval-Augmented Generation (RAG) application utilizing a hybrid search mechanism, combining keyword and vector search for document retrieval. It uses the LlamaIndex framework and integrates language models and embedding models from LangChain and Hugging Face. Additionally, a Flask app is provided for running the application.
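
The embedding side plugs into LlamaIndex's global Settings. Below is a minimal sketch, assuming the llama-index-embeddings-huggingface package and an illustrative model name; the repository's actual embedding choice may differ:

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Illustrative model choice; any Hugging Face sentence-embedding model works here.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")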

Features

  • Hybrid Search: Combines BM25 keyword search and vector search to retrieve the most relevant documents.
  • Language Models: Utilizes models from LangChain and Hugging Face for generating responses.
  • Flask Integration: Provides a Flask app for easy deployment and usage (a minimal sketch appears at the end of this README).

Running the vLLM Server

The language model is served with vLLM. The command below starts the API server with Mistral-7B-Instruct-v0.3 in half precision (--dtype=half), sharded across 4 GPUs (--tensor-parallel-size=4), capped at half of each GPU's memory (--gpu-memory-utilization=0.5), and with a 27,000-token context window:

python -m vllm.entrypoints.api_server --model=mistralai/Mistral-7B-Instruct-v0.3 --dtype=half --tensor-parallel-size=4 --gpu-memory-utilization=0.5 --max-model-len=27000
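
For a quick smoke test, the server's /generate endpoint can be queried directly. This is a sketch assuming the server runs on localhost:8000 (vLLM's default port); the prompt text is illustrative:

import requests

payload = {
    "prompt": "[INST] What is retrieval-augmented generation? [/INST]",
    "max_tokens": 256,
    "temperature": 0.7,
}
# The api_server entrypoint returns JSON whose "text" field holds the
# completions (prompt included).
response = requests.post("http://localhost:8000/generate", json=payload)
print(response.json()["text"][0])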

ChatEngine Class

The ChatEngine class handles chat interactions with the language model and keeps the conversation history.

from llama_index.core.llms import ChatMessage, MessageRole

class ChatEngine:
    def __init__(self, retriever):
        self.retriever = retriever
        self.chat_history = []

    def ask_question(self, question, llm):
        # Wrap the question in Mistral's [INST] instruction tags.
        question = "[INST]" + question + "[/INST]"
        # Fetch the most relevant documents via the hybrid retriever.
        results = self.retriever.best_docs(question)
        documents = [doc.text for doc, score in results]

        # Append the question and the retrieved context to the chat history.
        self.chat_history.append(ChatMessage(role=MessageRole.USER, content=f"Question: {question}"))
        self.chat_history.append(ChatMessage(role=MessageRole.ASSISTANT, content=f"Document: {documents}"))

        response = llm.chat(self.chat_history)
        return response.message.content
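
A usage sketch, assuming a HybridRetriever named hybrid_retriever (built as shown in the next section) and the vLLM server from above on localhost:8000; VllmServer from the llama-index-llms-vllm package is one way to point LlamaIndex at that server:

from llama_index.llms.vllm import VllmServer

# Assumes the /generate server started earlier; max_new_tokens is illustrative.
llm = VllmServer(api_url="http://localhost:8000/generate", max_new_tokens=256)
engine = ChatEngine(hybrid_retriever)
print(engine.ask_question("What do the indexed documents say about pricing?", llm))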

HybridRetriever Class

The HybridRetriever class combines BM25 and vector search methods to retrieve relevant documents.

from llama_index.core import Document
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.retrievers.bm25 import BM25Retriever

class HybridRetriever:
    def __init__(self, bm25_retriever: BM25Retriever, vector_retriever: VectorIndexRetriever):
        self.bm25_retriever = bm25_retriever
        self.vector_retriever = vector_retriever
        # Total number of results to keep after merging both result sets.
        self.top_k = vector_retriever._similarity_top_k + bm25_retriever._similarity_top_k

    def retrieve(self, query: str):
        # Wrap the query in Mistral's [INST] instruction tags.
        query = "[INST] " + query + " [/INST]"
        bm25_results = self.bm25_retriever.retrieve(query)
        vector_results = self.vector_retriever.retrieve(query)

        # Merge the two result sets, summing scores when both methods
        # return the same document.
        combined_results = {}
        for result in bm25_results:
            combined_results[result.node.text] = result.score

        for result in vector_results:
            if result.node.text in combined_results:
                combined_results[result.node.text] += result.score
            else:
                combined_results[result.node.text] = result.score

        # Sort by combined score, highest first, and keep the top_k entries.
        combined_results_list = sorted(combined_results.items(), key=lambda item: item[1], reverse=True)
        return combined_results_list[:self.top_k]

    def best_docs(self, query: str):
        # Return (Document, score) pairs for the highest-scoring texts.
        top_results = self.retrieve(query)
        return [(Document(text=text), score) for text, score in top_results]
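
A construction sketch with illustrative top-k settings (not taken from the repository); both retrievers are built over the same nodes so their scores refer to the same texts:

from llama_index.core import VectorStoreIndex
from llama_index.retrievers.bm25 import BM25Retriever

# Assumes `documents` is a list of LlamaIndex Document objects.
index = VectorStoreIndex.from_documents(documents)
vector_retriever = index.as_retriever(similarity_top_k=5)
bm25_retriever = BM25Retriever.from_defaults(
    nodes=list(index.docstore.docs.values()), similarity_top_k=5
)
hybrid_retriever = HybridRetriever(bm25_retriever, vector_retriever)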

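Flask App

The repository ships a Flask app for serving the chat engine. The route below is a minimal sketch, not the repository's actual code; the endpoint name and payload shape are assumptions:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    # `engine` and `llm` are assumed to be built as in the sections above.
    question = request.json["question"]
    answer = engine.ask_question(question, llm)
    return jsonify({"answer": answer})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)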