
Create long documents through RAG and chain of thought using LangChain, OpenAI, and Pinecone.


mominalix/Create-long-documents-using-AI


Create Long Documents using AI

This project demonstrates how to generate a cohesive multi-section document using Large Language Models (LLMs) and a Retrieval-Augmented Generation (RAG) approach. The flow involves:

  1. Generating Sections: The program first takes a user prompt and sends it to the LLM to create an outline of sections.
  2. Expanding Sections: Each section is then expanded one at a time. The content is generated so that it avoids repeating ideas from earlier sections while maintaining continuity.
  3. Retrieval-Augmented Generation (RAG): For each section, the system uses Pinecone to retrieve relevant passages from the reference documents. The retrieved context is injected into the LLM prompt so that the generated content is grounded in that material.
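The retrieval step in item 3 can be sketched without any external services. This is a minimal, self-contained illustration, not the project's code: a toy word-count `embed` function stands in for a real embedding model, an in-memory list of chunks stands in for the Pinecone index, and `retrieve` and `build_section_prompt` (both hypothetical names) show how the top-scoring chunks and the titles of earlier sections could be injected into each section's prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query (what a vector index query does)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_section_prompt(section: str, context: list[str], previous_titles: list[str]) -> str:
    """Inject retrieved context and earlier section titles into the expansion prompt."""
    context_block = "\n".join(f"- {c}" for c in context)
    covered = ", ".join(previous_titles) or "none"
    return (
        f"Write the section '{section}'.\n"
        f"Already covered sections (do not repeat their ideas): {covered}\n"
        f"Reference context:\n{context_block}\n"
    )
```

In the real pipeline the vectors would come from an embedding model and the nearest-neighbour search would be a Pinecone index query; the rank-then-inject structure is the part this sketch is meant to show.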

How It Works

  1. Prompt and Outline

    • You provide a prompt describing the document you want to generate.
    • The LLM produces an outline (sections) in JSON format (via the SectionGenerator).
    • You can choose to keep or regenerate this outline.
  2. Content Expansion

    • Each section from the outline is passed to the LLM along with any retrieved context (via RAGprocessor).
    • The LLM expands the section, ensuring minimal repetition and logical flow.
  3. Document Assembly and Saving

    • The expanded sections are combined into one Markdown document.
    • If a file with the same name exists, the script automatically renames the output file.
    • Finally, the Markdown file is converted to a DOCX, and the Pinecone index is cleaned up.
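The outline and assembly steps above could look roughly like the following. The JSON shape `{"sections": [...]}` and the helpers `parse_outline` and `assemble_markdown` are assumptions for illustration; the README does not document the exact format SectionGenerator returns. Rejecting a malformed outline with an error gives the caller a natural point to ask the LLM to regenerate it.

```python
import json


def parse_outline(llm_output: str) -> list[str]:
    """Parse an outline like {"sections": ["Intro", "Method"]} (assumed shape)."""
    data = json.loads(llm_output)
    sections = data.get("sections") if isinstance(data, dict) else None
    if not isinstance(sections, list) or not all(isinstance(s, str) for s in sections):
        raise ValueError("outline JSON must contain a 'sections' list of strings")
    return sections


def assemble_markdown(title: str, expanded: dict[str, str]) -> str:
    """Combine expanded sections (dict keeps outline order) into one Markdown document."""
    parts = [f"# {title}"]
    for heading, body in expanded.items():
        parts.append(f"## {heading}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"
```

For example, `assemble_markdown("My Report", {"Introduction": "Text."})` yields a level-1 title followed by one level-2 section per outline entry.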

Setup

  1. Clone or Download

    • Get the repository onto your machine.
  2. Install Dependencies

    pip install -r requirements.txt
  3. Create a .env File

    • At the project root, create a file named .env containing your API keys and the model name:
    OPENAI_API=your_openai_api_key
    PINECONE_API_KEY=your_pinecone_api_key
    LLM_MODEL=gpt-4  # or another available model
    
  4. Install Pandoc

    • This project uses pypandoc, which requires Pandoc. Either install Pandoc yourself or let install_pypandoc.py download it automatically.
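A minimal sketch of reading and validating the .env file with only the standard library. The key names come from the example above; `parse_env` and `load_settings` are hypothetical helpers, and the parser handles only simple `KEY=value` lines, quoted values, and `#` comments. In practice, `load_dotenv()` from the python-dotenv package is the common way to do this; the README does not say which approach the project uses.

```python
import os

# Key names taken from the example .env above.
REQUIRED_KEYS = ("OPENAI_API", "PINECONE_API_KEY", "LLM_MODEL")


def parse_env(text: str) -> dict[str, str]:
    """Minimal .env parser: KEY=value lines, '#' comments, fully quoted values unquoted."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        if len(value) >= 2 and value[0] in ("'", '"') and value[-1] == value[0]:
            value = value[1:-1]  # strip matching surrounding quotes
        elif " #" in value:
            value = value.split(" #", 1)[0].rstrip()  # drop an inline comment
        values[key.strip()] = value
    return values


def load_settings(path: str = ".env") -> dict[str, str]:
    """Read the .env file; real environment variables win for the required keys."""
    with open(path, encoding="utf-8") as fh:
        settings = parse_env(fh.read())
    settings.update({k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ})
    missing = [k for k in REQUIRED_KEYS if not settings.get(k)]
    if missing:
        raise RuntimeError(f"missing settings in .env: {', '.join(missing)}")
    return settings
```

Failing early on a missing key gives a clearer error than a failed API call halfway through generating a document.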

How to Run

  1. Add Reference Documents

    • Place your reference documents in the documents folder.
  2. Launch the Script

    • From the project's root folder, run:
    python main.py
  3. Enter Prompt

    • You will be prompted to enter the main topic or description for the document.
  4. Section Generation

    • The program will show you generated sections. You can confirm or regenerate them.
  5. Content Expansion

    • Once you confirm the outline, each section is expanded with additional context retrieved from Pinecone (if available).
  6. Output

    • A Markdown file is saved to the output folder.
    • The script then converts it to a DOCX file in the same folder.
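The output step can be sketched as follows. The `_1`, `_2` suffix scheme in `unique_path` is an assumed renaming convention (the README only says the file is renamed when the name is taken), and `save_markdown` and `convert_to_docx` are hypothetical helpers. The DOCX conversion uses `pypandoc.convert_file`, which needs both the pypandoc package and a Pandoc installation; the import sits inside the function so the rest of the sketch runs without them.

```python
from pathlib import Path


def unique_path(path: Path) -> Path:
    """Return path, or name_1.ext, name_2.ext, ... if earlier names are already taken."""
    candidate, n = path, 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}_{n}{path.suffix}")
        n += 1
    return candidate


def save_markdown(markdown: str, name: str, out_dir: str = "output") -> Path:
    """Write the Markdown document into out_dir without overwriting an existing file."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    target = unique_path(folder / f"{name}.md")
    target.write_text(markdown, encoding="utf-8")
    return target


def convert_to_docx(md_path: Path) -> Path:
    """Convert the saved Markdown to DOCX in the same folder (needs pypandoc and Pandoc)."""
    import pypandoc  # imported lazily so the rest of the sketch runs without it

    docx_path = md_path.with_suffix(".docx")
    pypandoc.convert_file(str(md_path), "docx", outputfile=str(docx_path))
    return docx_path
```

Because the DOCX path is derived from the already de-duplicated Markdown path, the two output files always share the same base name.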
