This project demonstrates how to generate a cohesive multi-section document using Large Language Models (LLMs) and a Retrieval-Augmented Generation (RAG) approach. The flow involves:
- Generating Sections: The program first takes a user prompt and sends it to the LLM to create an outline of sections.
- Expanding Sections: Each section is then expanded one by one, avoiding repeated ideas while maintaining continuity between sections.
- Retrieval-Augmented Generation (RAG): For each section, the system queries Pinecone for relevant passages from the reference documents. The retrieved context is injected into the LLM prompt so that the generated content is grounded in those documents.
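The retrieval step can be illustrated without a vector database. The sketch below is a plain in-memory stand-in for the Pinecone query, not the project's code: it ranks stored chunks by cosine similarity to a query vector and returns the top matches. The names `cosine` and `top_k`, and the toy two-dimensional vectors, are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=3):
    """Return the texts of the k chunks most similar to query_vec.

    chunks is a list of (text, vector) pairs; a real setup would store
    these in a vector index instead of a Python list.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data: in practice the vectors come from an embedding model.
chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
best = top_k([1.0, 0.0], chunks, k=2)
```

The texts returned by such a lookup are what gets injected into the section prompt as context.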
Prompt and Outline
- You provide a prompt describing the document you want to generate.
- The LLM produces an outline (sections) in JSON format (via the SectionGenerator).
- You can choose to keep or regenerate this outline.
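As a rough illustration of handling the JSON outline, the sketch below parses a model reply into a list of section titles. The exact schema produced by SectionGenerator is not documented here, so the `{"sections": [...]}` shape, the bare-list fallback, and the helper name `parse_outline` are all assumptions.

```python
import json

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap JSON in


def parse_outline(raw):
    """Parse an LLM reply into a list of section titles (assumed schema)."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with optional language tag)
        # and the closing fence, if present.
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    data = json.loads(text)
    # Accept either {"sections": [...]} or a bare JSON list.
    sections = data.get("sections") if isinstance(data, dict) else data
    if not isinstance(sections, list) or not all(isinstance(s, str) for s in sections):
        raise ValueError("outline must be a list of section titles")
    return [s.strip() for s in sections if s.strip()]
```

Validating the reply before expansion makes it easy to trigger a regeneration when the model returns something unusable.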
Content Expansion
- Each section from the outline is passed to the LLM along with any retrieved context (via RAGprocessor).
- The LLM expands the section, ensuring minimal repetition and logical flow.
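One way to combine a section title, the retrieved context, and the already-written sections into a single prompt is sketched below. The wording of the prompt and the function name `build_expansion_prompt` are hypothetical; the project's actual prompt may differ.

```python
def build_expansion_prompt(title, context_chunks, covered_titles):
    """Assemble a prompt for expanding one section (illustrative wording)."""
    parts = [f"Write the section titled '{title}'."]
    if covered_titles:
        # Naming earlier sections helps the model avoid repeating them.
        parts.append(
            "Sections already written (do not repeat their points): "
            + ", ".join(covered_titles)
        )
    if context_chunks:
        parts.append("Reference material:")
        parts.extend(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1))
    else:
        parts.append("No reference material was retrieved; rely on general knowledge.")
    return "\n".join(parts)


prompt = build_expansion_prompt("Methods", ["chunk one"], ["Intro"])
```

The resulting string would then be sent to the model named in LLM_MODEL.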
Document Assembly and Saving
- The expanded sections are combined into one Markdown document.
- If a file with the same name exists, the script automatically renames the output file.
- Finally, the Markdown file is converted to a DOCX, and the Pinecone index is cleaned up.
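The renaming and conversion steps could look roughly like this. The numeric-suffix scheme and the helper names `unique_path` and `convert_to_docx` are assumptions, since the script's actual renaming rule is not described; `pypandoc.convert_file` is the pypandoc call for file conversion and is imported lazily so the first helper works without it.

```python
from pathlib import Path


def unique_path(path):
    """Return path, or a variant with a numeric suffix if it already exists."""
    if not path.exists():
        return path
    counter = 1
    while True:
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def convert_to_docx(md_path):
    """Convert a Markdown file to DOCX next to it (requires Pandoc)."""
    import pypandoc  # lazy import: only needed for the conversion step

    docx_path = md_path.with_suffix(".docx")
    pypandoc.convert_file(str(md_path), "docx", outputfile=str(docx_path))
    return docx_path
```

A caller would pick the path first, for example `unique_path(Path("output") / "report.md")`, write the Markdown there, and then pass the same path to `convert_to_docx`.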
Clone or Download
- Get the repository onto your machine.
Install Dependencies
- Install the required packages from requirements.txt:
pip install -r requirements.txt
Create a .env File
- At the project root, create a file named .env containing your API keys:
OPENAI_API=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
LLM_MODEL=gpt-4 # or another available model
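A quick check that the settings are present before running can save a failed run. The sketch below assumes the script reads these three names from environment variables (for example after loading .env with a package such as python-dotenv, which is an assumption) and that all three are required; `missing_settings` is a hypothetical helper.

```python
import os

# Names taken from the .env example; treating all three as required
# is an assumption.
REQUIRED = ("OPENAI_API", "PINECONE_API_KEY", "LLM_MODEL")


def missing_settings(env=None):
    """Return the required setting names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
```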
Install Pandoc
- This project uses pypandoc, so make sure Pandoc is installed, or let install_pypandoc.py download it automatically.
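To see whether a Pandoc binary is already reachable, a small standard-library check is enough. This is only a sketch: `pandoc_available` is a hypothetical helper, and it looks on the PATH only, so it will not detect a copy that pypandoc has downloaded elsewhere.

```python
import shutil


def pandoc_available():
    """Return True if a pandoc executable is found on the PATH."""
    return shutil.which("pandoc") is not None


if __name__ == "__main__":
    if pandoc_available():
        print("Pandoc found on PATH.")
    else:
        print("Pandoc not found on PATH; install it or run install_pypandoc.py.")
```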
Add Reference Documents
- Place your reference documents in the documents folder.
Launch the Script
- From the project's root folder, run:
python main.py
Enter Prompt
- You will be prompted to enter the main topic or description for the document.
Section Generation
- The program will show you generated sections. You can confirm or regenerate them.
Content Expansion
- Once confirmed, each section is expanded with additional context retrieved from Pinecone (if available).
Output
- A Markdown file is saved to the output folder.
- The script then converts it to a DOCX file in the same folder.