This project shortens books or long texts using an LLM (Large Language Model).
The LLM Book Shortener is a tool designed to condense long texts or books into shorter, more manageable versions. By leveraging Large Language Models such as OpenAI's GPT-4o, it preserves the essence of the original text while significantly reducing its length. The primary goal is to facilitate faster reading and comprehension, making it ideal for students, researchers, and avid readers.
I wrote this so I could read significantly more, and it worked:
```
test@MacBook-Pro llm-book-shortener % wc -c waldenrewrite.txt
231074 waldenrewrite.txt
test@MacBook-Pro llm-book-shortener % wc -c waldenprogress.txt
636980 waldenprogress.txt
```
The rewritten version is roughly 36% of the original length (231,074 of 636,980 bytes). That's a significant reduction.
- Python 3.10
- OpenAI Python library (install via pip or similar, as shown below)
- OpenAI API key
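If you use pip, installing the client is a single command (`openai` is the official OpenAI Python package):

```
pip install openai
```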
- Add a Text File: Add a text file containing the book or long text. Optionally, clean it by removing the references section.
- Split the Text: Split the text file using the delimiter `------------`. This can be done around chapter names to divide the book into chapters.
- Count Tokens (a sketch of this step and the splitting step appears after this list):
  - Run the first cell labeled "count tokens" in the Jupyter notebook, adjusting the filename to match your text file.
  - The cell will inform you if any chunks exceed 4096 tokens.
- Adjust Chunks:
  - If chunks are larger than 4096 tokens, the script will copy these chunks to a file named `makeshorter.txt`.
  - Go back to the original text file and further split these large chunks using additional `------------` delimiters.
- Repeat:
  - Re-run the "count tokens" cell until it only outputs the number of chunks (e.g., 148 for `walden.txt`).
- Clean Up:
  - Once you have correctly split the text, you can delete the contents of `makeshorter.txt` or adjust the chunk sizes as needed.
- Add Your OpenAI API Key: Insert your OpenAI API key in the designated cell.
- Set Document Name: Change the `documentName` variable to match your current filename.
- Run the Rewriter: Execute the cell to start the rewriter and monitor the output.
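For reference, here is a minimal sketch of what the splitting and token-counting steps above could look like in code. It assumes the `tiktoken` library; the chunk-saving behavior mirrors the description above, but the actual notebook cells may differ:

```python
# Hypothetical version of the "count tokens" step: split the book on the
# ------------ delimiter and flag any chunk over the 4096-token limit.
import tiktoken

documentName = "walden.txt"  # adjust to your own file
encoding = tiktoken.encoding_for_model("gpt-4o")

with open(documentName, encoding="utf-8") as f:
    chunks = f.read().split("------------")

too_long = [i for i, chunk in enumerate(chunks)
            if len(encoding.encode(chunk)) > 4096]

print(len(chunks), "chunks")
if too_long:
    # Mirror the behavior described above: copy oversized chunks to
    # makeshorter.txt so you can split them by hand in the original file.
    with open("makeshorter.txt", "w", encoding="utf-8") as out:
        for i in too_long:
            out.write(chunks[i] + "\n------------\n")
    print(len(too_long), "chunks exceed 4096 tokens; copied to makeshorter.txt")
```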
- The rewriter skips parts that are fewer than 15 tokens, typically chapter titles or similar, to maintain the book's structure.
- The rewriter retries each request up to 5 times before moving on. Adjust this setting if necessary.
- You can stop and resume the rewriter without losing progress, provided you do not alter the original text file and the two files it creates.
- Progress is printed while the rewriter is running. (A minimal sketch of this loop appears after these notes.)
- The rewritten output file may contain many newlines. You can clean this up by searching for `\n\n\n` and replacing it with `\n\n` until all extra newlines are removed (a one-pass version appears after these notes).
- The rewriter uses GPT-4o but can be configured to use any OpenAI model or a compatible LLM that adheres to the OpenAI API.
- Most books see a reduction of 40-60%, depending on the content type. I've also managed an 85% reduction for a self-help book (without changing anything).
- To get .txt files, use any online converter from .pdf, .epub, etc. to text; don't worry about page numbers, text alignment, or other non-content artifacts.
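To make the notes above concrete, here is a hedged sketch of a rewrite loop with the described behavior (skip tiny chunks, retry up to 5 times, record progress so a restart can resume). It uses the official `openai` Python client; the prompt, the file names derived from `documentName`, and the progress-file format are assumptions, not the notebook's exact code:

```python
# Illustrative rewrite loop matching the notes above; details are assumptions.
import os
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
documentName = "walden.txt"
outputFile = documentName.replace(".txt", "rewrite.txt")
progressFile = documentName.replace(".txt", "progress.txt")
encoding = tiktoken.encoding_for_model("gpt-4o")

with open(documentName, encoding="utf-8") as f:
    chunks = f.read().split("------------")

# Resume support (assumed format): the progress file accumulates the original
# chunks already processed, so a restart can count and skip them.
done = 0
if os.path.exists(progressFile):
    with open(progressFile, encoding="utf-8") as f:
        done = f.read().count("------------")

for i, chunk in enumerate(chunks):
    if i < done:
        continue  # finished on a previous run
    if len(encoding.encode(chunk)) < 15:
        rewritten = chunk  # keep chapter titles and other tiny parts verbatim
    else:
        rewritten = chunk  # fall back to the original if all retries fail
        for attempt in range(5):  # retry each request up to 5 times
            try:
                response = client.chat.completions.create(
                    model="gpt-4o",
                    messages=[{
                        "role": "user",
                        "content": "Rewrite this passage much more concisely, "
                                   "preserving its meaning:\n\n" + chunk,
                    }],
                )
                rewritten = response.choices[0].message.content
                break
            except Exception as e:
                print(f"chunk {i}: attempt {attempt + 1} failed: {e}")
    with open(outputFile, "a", encoding="utf-8") as out:
        out.write(rewritten + "\n\n")
    with open(progressFile, "a", encoding="utf-8") as prog:
        prog.write(chunk + "------------")
    print(f"{i + 1}/{len(chunks)} chunks done")
```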
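The repeated find-and-replace for extra newlines can also be done in a single pass with a regular expression, for example:

```python
# Collapse runs of three or more newlines down to exactly two.
import re

with open("waldenrewrite.txt", encoding="utf-8") as f:
    text = f.read()

with open("waldenrewrite.txt", "w", encoding="utf-8") as f:
    f.write(re.sub(r"\n{3,}", "\n\n", text))
```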
Contributions are welcome! Please open an issue or submit a pull request with your improvements or suggestions.
This project is licensed under the MIT License. See the LICENSE file for details.
- Henry David Thoreau for his timeless work, "Walden".
- The open-source community for their valuable tools and libraries.