Extracts book information from Humble Bundle pages: title, author, description, year, keywords, etc.
- Reads the title, author, description and formats from the file "bundle-info.html"
- Uses the OpenAI API to generate keywords from the book info
- Uses Tavily Search to find websites about each book, then uses the OpenAI API to extract the book's year from those websites
- Saves the book info in a tab-separated values (TSV) file after each step
- The output is a tab-separated values (TSV) file with the following columns:
- Author
- Title
- Year
- Description
- Keywords (e.g. "Machine Learning, No-Code AI, Data Analysis")
- Account (always "Humble Bundle")
- Formats (e.g. "PDF, EPUB, MOBI")
- Purchase date (date when the tool was run)
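The column layout above can be sketched with Python's standard csv module. This is a minimal illustration, not the project's actual code: the function name `write_books_tsv`, the header row, the ISO date format and the dictionary-based book records are all assumptions.

```python
import csv
import io
from datetime import date

# Column names taken from the list above, in the same order.
COLUMNS = [
    "Author", "Title", "Year", "Description",
    "Keywords", "Account", "Formats", "Purchase date",
]


def write_books_tsv(books, stream):
    """Write book dicts to `stream` as TSV (hypothetical helper).

    Assumptions: a header row is written, missing fields become empty
    strings, and the purchase date uses ISO format (YYYY-MM-DD).
    """
    writer = csv.DictWriter(
        stream, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    today = date.today().isoformat()
    for book in books:
        row = {column: book.get(column, "") for column in COLUMNS}
        row["Account"] = "Humble Bundle"  # fixed value for every row
        row["Purchase date"] = today      # date when the tool was run
        writer.writerow(row)


# Usage: write one made-up record to an in-memory buffer.
buffer = io.StringIO()
write_books_tsv(
    [{"Author": "Jane Doe", "Title": "Example Book", "Formats": "PDF, EPUB"}],
    buffer,
)
print(buffer.getvalue())
```

Using `csv.DictWriter` rather than joining strings by hand means that a description containing a tab or a line break is quoted instead of silently breaking the column layout.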
- OpenAI API key
- Tavily Search API key
- Install the requirements with:
    pip install -r requirements.txt
- Configure the OpenAI API key and the model name in the file ".env". See the file ".env_example" for an example
- Configure the Tavily Search API key in the file ".env". See the file ".env_example" for an example
- Download the Humble Bundle page (Save as "Webpage, Complete") and save it as "bundle-info.html" in the project folder
- Adjust the flags in main.py to choose which steps run: stage_2_label_books_with_openai, stage_3_find_years_with_tavily
- Run main.py
- The book info will be saved in "book-info-before-labeling.tsv", "book-info-after-labeling.tsv" and "book-info-after-year-finding.tsv"
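How the two flags and the three output files might fit together can be sketched as follows. Only the flag names and file names come from the steps above; the function `run_stages`, its parameters, and the stand-in callables are hypothetical, and the real main.py may be organized differently (for instance, it may not write the first file unconditionally).

```python
def run_stages(
    books,
    save,
    label_books,
    find_years,
    stage_2_label_books_with_openai=True,
    stage_3_find_years_with_tavily=True,
):
    """Run the optional stages and save a checkpoint after each one.

    `save(books, filename)`, `label_books(books)` and `find_years(books)`
    are hypothetical callables supplied by the caller.
    """
    # Assumption: the parsed book info is always saved before labeling.
    save(books, "book-info-before-labeling.tsv")

    if stage_2_label_books_with_openai:
        books = label_books(books)
        save(books, "book-info-after-labeling.tsv")

    if stage_3_find_years_with_tavily:
        books = find_years(books)
        save(books, "book-info-after-year-finding.tsv")

    return books


# Usage with stand-ins that make no network calls.
saved_files = []
result = run_stages(
    [{"Title": "Example Book"}],
    save=lambda books, filename: saved_files.append(filename),
    label_books=lambda books: [dict(b, Keywords="placeholder") for b in books],
    find_years=lambda books: [dict(b, Year="") for b in books],
    stage_3_find_years_with_tavily=False,  # skip the year lookup
)
print(saved_files)
```

Saving after every stage means that a failure in a later, API-dependent stage does not discard the results of the earlier ones.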
- Use LangChain instead of the OpenAI client everywhere
- Automate downloading the Humble Bundle page