humble-books-bundle-extractor

Extracts book information from Humble Bundle pages: title, author, description, year, keywords, etc.

Reads the title, author, description and formats of each book from the file "bundle-info.html"
Uses the OpenAI API to derive keywords from the book info
Uses Tavily Search to find websites about each book, then extracts the book's year from those websites with the OpenAI API
Saves the book info to a tab-separated values (TSV) file after each step
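
The "save after each step" behaviour can be sketched with the standard csv module. This is a minimal illustration, not the project's actual code: the column names and the helper names save_books_tsv and load_books_tsv are assumptions, and the real TSV header may differ.

```python
import csv
from pathlib import Path

# Hypothetical column names; the project's actual TSV header may differ.
COLUMNS = ["title", "author", "description", "formats", "keywords", "year"]


def save_books_tsv(books, path):
    """Write a list of book dicts to a tab-separated file, one row per book."""
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        # restval="" leaves columns empty until a later step fills them in.
        writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(books)


def load_books_tsv(path):
    """Read the file back into a list of dicts keyed by column name."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```

Writing a checkpoint file after every step means a failed API call in a later step does not discard the results of the earlier ones.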

Output format

Install the requirements with pip install -r requirements.txt
Configure the OpenAI API key and the model name in the file ".env". See the file ".env_example" for an example
Configure the Tavily Search API key in the file ".env". See the file ".env_example" for an example
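
As a rough illustration of what the ".env" file might contain, the sketch below parses KEY=VALUE lines and checks that the three settings are present. The variable names OPENAI_API_KEY, OPENAI_MODEL and TAVILY_API_KEY are assumptions; ".env_example" is the authoritative source for the names the project expects.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical variable names and placeholder values; see ".env_example".
example = """
# OpenAI settings
OPENAI_API_KEY=your-openai-key
OPENAI_MODEL=your-model-name
# Tavily settings
TAVILY_API_KEY=your-tavily-key
"""

config = parse_env(example)
required = ("OPENAI_API_KEY", "OPENAI_MODEL", "TAVILY_API_KEY")
missing = [name for name in required if not config.get(name)]
print("missing settings:", missing)
```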

Download the Humble Bundle page (Save as "Webpage, Complete") and save it as "bundle-info.html" in the project folder
Adjust the flags in main.py to select which steps run: stage_2_label_books_with_openai, stage_3_find_years_with_tavily
Run main.py
The book info will be saved in "book-info-before-labeling.tsv", "book-info-after-labeling.tsv" and "book-info-after-year-finding.tsv"
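
The relationship between the flags and the output files can be pictured as follows. The flag and file names are taken from the steps above, but the way main.py wires them together is an assumption here, and expected_outputs is a hypothetical helper used only for illustration.

```python
# Flag names come from the usage steps; how main.py uses them is assumed.
stage_2_label_books_with_openai = True
stage_3_find_years_with_tavily = False


def expected_outputs(label_books, find_years):
    """Return the TSV files a run would produce under the assumed stage order."""
    outputs = ["book-info-before-labeling.tsv"]  # written after extraction
    if label_books:
        outputs.append("book-info-after-labeling.tsv")
    if find_years:
        outputs.append("book-info-after-year-finding.tsv")
    return outputs


print(expected_outputs(stage_2_label_books_with_openai,
                       stage_3_find_years_with_tavily))
```

Disabling a stage is useful when its output file already exists and repeating the API calls would only add cost.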
