This project is a web scraper for extracting data from game wikis. The collected data is structured into a dataset suitable for training large language models, or for whatever else you want to do with it. The scraper navigates through all the sections of a wiki, capturing information about game mechanics, items, mobs, crafting recipes, and more. It can be used for other game wikis too.
The scraper works in two stages: it first extracts the links to all relevant wiki pages, then retrieves the HTML content of each page, cleans it, and passes it to a large language model for further processing.
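As a rough illustration of that flow (a sketch, not the repository's actual code), the link-collection and cleaning steps might look like the following, assuming the requests and beautifulsoup4 packages and a /wiki/-style URL scheme:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def collect_links(index_url):
    # Collect candidate article links from a wiki index page.
    html = requests.get(index_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for a in soup.find_all("a", href=True):
        href = a["href"]
        # Assumed URL scheme: keep regular article pages, skip special/meta pages.
        if href.startswith("/wiki/") and ":" not in href:
            links.add(urljoin(index_url, href))
    return sorted(links)

def clean_page(url):
    # Fetch a page and strip scripts, styles, and navigation down to plain text.
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())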
git clone https://github.com/Parkourer10/CraftData.git
cd CraftData
PYTHON:
pip install -r requirements.txt
OLLAMA:
- Download and install Ollama: https://ollama.com/download
- Pull the llama3.2 1b model:
ollama pull llama3.2:1b
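Once the model is pulled, the cleaned page text can be handed to it through Ollama's local HTTP API (POST to http://localhost:11434/api/generate). The sketch below shows one way to do this; it is not the repository's actual code, and the prompt wording is an assumption.

import json
import requests

def generate_qa(page_text):
    # Ask the local llama3.2:1b model to produce one question/answer pair.
    prompt = (
        "From the following wiki text, write one question and its answer as "
        'JSON with keys "question" and "answer".\n\n' + page_text[:4000]
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:1b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # The model's reply is parsed as JSON here; in practice it may need validation.
    return json.loads(resp.json()["response"])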
Change the configuration variables to point the scraper at the appropriate wiki.
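The exact variable names live in this repository's source and may differ; a hypothetical example of the kind of settings involved:

WIKI_BASE_URL = "https://minecraft.wiki"   # hypothetical: wiki to scrape
OUTPUT_FILE = "dataset.json"               # hypothetical: where Q/A pairs are written
MODEL_NAME = "llama3.2:1b"                 # hypothetical: Ollama model used for processing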
python main.py
OUTPUT FORMAT:
[
  {
    "url": "example.com",
    "question": "question",
    "answer": "answer"
  }
]
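As a usage sketch, the resulting file can be loaded and turned into prompt/completion pairs for fine-tuning; the file name dataset.json is an assumption, not necessarily what the scraper writes.

import json

with open("dataset.json", encoding="utf-8") as f:
    records = json.load(f)

# Convert each record into a simple prompt/completion pair for training.
pairs = [
    {"prompt": r["question"], "completion": r["answer"], "source": r["url"]}
    for r in records
]
print(f"{len(pairs)} training pairs loaded")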