This project is a web scraper for extracting data from game wikis. The collected data is structured into a dataset suitable for training large language models, or for whatever else you want to do with it. The scraper navigates through all the sections of a wiki, capturing information about game mechanics, items, mobs, crafting recipes, and more. It can be used for other game wikis too.
The scraper works in two stages: it first extracts the links to all relevant wiki pages, then retrieves the HTML content of each page, cleans it, and passes it to a large language model for further processing.
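As a rough illustration of that flow (a sketch, not the repository's actual code), the link-collection and cleaning steps might look like the following, assuming the requests and beautifulsoup4 packages and a /wiki/-style URL scheme:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def collect_links(index_url):
    # Collect candidate article links from a wiki index page.
    html = requests.get(index_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for a in soup.find_all("a", href=True):
        href = a["href"]
        # Assumed URL scheme: keep regular article pages, skip special/meta pages.
        if href.startswith("/wiki/") and ":" not in href:
            links.add(urljoin(index_url, href))
    return sorted(links)

def clean_page(url):
    # Fetch a page and strip scripts, styles, and navigation down to plain text.
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())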
git clone https://github.com/Parkourer10/CraftData.git
cd CraftData
PYTHON:
pip install -r requirements.txt
OLLAMA:
- Download and install Ollama: https://ollama.com/download
- Pull the llama3.2 1b model:
ollama pull llama3.2:1b
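Once the model is pulled, the cleaned page text can be handed to it through Ollama's local HTTP API (POST to http://localhost:11434/api/generate). The sketch below shows one way to do this; it is not the repository's actual code, and the prompt wording is an assumption.

import json
import requests

def generate_qa(page_text):
    # Ask the local llama3.2:1b model to produce one question/answer pair.
    prompt = (
        "From the following wiki text, write one question and its answer as "
        'JSON with keys "question" and "answer".\n\n' + page_text[:4000]
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2:1b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # The model's reply is parsed as JSON here; in practice it may need validation.
    return json.loads(resp.json()["response"])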
Change the configuration variables to point the scraper at the appropriate wiki.
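The exact variable names live in this repository's source and may differ; a hypothetical example of the kind of settings involved:

WIKI_BASE_URL = "https://minecraft.wiki"   # hypothetical: wiki to scrape
OUTPUT_FILE = "dataset.json"               # hypothetical: where Q/A pairs are written
MODEL_NAME = "llama3.2:1b"                 # hypothetical: Ollama model used for processing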
python main.py
OUTPUT FORMAT:
[
  {
    "url": "example.com",
    "question": "question",
    "answer": "answer"
  }
]
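As a usage sketch, the resulting file can be loaded and turned into prompt/completion pairs for fine-tuning; the file name dataset.json is an assumption, not necessarily what the scraper writes.

import json

with open("dataset.json", encoding="utf-8") as f:
    records = json.load(f)

# Convert each record into a simple prompt/completion pair for training.
pairs = [
    {"prompt": r["question"], "completion": r["answer"], "source": r["url"]}
    for r in records
]
print(f"{len(pairs)} training pairs loaded")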