
This project aims to scrape game wikis and other websites using LLMs.


Parkourer10/CraftData


CraftData

This project aims to develop a web scraper that extracts data from game wikis. The collected data will be structured into a dataset suitable for training large language models, or for whatever else you want to do with it. The scraper will navigate through all the sections of the wiki, capturing information about game mechanics, items, mobs, crafting recipes, etc. It can be used for other game wikis too.
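Navigating all the sections of a wiki amounts to a crawl: start from one page and keep following links until no new pages turn up. The following is a minimal breadth-first sketch, not the project's actual code. `crawl`, `get_links` and `max_pages` are hypothetical names, and `get_links` stands in for whatever fetches a page and returns the wiki links found on it.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 500,
) -> List[str]:
    """Breadth-first walk over a wiki, visiting each URL at most once.

    get_links(url) is a hypothetical callback that fetches a page and
    returns the links on it that point to other wiki pages.
    """
    visited = {start_url}        # every URL ever queued
    queue = deque([start_url])   # pages waiting to be visited
    order = []                   # pages in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order
```

The `visited` set stops pages that link back to each other from being queued twice. `max_pages` caps the crawl on very large wikis.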

How Does It Work?

The scraper first extracts all the links to the relevant wiki pages. It then retrieves the HTML content of each page, cleans it, and passes it to a large language model (LLM), which processes it into the question-and-answer entries of the dataset.
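The link-collection and cleaning steps can be sketched with Python's standard-library `html.parser`. This is an illustration only, not the project's implementation, which may use different libraries. The helper names `extract_links` and `clean_html` are hypothetical, and keeping only links on the same host is an assumed way of staying inside the wiki. Fetching each page (for example with `urllib.request.urlopen`) is left out so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


class TextCleaner(HTMLParser):
    """Keeps visible text and drops the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if self.skip_depth == 0 and text:
            self.parts.append(text)


def extract_links(html, base_url):
    """Return links on the wiki's own host, without #fragments or duplicates."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for link in parser.links:
        link = link.split("#", 1)[0]  # /wiki/Creeper#Drops -> /wiki/Creeper
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            result.append(link)
    return result


def clean_html(html):
    """Return the page's visible text as a single string."""
    cleaner = TextCleaner()
    cleaner.feed(html)
    return " ".join(cleaner.parts)
```

`clean_html` is deliberately simple. A real wiki page also contains navigation bars and sidebars, which would probably need site-specific filtering before the text is sent to the model.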

Installation

Clone the GitHub repo:

 git clone https://github.com/Parkourer10/CraftData.git

Navigate to the project directory:

 cd CraftData

Install all the dependencies:

PYTHON:

 pip install -r requirements.txt

OLLAMA:

 ollama pull llama3.2:1b
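After the model has been pulled, a script can talk to it through Ollama's local HTTP API. The sketch below is an assumption about how cleaned page text could reach `llama3.2:1b`, not the project's code. It posts to Ollama's default endpoint `http://localhost:11434/api/generate` with `stream` set to false, so the reply arrives as one JSON object whose `response` field holds the generated text. The prompt wording and the helper names `build_prompt`, `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.2:1b"  # the model pulled above


def build_prompt(page_text: str) -> str:
    """Hypothetical prompt asking the model for question/answer pairs."""
    return (
        "From the wiki text below, write question and answer pairs. "
        'Reply with a JSON list of objects with "question" and "answer" keys.\n\n'
        + page_text
    )


def build_request(page_text: str) -> urllib.request.Request:
    """Build the POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL,
        "prompt": build_prompt(page_text),
        "stream": False,  # one JSON reply instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(page_text: str, timeout: float = 120) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(page_text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

`generate` only works while Ollama is running locally. A small 1B model may not always follow the requested output format, so its reply is best parsed defensively.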

Change the config file: (IMPORTANT!)

Change the variables to scrape the appropriate wiki.


Run the project:

 python main.py

Dataset structure:

[
    {
        "url": "example.com",
        "question": "question",
        "answer": "answer"
    }
]
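The dataset is a JSON list of objects with `url`, `question` and `answer` fields, so it can be loaded and sanity-checked with the standard `json` module before training. This is a minimal sketch. `load_dataset` is a hypothetical helper, and the README does not name the output file, so any file name used with it is an assumption.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = ("url", "question", "answer")


def load_dataset(text: str) -> List[Dict[str, str]]:
    """Parse the dataset JSON and check that every entry has the three string fields."""
    data: Any = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("dataset must be a JSON list")
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not an object")
        for key in REQUIRED_KEYS:
            if not isinstance(entry.get(key), str):
                raise ValueError(f"entry {i} is missing a string '{key}'")
    return data
```

Pass the contents of the generated file to `load_dataset`. It returns the list of entries, or raises `ValueError` at the first entry that does not match the structure above.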
