Open Data Badger (ODBG)

Overview

Open Data Badger (ODBG) is a powerful data batching utility designed for OpenAI endpoints, with plans for future support for local language models (LLMs). ODB provides robust tools for managing, submitting, and processing large batches of data efficiently.

Features

BatchProcessor:
- Initializes a SQLite database to store job details.
- Reads system prompts from a string or file.
- Creates batch files from input data.
- Submits batch jobs to OpenAI API.
- Estimates and processes batches based on maximum batch size and request limits.
- Supports CSV and Parquet input files.
- Handles random sampling and dry runs.
- Verbose mode for detailed batch information.
ResultDownloader:
- Retrieves the status of batch jobs.
- Downloads and saves results from completed batch jobs.
- Supports output in CSV and Parquet formats.

Requirements

Python 3.6+
Poetry for dependency management

Installation

Clone the repository:

git clone https://github.com/ashim-mahara/odbg.git
cd odbg

Install the required dependencies using Poetry:
```
poetry install
```

Usage

BatchProcessor

Use the BatchProcessor to create and submit batch jobs to the OpenAI API.

Example

poetry run python batch_processor.py \
    --base_url https://your-openai-proxy.com \
    --system_prompt "Your system prompt here" \
    --input_file data/input.csv \
    --data_path data/output \
    --text_field text \
    --task_name example_task \
    --model text-davinci-003 \
    --id_field id \
    --description "Example batch processing task" \
    --random_samples 100 \
    --dry_run \
    --verbose

ResultDownloader

Use the ResultDownloader to retrieve and save results from completed batch jobs.

Example

poetry run python result_downloader.py \
    --task_name example_task \
    --task_run_id 1 \
    --output_format csv

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
odbg		odbg
tests		tests
.gitignore		.gitignore
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Open Data Badger (ODBG)

Overview

Features

Requirements

Installation

Usage

BatchProcessor

Example

ResultDownloader

Example

Contributing

About

Releases

Packages

Languages

ashim-mahara/odbg

Folders and files

Latest commit

History

Repository files navigation

Open Data Badger (ODBG)

Overview

Features

Requirements

Installation

Usage

BatchProcessor

Example

ResultDownloader

Example

Contributing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages