Merge pull request #132 from roboflow/release/maestro-1.0.0
florence_2.md, paligemma_2.md, qwen_2_5_vl.md docs added + maestro_qwen2_5_vl_json_extraction cookbook
SkalskiP authored Feb 5, 2025
2 parents aa78e00 + 61f6e5b commit 296969f
Showing 14 changed files with 816 additions and 1,510 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -18,7 +18,7 @@ repos:
- id: detect-private-key
- id: pretty-format-json
args: ['--autofix', '--no-sort-keys', '--indent=4']
exclude: ".*\\.ipynb$"
exclude: /.*\.ipynb
- id: end-of-file-fixer
- id: mixed-line-ending

@@ -35,7 +35,7 @@ repos:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
- id: ruff-format
types_or: [ python, pyi, jupyter ]
types_or: [ python, pyi, jupyter]

- repo: https://github.com/pre-commit/mirrors-mypy
rev: 'v1.14.1'
2 changes: 2 additions & 0 deletions README.md
@@ -2,6 +2,8 @@

<h1>maestro</h1>

<h3>VLM fine-tuning for everyone</h3>

<br>

<div>
559 changes: 0 additions & 559 deletions cookbooks/maestro_florence2_object_detection.ipynb

This file was deleted.

729 changes: 0 additions & 729 deletions cookbooks/maestro_florence2_visual_question_answering.ipynb

This file was deleted.

568 changes: 568 additions & 0 deletions cookbooks/maestro_qwen2_5_vl_json_extraction.ipynb

Large diffs are not rendered by default.

199 changes: 0 additions & 199 deletions docs/florence-2.md

This file was deleted.

2 changes: 2 additions & 0 deletions docs/index.md
@@ -2,6 +2,8 @@

<h1>maestro</h1>

<h3>VLM fine-tuning for everyone</h3>

<br>

<div>
5 changes: 0 additions & 5 deletions docs/metrics.md

This file was deleted.

76 changes: 76 additions & 0 deletions docs/models/florence_2.md
@@ -0,0 +1,76 @@
## Overview

Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license. It offers strong zero-shot and fine-tuning capabilities for tasks such as image captioning, object detection, visual grounding, and segmentation. Despite its compact size, training on the extensive FLD-5B dataset (126 million images and 5.4 billion annotations) enables Florence-2 to perform on par with much larger models like Kosmos-2. You can try out the model via HF Spaces, Google Colab, or our interactive playground.

## Install

```bash
pip install maestro[florence_2]
```

## Train

The training routines support various optimization strategies such as LoRA and freezing the vision encoder. Customize your fine-tuning process via CLI or Python to align with your dataset and task requirements.

### CLI

Kick off training from the command line by running the command below. Be sure to replace the dataset path and adjust the hyperparameters (such as epochs and batch size) to suit your needs.

```bash
maestro florence_2 train \
  --dataset "dataset/location" \
  --epochs 10 \
  --batch-size 4 \
  --optimization_strategy "lora" \
  --metrics "edit_distance"
```
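
The --dataset argument should point at a Roboflow-style JSONL dataset, like the one loaded in the Predict section below. As a quick sanity check before training, you can peek at the first annotation entry. This is a minimal sketch: the train split folder and the image, prefix, and suffix field names are assumptions based on how entries are used elsewhere in this guide.

```python
import json

# Read the first annotation entry (assumes a train split laid out like the test split
# used in the Predict section, i.e. a folder containing annotations.jsonl plus images).
with open("dataset/location/train/annotations.jsonl") as f:
    entry = json.loads(f.readline())

# Expected fields (assumed): "image" (file name), "prefix" (prompt), "suffix" (target output).
print(entry)
```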

### Python

For more control, you can fine-tune Florence-2 using the Python API. Create a configuration dictionary with your training parameters and pass it to the train function to integrate the process into your custom workflow.

```python
from maestro.trainer.models.florence_2.core import train

config = {
"dataset": "dataset/location",
"epochs": 10,
"batch_size": 4,
"optimization_strategy": "lora",
"metrics": ["edit_distance"],
}

train(config)
```

## Load

Load a pre-trained or fine-tuned Florence-2 model along with its processor using the load_model function. Specify your model's path and the desired optimization strategy.

```python
from maestro.trainer.models.florence_2.checkpoints import (
    OptimizationStrategy, load_model
)

processor, model = load_model(
    model_id_or_path="model/location",
    optimization_strategy=OptimizationStrategy.NONE
)
```

## Predict

Perform inference with Florence-2 using the predict function. Supply an image and a text prefix to obtain predictions, such as object detection outputs or captions.

```python
from maestro.trainer.common.datasets import RoboflowJSONLDataset
from maestro.trainer.models.florence_2.inference import predict

ds = RoboflowJSONLDataset(
    jsonl_file_path="dataset/location/test/annotations.jsonl",
    image_directory_path="dataset/location/test",
)

image, entry = ds[0]

predict(model=model, processor=processor, image=image, prefix=entry["prefix"])
```
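
The predict call is assumed here to return the model's generated text. Under that assumption, a minimal follow-up is to capture the output and compare it with the reference completion stored alongside the prompt (the suffix field name is likewise an assumption):

```python
# Capture and inspect a single prediction (assumes predict returns a string).
generated_text = predict(model=model, processor=processor, image=image, prefix=entry["prefix"])

print("prompt:   ", entry["prefix"])
print("reference:", entry.get("suffix"))  # assumed field holding the expected answer
print("predicted:", generated_text)
```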
77 changes: 77 additions & 0 deletions docs/models/paligemma_2.md
@@ -0,0 +1,77 @@
## Overview

PaliGemma 2 is an updated and significantly enhanced version of the original PaliGemma vision-language model (VLM). By combining the efficient SigLIP-So400m vision encoder with the robust Gemma 2 language model, PaliGemma 2 processes images at multiple resolutions and fuses visual and textual inputs to deliver strong performance across diverse tasks such as captioning, visual question answering (VQA), optical character recognition (OCR), object detection, and instance segmentation. Fine-tuning enables users to adapt the model to specific tasks while leveraging its scalable architecture.

## Install

```bash
pip install maestro[paligemma_2]
```

## Train

The training routines support various optimization strategies such as LoRA, QLoRA, and freezing the vision encoder. Customize your fine-tuning process via CLI or Python to align with your dataset and task requirements.

### CLI

Kick off training from the command line by running the command below. Be sure to replace the dataset path and adjust the hyperparameters (such as epochs and batch size) to suit your needs.

```bash
maestro paligemma_2 train \
  --dataset "dataset/location" \
  --epochs 10 \
  --batch-size 4 \
  --optimization_strategy "qlora" \
  --metrics "edit_distance"
```

### Python

For more control, you can fine-tune PaliGemma 2 using the Python API. Create a configuration dictionary with your training parameters and pass it to the train function to integrate the process into your custom workflow.

```python
from maestro.trainer.models.paligemma_2.core import train

config = {
"dataset": "dataset/location",
"epochs": 10,
"batch_size": 4,
"optimization_strategy": "qlora",
"metrics": ["edit_distance"],
}

train(config)
```

## Load

Load a pre-trained or fine-tuned PaliGemma 2 model along with its processor using the load_model function. Specify your model's path and the desired optimization strategy.

```python
from maestro.trainer.models.paligemma_2.checkpoints import (
    OptimizationStrategy, load_model
)

processor, model = load_model(
    model_id_or_path="model/location",
    optimization_strategy=OptimizationStrategy.NONE
)
```
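
If you fine-tuned with LoRA or QLoRA, load the checkpoint with the matching strategy. The enum member below mirrors the "qlora" value accepted by the CLI and is an assumption; check the OptimizationStrategy enum in your installed version for the exact names.

```python
# Load a QLoRA fine-tuned checkpoint (OptimizationStrategy.QLORA is an assumed member name).
processor, model = load_model(
    model_id_or_path="model/location",
    optimization_strategy=OptimizationStrategy.QLORA,
)
```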

## Predict

Perform inference with PaliGemma 2 using the predict function. Supply an image and a text prefix to obtain predictions, such as object detection outputs or captions.

```python
from maestro.trainer.common.datasets import RoboflowJSONLDataset
from maestro.trainer.models.paligemma_2.inference import predict

ds = RoboflowJSONLDataset(
    jsonl_file_path="dataset/location/test/annotations.jsonl",
    image_directory_path="dataset/location/test",
)

image, entry = ds[0]

predict(model=model, processor=processor, image=image, prefix=entry["prefix"])
```