Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #132 from roboflow/release/maestro-1.0.0
florence_2.md, paligemma_2.md, qwen_2_5_vl.md docs added + maestro_qwen2_5_vl_json_extraction cookbook
Showing 14 changed files with 816 additions and 1,510 deletions.
@@ -2,6 +2,8 @@

<h1>maestro</h1>

<h3>VLM fine-tuning for everyone</h3>

<br>

<div>
cookbooks/maestro_florence2_visual_question_answering.ipynb: 729 changes (0 additions, 729 deletions). This file was deleted.
@@ -0,0 +1,76 @@
## Overview

Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license. It offers strong zero-shot and fine-tuning capabilities for tasks such as image captioning, object detection, visual grounding, and segmentation. Despite its compact size, training on the extensive FLD-5B dataset (126 million images and 5.4 billion annotations) enables Florence-2 to perform on par with much larger models like Kosmos-2. You can try out the model via Hugging Face Spaces, Google Colab, or our interactive playground.

## Install

```bash
pip install "maestro[florence_2]"
```

## Train

The training routines support optimization strategies such as LoRA and freezing the vision encoder. Customize your fine-tuning process via CLI or Python to align with your dataset and task requirements.
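
To give a sense of why LoRA lowers training cost, here is a minimal sketch of the parameter arithmetic for a single linear layer. It is not maestro code: the function name `lora_trainable_params` and the layer size are hypothetical and chosen only for illustration. LoRA keeps the original weight matrix frozen and trains two small low-rank factors alongside it.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA freezes the original weight W and learns an update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


# Hypothetical layer size, not taken from Florence-2.
d_in, d_out, rank = 1024, 1024, 8

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, rank)

print(f"full fine-tuning: {full:,} weights")
print(f"LoRA (rank {rank}): {lora:,} weights ({lora / full:.2%} of full)")
```

The trainable count grows linearly with the rank, so a small rank keeps the adapter at a small fraction of the layer's full weight count.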

### CLI

Kick off training from the command line by running the command below. Be sure to replace the dataset path and adjust the hyperparameters (such as epochs and batch size) to suit your needs.

```bash
maestro florence_2 train \
  --dataset "dataset/location" \
  --epochs 10 \
  --batch-size 4 \
  --optimization_strategy "lora" \
  --metrics "edit_distance"
```
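
The metric name `edit_distance` suggests a comparison between predicted and reference text based on Levenshtein distance. The sketch below shows that idea in plain Python, under the assumption that this is what the metric measures; maestro's own implementation may normalize or aggregate differently. The function names are illustrative and not part of maestro.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # previous[j] is the distance between the prefix of `a`
    # processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, target: str) -> float:
    """Scale the distance to [0, 1]; 0.0 means an exact match."""
    longest = max(len(prediction), len(target))
    return levenshtein(prediction, target) / longest if longest else 0.0


print(levenshtein("kitten", "sitting"))
print(normalized_edit_distance("a red car", "a red cat"))
```

A lower value means the generated text is closer to the reference, which makes this kind of metric a natural fit for OCR-style and structured-text outputs.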

### Python

For more control, you can fine-tune Florence-2 using the Python API. Create a configuration dictionary with your training parameters and pass it to the `train` function to integrate the process into your custom workflow.

```python
from maestro.trainer.models.florence_2.core import train

config = {
    "dataset": "dataset/location",
    "epochs": 10,
    "batch_size": 4,
    "optimization_strategy": "lora",
    "metrics": ["edit_distance"],
}

train(config)
```

## Load

Load a pre-trained or fine-tuned Florence-2 model along with its processor using the `load_model` function. Specify your model's path and the desired optimization strategy.

```python
from maestro.trainer.models.florence_2.checkpoints import (
    OptimizationStrategy, load_model
)

processor, model = load_model(
    model_id_or_path="model/location",
    optimization_strategy=OptimizationStrategy.NONE
)
```

## Predict

Perform inference with Florence-2 using the `predict` function. Supply an image and a text prefix to obtain predictions, such as object detection outputs or captions.

```python
from maestro.trainer.common.datasets import RoboflowJSONLDataset
from maestro.trainer.models.florence_2.inference import predict

ds = RoboflowJSONLDataset(
    jsonl_file_path="dataset/location/test/annotations.jsonl",
    image_directory_path="dataset/location/test",
)

image, entry = ds[0]

predict(model=model, processor=processor, image=image, prefix=entry["prefix"])
```
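
An `annotations.jsonl` file of the kind read in the Predict example holds one JSON object per line. The sketch below writes and reads such a file using only the standard library. Only the `prefix` key is confirmed by that example; the `image` and `suffix` keys and all sample values are assumptions made for illustration, so check them against your own dataset export.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical entries; only the "prefix" key is confirmed by the docs.
entries = [
    {"image": "0001.jpg", "prefix": "What is the total?", "suffix": "12.50"},
    {"image": "0002.jpg", "prefix": "What is the date?", "suffix": "2024-01-31"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "annotations.jsonl"

    # One JSON object per line, with no enclosing list.
    with path.open("w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")

    # Read the file back line by line, skipping blank lines.
    with path.open(encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

for entry in loaded:
    print(entry["image"], "->", entry["prefix"])
```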
@@ -0,0 +1,77 @@
## Overview

PaliGemma 2 is an updated and significantly enhanced version of the original PaliGemma vision-language model (VLM). It combines the efficient SigLIP-So400m vision encoder with the Gemma 2 language model, processes images at multiple resolutions, and fuses visual and textual inputs. This gives it strong performance across diverse tasks such as captioning, visual question answering (VQA), optical character recognition (OCR), object detection, and instance segmentation. Fine-tuning enables users to adapt the model to specific tasks while leveraging its scalable architecture.

## Install

```bash
pip install "maestro[paligemma_2]"
```

## Train

The training routines support optimization strategies such as LoRA, QLoRA, and freezing the vision encoder. Customize your fine-tuning process via CLI or Python to align with your dataset and task requirements.
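
QLoRA differs from LoRA mainly in how the frozen base weights are stored: they are quantized to 4 bits, while the small adapter is still trained at higher precision. The sketch below estimates the memory taken by the base weights alone. It is illustrative arithmetic, not maestro code: the helper name and the parameter count are hypothetical, and the estimate ignores quantization constants, adapter weights, activations, and optimizer state.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Round example size, used for illustration only.
num_params = 3_000_000_000

bf16 = weight_memory_gib(num_params, 16)  # 16-bit base weights
nf4 = weight_memory_gib(num_params, 4)    # 4-bit base weights, as in QLoRA

print(f"16-bit base weights: {bf16:.2f} GiB")
print(f" 4-bit base weights: {nf4:.2f} GiB")
```

Under these assumptions the base weights shrink by a factor of four, which is often what lets a larger checkpoint fit on a single GPU.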

### CLI

Kick off training from the command line by running the command below. Be sure to replace the dataset path and adjust the hyperparameters (such as epochs and batch size) to suit your needs.

```bash
maestro paligemma_2 train \
  --dataset "dataset/location" \
  --epochs 10 \
  --batch-size 4 \
  --optimization_strategy "qlora" \
  --metrics "edit_distance"
```

### Python

For more control, you can fine-tune PaliGemma 2 using the Python API. Create a configuration dictionary with your training parameters and pass it to the `train` function to integrate the process into your custom workflow.

```python
from maestro.trainer.models.paligemma_2.core import train

config = {
    "dataset": "dataset/location",
    "epochs": 10,
    "batch_size": 4,
    "optimization_strategy": "qlora",
    "metrics": ["edit_distance"],
}

train(config)
```

## Load

Load a pre-trained or fine-tuned PaliGemma 2 model along with its processor using the `load_model` function. Specify your model's path and the desired optimization strategy.

```python
from maestro.trainer.models.paligemma_2.checkpoints import (
    OptimizationStrategy, load_model
)

processor, model = load_model(
    model_id_or_path="model/location",
    optimization_strategy=OptimizationStrategy.NONE
)
```

## Predict

Perform inference with PaliGemma 2 using the `predict` function. Supply an image and a text prefix to obtain predictions, such as object detection outputs or captions.

```python
from maestro.trainer.common.datasets import RoboflowJSONLDataset
from maestro.trainer.models.paligemma_2.inference import predict

ds = RoboflowJSONLDataset(
    jsonl_file_path="dataset/location/test/annotations.jsonl",
    image_directory_path="dataset/location/test",
)

image, entry = ds[0]

predict(model=model, processor=processor, image=image, prefix=entry["prefix"])
```
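
To score a model over a whole split, the single-image call in the Predict example can be wrapped in a loop. The sketch below is an outline under stated assumptions: `evaluate_exact_match` is a hypothetical helper, the expected answer is assumed to live under a `suffix` key, the prediction is assumed to be a string, and a stub stands in for `predict` so the block runs without a model.

```python
from typing import Callable, Iterable, Tuple


def evaluate_exact_match(
    dataset: Iterable[Tuple[object, dict]],
    predict_fn: Callable[[object, str], str],
) -> float:
    """Fraction of entries whose prediction equals the expected text."""
    total = 0
    correct = 0
    for image, entry in dataset:
        prediction = predict_fn(image, entry["prefix"])
        # Assumption: the expected answer is stored under "suffix".
        correct += prediction.strip() == entry["suffix"].strip()
        total += 1
    return correct / total if total else 0.0


# Stand-ins so the sketch runs on its own: fake images and a fake model.
fake_dataset = [
    ("image-0", {"prefix": "caption", "suffix": "a dog"}),
    ("image-1", {"prefix": "caption", "suffix": "a cat"}),
]


def stub_predict(image: object, prefix: str) -> str:
    return "a dog"


print(evaluate_exact_match(fake_dataset, stub_predict))
```

With a real model, `predict_fn` could be a small wrapper that forwards `image` and `prefix` to `predict` together with the loaded `model` and `processor`, provided the return value is plain text in your setup.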