Bounding boxes #138

abrichr · 2025-02-05T18:40:36Z

Search before asking

I have searched the Multimodal Maestro issues and found no similar feature requests.

Description

As far as I know, Qwen2.5-VL is the first open-source multimodal model that can extract bounding boxes.

e.g. from https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/spatial_understanding.ipynb:

It would be great to support bounding-box extraction here so that other models can support it as well.

Use case

We would use this for generative process automation in https://github.com/OpenAdaptAI/OpenAdapt

Additional

No response

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!

SkalskiP · 2025-02-06T12:06:23Z

Hi @abrichr 👋🏻 Qwen2.5-VL is not the first VLM to do this. Both Florence-2 and PaliGemma 2 support object detection. maestro-1.1.0 will be all about object detection. I'm planning to:

Expand the list of supported datasets to include COCO and YOLO, to allow easy use of traditional object detection datasets. This will of course require methods to convert boxes from the matrix representation to a representation the VLM can understand.
Add metrics classically used in object detection such as mAP.
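
A minimal sketch of the box conversion mentioned above, for illustration only. It assumes boxes arrive as rows of `[x_min, y_min, x_max, y_max]` in pixels and serializes them into a location-token string modeled on the `<locNNNN>` style used by PaliGemma-family models (coordinates quantized into 1024 bins, ordered y_min, x_min, y_max, x_max). The function name `boxes_to_loc_tokens`, its parameters, and the ` ; ` separator are assumptions made for this example, not part of Maestro's API.

```python
from typing import Sequence


def boxes_to_loc_tokens(
    boxes_xyxy: Sequence[Sequence[float]],
    labels: Sequence[str],
    image_width: int,
    image_height: int,
    bins: int = 1024,
) -> str:
    """Serialize pixel-space boxes into a location-token string.

    Hypothetical helper: the token layout is an assumption modeled on
    PaliGemma-style detection targets, not an official Maestro function.
    """
    if len(boxes_xyxy) != len(labels):
        raise ValueError("boxes_xyxy and labels must have the same length")

    def quantize(value: float, size: int) -> int:
        # Normalize to [0, 1], scale to a bin index, clamp to [0, bins - 1].
        index = int(value / size * bins)
        return max(0, min(bins - 1, index))

    parts = []
    for (x_min, y_min, x_max, y_max), label in zip(boxes_xyxy, labels):
        # Assumed token order: y_min, x_min, y_max, x_max.
        tokens = [
            quantize(y_min, image_height),
            quantize(x_min, image_width),
            quantize(y_max, image_height),
            quantize(x_max, image_width),
        ]
        parts.append("".join(f"<loc{t:04d}>" for t in tokens) + f" {label}")
    return " ; ".join(parts)


if __name__ == "__main__":
    # One box covering the top-left quarter of a 640x480 image.
    print(boxes_to_loc_tokens([[0, 0, 320, 240]], ["cat"], 640, 480))
```

Other VLMs expect different target formats (for example, JSON with absolute pixel coordinates), so a real implementation would likely need one such formatter per model family.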

SkalskiP · 2025-02-07T14:57:27Z

BTW, I have a question: I'm thinking about launching a Discord server dedicated to VLM fine-tuning with Maestro, and since we're talking about current issues, I'm trying to understand whether people would like such a server to be created.

abrichr added the "enhancement" (New feature or request) label Feb 5, 2025

SkalskiP added the "model" (Request to add / extend support for the model.) label Feb 6, 2025

SkalskiP mentioned this issue Feb 7, 2025

Qwen_2_5_vl Object Detection Support? #147

Closed

2 tasks
