Hi @abrichr 👋🏻 Qwen2.5-VL is not the first VLM to do this. Both Florence-2 and PaliGemma 2 support object detection. maestro-1.1.0 will be all about object detection. I'm planning to:
Expand the list of supported datasets to include COCO and YOLO, to allow easy use of traditional object detection datasets. This will of course require methods to map boxes from their matrix representation to a representation the VLM can understand.
Add metrics classically used in object detection such as mAP.
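The two roadmap items above can be sketched in a few lines. This is a minimal, hypothetical sketch (not Maestro's actual API): converting COCO `[x, y, w, h]` and normalized YOLO `[cx, cy, w, h]` boxes into the absolute `[x1, y1, x2, y2]` corner format that VLMs like Qwen2.5-VL emit, plus the IoU function that metrics such as mAP are built on.

```python
def coco_to_xyxy(box):
    """Convert a COCO box [x, y, w, h] (absolute pixels) to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def yolo_to_xyxy(box, img_w, img_h):
    """Convert a normalized YOLO box [cx, cy, w, h] to absolute [x1, y1, x2, y2]."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ]

def box_iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes; the core of mAP."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

For example, `coco_to_xyxy([10, 20, 30, 40])` gives `[10, 20, 40, 60]`, and two identical boxes have an IoU of `1.0`.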
BTW, I have a question: I'm thinking about launching a Discord server dedicated to VLM fine-tuning with Maestro, and while discussing current issues I'm trying to gauge whether people would actually want such a server.
Search before asking
Description
As far as I know, Qwen2.5-VL is the first open source multimodal model that can extract bounding boxes.
e.g. from https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/spatial_understanding.ipynb:
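In that cookbook, the model typically returns its detections as a JSON list of objects with `bbox_2d` coordinates and a `label`, often wrapped in a Markdown code fence. A minimal parsing sketch (the helper name is illustrative, and the exact output shape can vary with the prompt):

```python
import json
import re

def parse_qwen_boxes(text):
    """Extract JSON detections from a Qwen2.5-VL response.

    Assumes the model replied with a list like
    [{"bbox_2d": [x1, y1, x2, y2], "label": "..."}],
    possibly wrapped in a Markdown ```json fence.
    """
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)

# Hypothetical model response, shaped like the cookbook's examples:
response = '```json\n[{"bbox_2d": [135, 114, 1016, 672], "label": "dog"}]\n```'
boxes = parse_qwen_boxes(response)
# boxes -> [{"bbox_2d": [135, 114, 1016, 672], "label": "dog"}]
```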
It would be great to support this, so that other models can be fine-tuned to produce bounding boxes as well.
Use case
We would use this for generative process automation in https://github.com/OpenAdaptAI/OpenAdapt
Additional
No response
Are you willing to submit a PR?