
Cheems Seminar

Grounded Segment Anything: From Objects to Parts

In this repo, we expand the Segment Anything Model (SAM) to support text prompt input. The text prompt can be object-level 🌕 (e.g., dog) or part-level 🌗 (e.g., dog head). Furthermore, we build a Visual ChatGPT-based dialogue system 🤖💬 that flexibly calls various segmentation models when receiving instructions in the form of natural language.

News

  • 2023/04/14: Edit anything at a more fine-grained part level.
  • 2023/04/11: Initial code release.

🚀New🚀 Edit on Part-Level

  • Part Prompt: "dog body"; Edit Prompt: "zebra"
  • Part Prompt: "cat head"; Edit Prompt: "tiger"
  • Part Prompt: "chair seat"; Edit Prompt: "chocolate"
  • Part Prompt: "person head"; Edit Prompt: "combover hairstyle"

✨✨ Highlights ✨✨

Beyond class-agnostic mask segmentation, this repo contains:

  • Grounded segment anything at both object level and part level.
  • Interacting with models in the form of natural language.

These abilities come from a series of models, including:

Model             Function
Segment Anything  Segment anything from prompts
GLIP              Grounded language-image pre-training
Visual ChatGPT    Connects ChatGPT and segmentation foundation models
VLPart            Going denser with open-vocabulary part segmentation

FAQ

Q: When will the VLPart paper be released?

A: The VLPart paper has been released. 🚀🚀🚀

Q: What is the difference between Grounded SAM and this project?

A: Grounded SAM is Grounding DINO + SAM, while this project is GLIP/VLPart + SAM. We believe any open-vocabulary (text-prompted) object detection model can be combined with SAM.
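
As a rough illustration (our assumption about the glue code, not this repo's exact implementation), combining a detector with SAM boils down to feeding the detector's boxes to SAM as box prompts; the box values below are hypothetical detector output:

# Minimal sketch: prompt SAM with boxes from any open-vocabulary detector.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("assets/twodogs.jpeg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Hypothetical output of GLIP/VLPart for the text prompt: XYXY pixel boxes.
boxes = np.array([[100.0, 50.0, 400.0, 300.0]])
for box in boxes:
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    # masks[0] is the binary mask for this detection.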

Usage

Install

See installation instructions.

Edit

python demo_part_edit.py
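
Conceptually, part-level editing is two steps: obtain a part mask from the part prompt, then inpaint the masked region with the edit prompt. Below is a minimal sketch of that idea (not this repo's exact pipeline); the mask path and the diffusers inpainting checkpoint are assumptions for illustration:

# Sketch: inpaint a part mask with an edit prompt (illustrative only).
import numpy as np
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

image = Image.open("assets/twodogs.jpeg").convert("RGB")
# Hypothetical: a boolean HxW mask for "dog body" saved by the grounding step.
part_mask = np.load("outputs_demo/part_mask.npy")
mask_image = Image.fromarray((part_mask * 255).astype(np.uint8))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting"
).to("cuda")
edited = pipe(prompt="zebra", image=image, mask_image=mask_image).images[0]
edited.save("outputs_demo/edited.png")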

🤖💬 Integration with Visual ChatGPT

# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
python chatbot.py --load "ImageCaptioning_cuda:0, SegmentAnything_cuda:1, PartPromptSegmentAnything_cuda:1, ObjectPromptSegmentAnything_cuda:0"
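
The --load argument lists the tools to register and the CUDA device each runs on, in the form ToolName_device. Once the chatbot is running, a natural-language instruction such as "segment the dog head in the image" should be routed to the matching tool (here, PartPromptSegmentAnything).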

🌗 Prompt Segment Anything at Part Level

wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/swinbase_part_0a0000.pth
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

python demo_vlpart_sam.py --input_image assets/twodogs.jpeg --output_dir outputs_demo --text_prompt "dog head"

Result:

🌕 Prompt Segment Anything at Object Level

wget https://github.com/Cheems-Seminar/grounded-segment-any-parts/releases/download/v1.0/glip_large.pth

python demo_glip_sam.py --input_image assets/demo2.jpeg --output_dir outputs_demo --text_prompt "frog"

Result:

🍭 Multi-Prompt

For multiple prompts, separate each prompt with a period (.), for example, --text_prompt "dog head. dog nose"
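
For example, reusing the part-level demo above:

python demo_vlpart_sam.py --input_image assets/twodogs.jpeg --output_dir outputs_demo --text_prompt "dog head. dog nose"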

Model Checkpoints

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Acknowledgement

A large part of the code is borrowed from segment-anything, EditAnything, CLIP, GLIP, Grounded-Segment-Anything, and Visual ChatGPT. Many thanks for their wonderful work.

Citation

If you find this project helpful for your research, please consider citing the following BibTeX entries.

@misc{segrec2023,
  title =        {Grounded Segment Anything: From Objects to Parts},
  author =       {Sun, Peize and Chen, Shoufa and Luo, Ping},
  howpublished = {\url{https://github.com/Cheems-Seminar/grounded-segment-any-parts}},
  year =         {2023}
}

@article{vlpart2023,
  title   =  {Going Denser with Open-Vocabulary Part Segmentation},
  author  =  {Sun, Peize and Chen, Shoufa and Zhu, Chenchen and Xiao, Fanyi and Luo, Ping and Xie, Saining and Yan, Zhicheng},
  journal =  {arXiv preprint arXiv:2305.11173},
  year    =  {2023}
}