This repo collects materials (papers, code, slides) about SAM2 (Segment Anything in Images and Videos), a vision foundation model released by Meta AI. We are continuously improving the project. Pull requests adding works (papers, repos) that we have missed are welcome.
- SAM2 [code, demo, Explanation]
- SAM [code, demo, Explanation]
- Survey
- Medical Video or 3D Segmentation
- Medical Image Segmentation
- Image Segmentation
- Tracking or Video Object Segmentation
- Video Camouflage Object Detection
- Image Camouflage Object Detection
- Remote Sensing
- 3D Mesh or Point Cloud Segmentation
- Image or Video Editing
- Simultaneous Localization and Mapping
- Light Field Segmentation
- Applications
- Segment Anything for Videos: A Systematic Survey [code]
- Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey [code]
- On Efficient Variants of Segment Anything Model: A Survey
- Segment anything in medical images and videos: Benchmark and deployment [code]
- SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation [code]
- A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation [code]
- Is SAM 2 Better than SAM in Medical Image Segmentation? [code]
- Performance and Non-adversarial Robustness of the Segment Anything Model 2 in Surgical Video Segmentation [code]
- Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2 [code]
- SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology [code]
- SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images [code]
- Medical SAM 2: Segment Medical Images As Video Via Segment Anything Model 2 [code]
- Interactive 3D Medical Image Segmentation [code]
- Biomedical SAM 2: Segment Anything in Biomedical Images and Videos [code]
- Polyp SAM 2: Advancing Zero shot Polyp Segmentation in Colorectal Cancer Detection [code]
- Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning [code]
- SAM-OCTA2: Layer Sequence OCTA Segmentation with Fine-tuned Segment Anything Model 2 [code]
- Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery [code]
- A-MFST: Adaptive Multi-Flow Sparse Tracker for Real-Time Tissue Tracking Under Occlusion [code]
- SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [code]
- SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More [code]
- Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models [code]
- Self-Prompting Polyp Segmentation in Colonoscopy using Hybrid Yolo-SAM 2 Model [code]
- A multi-task learning model for clinically interpretable sesamoiditis grading [code]
- Combination of detector and SAM2 for image instance segmentation of industrial pelletized ore [code]
- Zero-shot capability of SAM-family models for bone segmentation in CT scans [code]
- SAM-I2I: Unleash the Power of Segment Anything Model for Medical Image Translation [code]
- SAM-Swin: SAM-Driven Dual-Swin Transformers with Adaptive Lesion Enhancement for Laryngo-Pharyngeal Tumor Detection [code]
- CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation [code]
- Towards Natural Image Matting in the Wild via Real-Scenario Prior [code]
- Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track [code]
- The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation [code]
- LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS [code]
- Masks and Boxes: Combining the Best of Both Worlds for Multi-Object Tracking [code]
- Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation [code]
- A Distractor-Aware Memory for Visual Object Tracking with SAM2 [code]
- SAMWISE: Infusing wisdom in SAM2 for Text-Driven Video Segmentation [code]
- There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks [code]
- Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking [code]
- SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory [code]
- SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree [code]
- Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images [code]
- Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2 [code]
- When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation [code]
- SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [code]
- SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More [code]
- Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2 [code]
- DED-SAM: Adapting Segment Anything Model 2 for Dual Encoder-Decoder Change Detection
- Segment Any Mesh: Zero-shot Mesh Part Segmentation via Lifting Segment Anything 2 to 3D [code]
- Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting [code]
- A Pipeline for Segmenting and Structuring RGB-D Data for Robotics Applications [code]
- VideoDirector: Precise Video Editing via Text-to-Video Models [code]
- VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing [code]
- MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis [code]
- AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing [code]
- Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models [code]
- Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models [code]
- ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting [code]
- GRS: Generating Robotic Simulation Tasks from Real-World Images [code]
- Point of Interest Recognition and Tracking in Aerial Video during Live Cycling Broadcasts [code]
- Iterative Optimization Annotation Pipeline and ALSS-YOLO-Seg for Efficient Banana Plantation Segmentation in UAV Imagery [code]
- Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting [code]