Awesome-Open-Vocabulary-Perception

Papers and code for open-vocabulary perception (3D & 2D). 😎

This repo focuses on open-vocabulary perception tasks, both 3D and 2D. To recommend a paper, please open a pull request or email me at [email protected].

3D

Open-Vocabulary 3D Object Detection

[CoDAv2] Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection, arXiv2024. [Code]
[ImOV3D] ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images, NeurIPS2024. [Code]
[INHA] Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image, ECCV2024.
[GLIS] Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection, ECCV2024. [Code]
[CoDA] Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection, NeurIPS2023. [Code]
[OV-3DET] Open-Vocabulary Point-Cloud Object Detection without 3D Annotation, CVPR2023. [Code]
[FM-OV3D] FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, AAAI2024. [Code]

Open-Vocabulary 3D Segmentation

[OpenMask3D] OpenMask3D: Open-Vocabulary 3D Instance Segmentation, NeurIPS2023. [Code]
[OpenScene] OpenScene: 3D Scene Understanding with Open Vocabularies, CVPR2023. [Code]
[3D-OVS] Weakly Supervised 3D Open-vocabulary Segmentation, CVPR2023. [Code]
[PLA] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding, CVPR2023. [Code]
[Open3DIS] Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance, CVPR2024. [Code]
[MaskClustering] MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation, CVPR2024. [Code]
[LEGaussians] LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding, CVPR2024. [Code]

2D

Open-Vocabulary 2D Object Detection

[DetCLIP] Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection, NeurIPS2022.
[DetCLIPv2] DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment, CVPR2023.
[DetCLIPv3] DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection, CVPR2024.
[YOLO-World] YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR2024. [Code]

Open-Vocabulary 2D Segmentation

[ODISE] Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models, CVPR2023 Highlight. [Code]
[FreeDA] Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation, CVPR2024. [Code]
[OVAM] Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models, CVPR2024. [Code]
[PnP-OVSS] Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models, CVPR2024. [Code]
[OVFoodSeg] OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation, CVPR2024.
[SED] SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation, CVPR2024.
