Papers and code for open-vocabulary perception (3D & 2D).
This repo mainly focuses on open-vocabulary perception tasks (both 3D and 2D). Please open a pull request or email me at [email protected] if you want to recommend papers.
- [CoDAv2] Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection, Arxiv2024. [Code]
- [ImOV3D] ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images, NeurIPS2024. [Code]
- [INHA] Unlocking textual and visual wisdom: Open-vocabulary 3d object detection enhanced by comprehensive guidance from text and image, ECCV2024.
- [GLIS] Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection, ECCV2024. [Code]
- [CoDA] Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection, NeurIPS2023. [Code]
- [OV-3DET] Open-Vocabulary Point-Cloud Object Detection without 3D Annotation, CVPR2023. [Code]
- [FM-OV3D] FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, AAAI2024. [Code]

- [OpenMask3D] OpenMask3D: Open-Vocabulary 3D Instance Segmentation, NeurIPS2023. [Code]
- [OpenScene] OpenScene: 3D Scene Understanding with Open Vocabularies, CVPR2023. [Code]
- [3D-OVS] Weakly Supervised 3D Open-vocabulary Segmentation, CVPR2023. [Code]
- [PLA] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding, CVPR2023. [Code]
- [Open3DIS] Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance, CVPR2024. [Code]
- [MaskClustering] MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation, CVPR2024. [Code]
- [LEGaussians] LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding, CVPR2024. [Code]

- [Detclip] Dictionary-enriched visual-concept paralleled pre-training for open-world detection, NeurIPS2023.
- [Detclipv2] Detclipv2: Scalable open-vocabulary object detection pre-training via word-region alignment, CVPR2023.
- [Detclipv3] DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection, CVPR2024.
- [YOLO-World] YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR2024. [Code]

- [ODISE] Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models, CVPR2023 Highlight. [Code]
- [FreeDA] Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation, CVPR2024. [Code]
- [OVAM] Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models, CVPR2024. [Code]
- [PnP-OVSS] Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models, CVPR2024. [Code]
- [OVFoodSeg] OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation, CVPR2024.
- [SED] SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation, CVPR2024.