📖A curated list of VLM papers with code in RS, from the survey Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques.

taolijie11111/VLMs-in-RS-review

📒Awesome VLMs in RS

This is the repository for Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques, a systematic survey of recent VLM studies in remote sensing covering datasets, capabilities, and enhancement techniques. For details, please refer to:

Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques [paper]

©️Abstract

Recently, the remarkable success of ChatGPT has sparked a renewed wave of interest in artificial intelligence (AI), and the advancements in visual language models (VLMs) have pushed this enthusiasm to new heights. Differing from previous AI approaches that generally formulated different tasks as discriminative models, VLMs frame tasks as generative models and align language with visual information, enabling the handling of more challenging problems. The remote sensing (RS) field, a highly practical domain, has also embraced this new trend and introduced several VLM-based RS methods that have demonstrated promising performance and enormous potential. In this paper, we first review the fundamental theories related to VLMs, then summarize the datasets constructed for VLMs in remote sensing and the various tasks they address. Finally, we categorize the improvement methods into three main parts according to the core components of VLMs and provide a detailed introduction and comparison of these methods.

©️Citation

If you find our work useful in your research, please consider citing:

@misc{tao2024advancementsvisuallanguagemodels,
      title={Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques}, 
      author={Lijie Tao and Haokui Zhang and Haizhao Jing and Yu Liu and Kelu Yao and Chao Li and Xizhe Xue},
      year={2024},
      eprint={2410.17283},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.17283}, 
}

📖Contents

📖Recent in VLMs for RS (©️back👆🏻)

📖Contrastive Methods (©️back👆🏻)

| Published in | Title | Paper | Code/Project |
|---|---|---|---|
| TGRS 2024 | [RemoteCLIP] Remoteclip: A vision language foundation model for remote sensing | link | RemoteCLIP |
| RS 2024 | [CRSR] Cross-modal retrieval and semantic refinement for remote sensing image captioning | link | |
| arXiv 2024 | [ProGEO] Progeo: Generating prompts through image-text contrastive learning for visual geo-localization | pdf | ProGEO |
| ICLR 2024 | [GRAFT] Remote sensing vision-language foundation models without annotations via ground remote alignment | pdf | |
| TGRS 2024 | [GeoRSCLIP] Rs5m and georsclip: A large scale vision-language dataset and a large vision-language model for remote sensing | pdf | GeoRSCLIP |
| ISPRS 2024 | [ChangeCLIP] Changeclip: Remote sensing change detection with multimodal vision-language representation learning | link | ChangeCLIP |
| CVPR 2023 | [APPLeNet] Applenet: Visual attention parameterized prompt learning for few-shot remote sensing image generalization using clip | pdf | APPLeNet |
| TGRS 2023 | [MGVLF] Rsvg: Exploring data and models for visual grounding on remote sensing data | pdf | MGVLF |

📖Conversational Methods (©️back👆🏻)

| Published in | Title | Paper | Code/Project |
|---|---|---|---|
| RS 2024 | [RS-LLaVA] Rs-llava: A large vision-language model for joint captioning and question answering in remote sensing imagery | link | RS-LLaVA |
| arXiv 2023 | [H2RSVLM] H2rsvlm: Towards helpful and honest remote sensing large vision language model | pdf | H2RSVLM |
| arXiv 2024 | [SkySenseGPT] Skysensegpt: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding | pdf | SkySenseGPT |
| CVPR 2024 | [GeoChat] Geochat: Grounded large vision-language model for remote sensing | pdf | GeoChat |
| arXiv 2023 | [RSGPT] RSGPT: A Remote Sensing Vision Language Model and Benchmark | pdf | RSGPT |
| arXiv 2024 | [Skyeyegpt] Skyeyegpt: Unifying remote sensing vision-language tasks via instruction tuning with large language model | pdf | Skyeyegpt |
| arXiv 2024 | [RS-CapRet] Large language models for captioning and retrieving remote sensing images | pdf | |
| arXiv 2024 | [LHRS-Bot] Lhrs-bot: Empowering remote sensing with vgi-enhanced large multimodal language model | pdf | LHRS-Bot |
| TGRS 2024 | [EarthGPT] Earthgpt: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain | pdf | EarthGPT |
| TGRS 2023 | A decoupling paradigm with prompt learning for remote sensing image change captioning | link | code |

📖Other Methods (©️back👆🏻)

| Published in | Title | Paper | Code/Project |
|---|---|---|---|
| TIP 2023 | [Txt2Img] Txt2img-mhn: Remote sensing image generation from text using modern hopfield networks | pdf | Txt2Img |
| WACV 2024 | [CPSeg] Cpseg: Finer-grained image semantic segmentation via chain-of-thought language prompting | pdf | |
| TGRS 2023 | [SHRNet] A spatial hierarchical reasoning network for remote sensing visual question answering | link | |
| SIGIR 2023 | [MGeo] Mgeo: Multi-modal geographic language model pre-training | pdf | Mgeo |
| NeurIPS 2023 | [GeoCLIP] Geoclip: Clip-inspired alignment between locations and images for effective worldwide geo-localization | pdf | GeoCLIP |
| TPAMI 2024 | [SpectralGPT] Spectralgpt: Spectral remote sensing foundation model | pdf | SpectralGPT |
| TGRS 2023 | [TEMO] Few-shot object detection in aerial imagery guided by text-modal knowledge | link | |

📖Datasets in VLMs for RS (©️back👆🏻)

📖Manual Datasets (©️back👆🏻)

| Published in | Title | Image | Paper | Code/Project |
|---|---|---|---|---|
| CVPR 2024 | [Hallusionbench] HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models | 346 | pdf | Hallusionbench |
| arXiv 2023 | [RSICap] RSGPT: A Remote Sensing Vision Language Model and Benchmark | 2585 | pdf | RSICap |
| TGRS 2023 | [CRSVQA] Multistep Question-Driven Visual Question Answering for Remote Sensing | 4639 | pdf | CRSVQA |

📖Combining Datasets (©️back👆🏻)

| Published in | Title | Image | Paper | Code/Project |
|---|---|---|---|---|
| ICCV 2023 | [SATIN] Satin: A multi-task metadataset for classifying satellite imagery using vision-language models | ≈775K | pdf | SATIN |
| ICCV 2023 | [GeoPile] Towards geospatial foundation models via continual pretraining | 600K | pdf | GeoPile |
| ICCV 2023 | [SatlasPretrain] Satlaspretrain: A large-scale dataset for remote sensing image understanding | 856K | pdf | SatlasPretrain |
| TGRS 2023 | [RSVGD] Rsvg: Exploring data and models for visual grounding on remote sensing data | 17402 | pdf | RSVGD |
| TGRS 2024 | [RefsegRS] Rrsis: Referring remote sensing image segmentation | 4420 | pdf | RefsegRS |
| arXiv 2024 | [SkyEye-968K] Skyeyegpt: Unifying remote sensing vision-language tasks via instruction tuning with large language model | 968K | pdf | SkyEye-968K |
| TGRS 2024 | [MMRS-1M] Earthgpt: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain | 1M | pdf | MMRS-1M |
| arXiv 2023 | [RSSA] H2rsvlm: Towards helpful and honest remote sensing large vision language model | 44K | pdf | RSSA |
| TGRS 2024 | [FineGrip] Panoptic perception: A novel task and fine-grained dataset for universal remote sensing image interpretation | 2649 | pdf | |
| CVPR 2024 | [RRSIS-D] Rotated multiscale interaction network for referring remote sensing image segmentation | 17402 | pdf | RRSIS-D |
| TGRS 2022 | [RingMo] Ringmo: A remote sensing foundation model with masked image modeling | 2096640 | link | |
| arXiv 2023 | [GRAFT] Remote sensing vision-language foundation models without annotations via ground remote alignment | - | pdf | |
| CVPR 2024 | [SkySense] Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery | 21.5M | pdf | |
| AAAI 2024 | [EarthVQA] Earthvqa: Towards queryable earth via relational reasoning-based remote sensing visual question answering | 6000 | pdf | EarthVQA |
| TGRS 2024 | [GeoSense] Generative convnet foundation model with sparse modeling and low-frequency reconstruction for remote sensing image interpretation | ≈9M | link | GeoSense |

📖Automatically Annotated Datasets (©️back👆🏻)

| Published in | Title | Image | Paper | Code/Project |
|---|---|---|---|---|
| TGRS 2024 | [RS5M] Rs5m and georsclip: A large scale vision-language dataset and a large vision-language model for remote sensing | 5M | pdf | RS5M |
| AAAI 2024 | [SkyScript] Skyscript: A large and semantically diverse vision-language dataset for remote sensing | 2.6M | pdf | |
| arXiv 2024 | [LHRS-Align] Lhrs-bot: Empowering remote sensing with vgi-enhanced large multimodal language model | 1.15M | pdf | LHRS-Align |
| CVPR 2024 | [GeoChat] Geochat: Grounded large vision-language model for remote sensing | 318K | pdf | GeoChat |
| ICML 2024 | [GeoReasoner] Georeasoner: Geo-localization with reasoning in street views using a large vision-language model | 70K+ | pdf | GeoReasoner |
| arXiv 2023 | [HqDC-1.4M] H2rsvlm: Towards helpful and honest remote sensing large vision language model | ≈1.4M | pdf | HqDC-1.4M |
| CVPR 2024 | [ChatEarthNet] ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models | 163488 | pdf | ChatEarthNet |
| arXiv 2024 | [VRSBench] Vrsbench: A versatile vision-language benchmark dataset for remote sensing image understanding | 29614 | pdf | VRSBench |
| arXiv 2024 | [FIT-RS] Skysensegpt: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding | 1800.8K | pdf | FIT-RS |

📖Capabilities in VLMs for RS (©️back👆🏻)

©️License

GNU General Public License v3.0

🎉Contribute

Stars and pull requests to this repo are welcome!
