xiaoaoran/awesome-RSFMs

Official repo for "Foundation Models for Remote Sensing and Earth Observation: A Survey"
This is the repository of Foundation Models for Remote Sensing and Earth Observation: A Survey, a comprehensive survey of recent progress in multimodal foundation models for remote sensing and Earth observation. For details, please refer to:

Foundation Models for Remote Sensing and Earth Observation: A Survey
[Paper]


Abstract

Remote Sensing (RS) is a crucial technology for observing, monitoring, and interpreting our planet, with broad applications across geoscience, economics, humanitarian fields, etc. While artificial intelligence (AI), particularly deep learning, has achieved significant advances in RS, unique challenges persist in developing more intelligent RS systems, including the complexity of Earth's environments, diverse sensor modalities, distinctive feature patterns, varying spatial and spectral resolutions, and temporal dynamics. Meanwhile, recent breakthroughs in large Foundation Models (FMs) have expanded AI’s potential across many domains due to their exceptional generalizability and zero-shot transfer capabilities. However, their success has largely been confined to natural data like images and video, with degraded performance and even failures for RS data of various non-optical modalities. This has inspired growing interest in developing Remote Sensing Foundation Models (RSFMs) to address the complex demands of Earth Observation (EO) tasks, spanning the surface, atmosphere, and oceans. This survey systematically reviews the emerging field of RSFMs. It begins with an outline of their motivation and background, followed by an introduction of their foundational concepts. It then categorizes and reviews existing RSFM studies, including their datasets and technical contributions, across Visual Foundation Models (VFMs), Vision-Language Models (VLMs), Large Language Models (LLMs), and beyond. In addition, we benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions in this rapidly evolving field.

Citation

If you find our work useful in your research, please consider citing:

@article{xiao2024foundation,
  title={Foundation Models for Remote Sensing and Earth Observation: A Survey},
  author={Xiao, Aoran and Xuan, Weihao and Wang, Junjue and Huang, Jiaxing and Tao, Dacheng and Lu, Shijian and Yokoya, Naoto},
  journal={arXiv preprint arXiv:2410.16602},
  year={2024}
}

Menu

- Visual Foundation Models for RS
- Vision-Language Models for RS
- Large Language Models for RS
- Generative Foundation Models for RS
- Other RSFMs

Visual Foundation Models for RS

VFM Datasets

| Dataset | Date | #Samples | Modalities | Annotations | Data Sources | GSD | Paper | Link |
|---|---|---|---|---|---|---|---|---|
| FMoW-RGB | 2018 | 363.6k | RGB | 62 classes | QuickBird-2, GeoEye-1, WorldView-2/3 | varying | paper | download |
| BigEarthNet | 2019 | 1.2 million | MSI, SAR | 19 LULC classes | Sentinel-1/2 | 10, 20, 60 m | paper | download |
| SeCo | 2021 | 1 million | MSI | None | Sentinel-2, NAIP | 10, 20, 60 m | paper | download |
| FMoW-Sentinel | 2022 | 882,779 | MSI | None | Sentinel-2 | 10 m | paper | download |
| MillionAID | 2022 | 1 million | RGB | 51 LULC classes | SPOT, IKONOS, WorldView, Landsat, etc. | 0.5–153 m | paper | download |
| GeoPile | 2023 | 600K | RGB | None | Sentinel-2, NAIP, etc. | 0.1–30 m | paper | download |
| SSL4EO-L | 2023 | 5 million | MSI | None | Landsat 4–9 | 30 m | paper | download |
| SSL4EO-S12 | 2023 | 3 million | MSI, SAR | None | Sentinel-1/2 | 10 m | paper | download |
| SatlasPretrain | 2023 | 856K tiles | RGB, MSI, SAR | 137 classes of 7 types | Sentinel-1/2, NAIP, NOAA Lidar Scans | 0.5–2 m, 10 m | paper | download |
| MMEarth | 2024 | 1.2 million | RGB, MSI, SAR, DSM | None | Sentinel-1/2, Aster DEM, etc. | 10, 20, 60 m | paper | download |
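
The datasets above mix sensors and ground sample distances (for Sentinel-2 alone, bands come at 10, 20, and 60 m). A common preprocessing step before pre-training is resampling all bands onto one grid. Below is a minimal sketch using rasterio; the per-band file names are hypothetical and not tied to any dataset in the table.

```python
# Minimal sketch (not from the survey): stack Sentinel-2 bands acquired at
# different GSDs (10/20/60 m, as listed above) onto a common 10 m grid.
# The per-band GeoTIFF file names below are placeholders.
import numpy as np
import rasterio
from rasterio.enums import Resampling

BAND_FILES = {
    "B02": "B02_10m.tif",   # blue, 10 m (defines the target grid)
    "B05": "B05_20m.tif",   # red edge, 20 m
    "B09": "B09_60m.tif",   # water vapour, 60 m
}

def load_band(path, target_shape):
    """Read a single-band GeoTIFF and bilinearly resample it to target_shape."""
    with rasterio.open(path) as src:
        return src.read(1, out_shape=target_shape, resampling=Resampling.bilinear)

with rasterio.open(BAND_FILES["B02"]) as ref:      # 10 m band defines the grid
    target_shape = (ref.height, ref.width)

stack = np.stack([load_band(p, target_shape) for p in BAND_FILES.values()])
print(stack.shape)  # (3, H, W) multispectral chip ready for patching/masking
```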

VFM Models

Pre-training studies

  1. An empirical study of remote sensing pretraining. TGRS2022. | paper | code |
  2. Satlaspretrain: A large-scale dataset for remote sensing image understanding. ICCV2023. | paper | code |
  3. Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data. ICCV2021. | paper | code |
  4. Geography-aware self-supervised learning. ICCV2021. | paper | code |
  5. Self-supervised material and texture representation learning for remote sensing tasks. CVPR2022. | paper | code |
  6. Change-aware sampling and contrastive learning for satellite images. CVPR2023. | paper | code |
  7. Csp: Self-supervised contrastive spatial pre-training for geospatial-visual representations. ICML2023. | paper | code |
  8. Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery. CVPR2024. | paper | code |
  9. Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery. ECCV2024. | paper | code |
  10. Satmae: Pre-training transformers for temporal and multi-spectral satellite imagery. NeurIPS2022. | paper | code |
  11. Towards geospatial foundation models via continual pretraining. ICCV2023. | paper | code |
  12. Scale-mae: A scale-aware masked autoencoder for multiscale geospatial representation learning. ICCV2023. | paper | code |
  13. Bridging remote sensors with multisensor geospatial foundation models. CVPR2024. | paper | code |
  14. Rethinking transformers pre-training for multi-spectral satellite imagery. CVPR2024. | paper | code |
  15. Masked angle-aware autoencoder for remote sensing images. ECCV2024. | paper | code |
  16. Mmearth: Exploring multi-modal pretext tasks for geospatial representation learning. ECCV2024. | paper | code |
  17. Croma: Remote sensing representations with contrastive radar-optical masked autoencoders. NeurIPS2023. | paper | code |
  18. Cross-scale mae: A tale of multiscale exploitation in remote sensing. NeurIPS2023. | paper | code |
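
Several of the studies above (e.g., items 10–18) build on masked-image-modelling pretext tasks. The snippet below is a generic, illustrative sketch of such an objective in PyTorch; the toy autoencoder, band count, and masking ratio are assumptions and do not reproduce any specific method listed here.

```python
# Illustrative masked-image-modelling pretext task (not the exact SatMAE/Scale-MAE
# formulation): random patches are zeroed out, a small conv autoencoder reconstructs
# the image, and the loss is computed on the masked patches only.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, in_ch=12):          # e.g. 12 Sentinel-2 bands (assumption)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, in_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def random_patch_mask(B, H, W, p=16, mask_ratio=0.75, device="cpu"):
    """Boolean mask of shape (B, 1, H, W): True where patches are hidden."""
    n = (H // p) * (W // p)
    perm = torch.rand(B, n, device=device).argsort(dim=1)
    hidden = (perm < int(n * mask_ratio)).view(B, 1, H // p, W // p).float()
    return nn.functional.interpolate(hidden, scale_factor=p, mode="nearest").bool()

imgs = torch.randn(4, 12, 128, 128)                  # toy multispectral batch
mask = random_patch_mask(4, 128, 128, device=imgs.device)
model = TinyAutoencoder(in_ch=12)
recon = model(imgs * (~mask))                        # masked patches zeroed before encoding
loss = ((recon - imgs) ** 2)[mask.expand_as(imgs)].mean()
loss.backward()
```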

SAM-based studies

  1. SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model. NeurIPS2023 (DB). | paper | code |
  2. Sam-assisted remote sensing imagery semantic segmentation with object and boundary constraints. TGRS2024. | paper | code |
  3. Uv-sam: Adapting segment anything model for urban village identification. AAAI2024. | paper | code |
  4. Cs-wscdnet: Class activation mapping and segment anything model-based framework for weakly supervised change detection. TGRS2023. | paper | code |
  5. Adapting segment anything model for change detection in vhr remote sensing images. TGRS2024. | paper | code |
  6. Segment any change. NeurIPS2024. | paper |
  7. Rsprompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model. TGRS2024. | paper | code |
  8. Ringmo-sam: A foundation model for segment anything in multimodal remote-sensing images. TGRS2023. | paper |
  9. The segment anything model (sam) for remote sensing applications: From zero to one shot. JSTAR2023. | paper | code |
  10. Cat-sam: Conditional tuning for few-shot adaptation of segment anything model. ECCV2024 (oral). | paper | code |
  11. Segment anything with multiple modalities. arXiv2024. | paper | code |
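
For reference, the snippet below shows vanilla point-prompting with the original segment-anything package on an RGB remote sensing chip; the checkpoint path, chip file, and prompt coordinates are placeholders, and the papers above adapt or extend SAM well beyond this baseline usage.

```python
# Minimal sketch: point-prompting the original SAM on an RGB remote sensing chip.
# Checkpoint path, chip file, and prompt location are placeholders.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder path
predictor = SamPredictor(sam)

chip = np.array(Image.open("aerial_chip.png").convert("RGB"))  # HxWx3 uint8
predictor.set_image(chip)

# One positive point roughly on the object of interest (e.g. a building roof).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),          # 1 = foreground, 0 = background
    multimask_output=True,               # return several candidate masks
)
best = masks[scores.argmax()]            # boolean HxW mask of the best candidate
```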

Vision-Language Models for RS

VLM Datasets

| Task | Dataset | Image Size | GSD (m) | #Text | #Images | Content | Link |
|---|---|---|---|---|---|---|---|
| VQA | RSVQA-LR | 256 | 10 | 77K | 772 | Questions for existence judging, area estimation, object comparison, scene recognition | download |
| VQA | RSVQA-HR | 512 | 0.15 | 955K | 10,659 | Questions for existence judging, area estimation, object comparison, scene recognition | download |
| VQA | RSVQAxBen | 120 | 10–60 | 15M | 590,326 | Questions for existence judging, object comparison, scene recognition | download |
| VQA | RSIVQA | 512–4,000 | 0.3–8 | 111K | 37,000 | Questions for existence judging, area estimation, object comparison, scene recognition | download |
| VQA | HRVQA | 1,024 | 0.08 | 1,070K | 53,512 | Questions for existence judging, object comparison, scene recognition | download |
| VQA | CDVQA | 512 | 0.5–3 | 122K | 2,968 | Questions for object changes | download |
| VQA | FloodNet | 3,000–4,000 | - | 11K | 2,343 | Questions for building and road damage assessment in disaster scenes | download |
| VQA | RescueNet-VQA | 3,000–4,000 | 0.15 | 103K | 4,375 | Questions for building and road damage assessment in disaster scenes | download |
| VQA | EarthVQA | 1,024 | 0.3 | 208K | 6,000 | Questions for relational judging, relational counting, situation analysis, and comprehensive analysis | download |
| Image-Text Pre-training | RemoteCLIP | varied | varied | not specified | not specified | Developed based on retrieval, detection and segmentation data | download |
| Image-Text Pre-training | RS5M | not specified | varied | 5M | 5M | Filtered public datasets, captioned existing data | download |
| Image-Text Pre-training | SkyScript | not specified | 0.1–30 | 2.6M | 2.6M | Earth Engine images linked with OpenStreetMap semantics | download |
| Caption | RSICD | 224 | - | 24,333 | 10,921 | Urban scenes for object description | download |
| Caption | UCM-Caption | 256 | 0.3 | 2,100 | 10,500 | Urban scenes for object description | download |
| Caption | Sydney | 500 | 0.5 | 613 | 3,065 | Urban scenes for object description | download |
| Caption | NWPU-Caption | 256 | 0.2–30 | 157,500 | 31,500 | Urban scenes for object description | download |
| Caption | RSITMD | 224 | - | 4,743 | 4,743 | Urban scenes for object description | download |
| Caption | RSICap | 512 | varied | 3,100 | 2,585 | Urban scenes for object description | download |
| Caption | ChatEarthNet | 256 | 10 | 173,488 | 163,488 | Urban and rural scenes for object description | download |
| Visual Grounding | GeoVG | 1,024 | 0.24–4.8 | 7,933 | 4,239 | Visual grounding based on object properties and relations | download |
| Visual Grounding | DIOR-RSVG | 800 | 0.5–30 | 38,320 | 17,402 | Visual grounding based on object properties and relations | download |
| Mixed Multi-task | MMRS-1M | varied | varied | 1M | 975,022 | Collection of RSICD, UCM-Captions, FloodNet, RSIVQA, UC Merced, DOTA, DIOR-RSVG, etc. | download |
| Mixed Multi-task | GeoChat-Set | varied | varied | 318k | 141,246 | Developed based on DOTA, DIOR, FAIR1M, FloodNet, RSVQA and NWPU-RESISC45 | download |
| Mixed Multi-task | LHRS-Align | 256 | 1.0 | 1.15M | 1.15M | Constructed from Google Map and OSM properties | download |
| Mixed Multi-task | VRSBench | 512 | varied | 205,307 | 29,614 | Developed based on the DOTA-v2 and DIOR datasets | download |

VLM Models

  1. Remoteclip: A vision language foundation model for remote sensing. TGRS2024. | paper | code |
  2. Rs5m: A large scale vision-language dataset for remote sensing vision-language foundation model. TGRS2024. | paper | code |
  3. Skyscript: A large and semantically diverse vision-language dataset for remote sensing. AAAI2024. | paper | code |
  4. Remote sensing vision-language foundation models without annotations via ground remote alignment. ICLR2024. | paper |
  5. Csp: Self-supervised contrastive spatial pre-training for geospatial-visual representations. ICML2023. | paper | code |
  6. Geoclip: Clip-inspired alignment between locations and images for effective worldwide geo-localization. NeurIPS2024. | paper | code |
  7. Satclip: Global, general-purpose location embeddings with satellite imagery. arXiv2023. | paper | code |
  8. Learning representations of satellite images from metadata supervision. ECCV2024. | paper |
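
As a usage reference, the sketch below performs CLIP-style zero-shot scene classification with the open_clip API; the class prompts, image chip, and the optional RS-tuned checkpoint path (e.g., RemoteCLIP weights released in open_clip format) are placeholders rather than details taken from the survey.

```python
# Minimal sketch of CLIP-style zero-shot scene classification for RS imagery
# using open_clip. Class prompts, chip file, and checkpoint path are placeholders.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
# Optionally swap in RS-tuned weights released in open_clip format, e.g.:
# model.load_state_dict(torch.load("RemoteCLIP-ViT-B-32.pt"))   # placeholder path
model.eval()

classes = ["airport", "harbor", "farmland", "dense residential"]    # hypothetical labels
text = tokenizer([f"a satellite image of a {c}" for c in classes])
image = preprocess(Image.open("scene_chip.png")).unsqueeze(0)       # placeholder chip

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(dict(zip(classes, probs[0].tolist())))   # per-class zero-shot probabilities
```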

Large Language Models for RS

  1. Geollm: Extracting geospatial knowledge from large language models. ICLR2024. | paper | code |

Generative Foundation Models for RS

  1. Diffusionsat: A generative foundation model for satellite imagery. ICLR2024. | paper | code |
  2. MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation. NeurIPS2024.

Other RSFMs

Weather forecasting

  1. Accurate medium-range global weather forecasting with 3d neural networks. Nature, 2023. | paper | code |
