Welcome to our carefully curated collection of amazing Multimodal Urban Computing models! This repository serves as a valuable addition to our comprehensive survey paper. Rest assured, we are committed to consistently updating it to ensure it remains up-to-date and relevant.
By Citymind LAB, HKUST(GZ). If there are any areas, papers, and datasets I missed, please let me know!
Check out our comprehensive tutorial paper:
Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook. Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang. [Link]
Abstract: As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., traffic, geographical, social network, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recently, we are witnessing a rising trend that utilizes various deep-learning methods to facilitate cross-domain data fusion in smart cities. To this end, we propose the first survey that systematically reviews the latest advancements in deep learning-based data fusion methods tailored for urban computing. Specifically, we first delve into data perspective to comprehend the role of each modality and data source. Secondly, we classify the methodology into four primary categories: feature-based, alignment-based, contrast-based, and generation-based fusion methods. Thirdly, we further categorize multi-modal urban applications into seven types: urban planning, transportation, economy, public safety, society, environment, and energy. Compared with previous surveys, we focus more on the synergy of deep learning methods with urban computing applications. Furthermore, we shed light on the interplay between Large Language Models (LLMs) and urban computing, postulating future research directions that could revolutionize the field. We firmly believe that the taxonomy, progress, and prospects delineated in our survey stand poised to significantly enrich the research community.
We strongly encourage authors of relevant works to make a pull request and add their paper's information [here].
- 2024.12.16: Latest update of this paper list.
- 2024.07.28: 🎉🎉🎉 Our paper has been accepted by Information Fusion (IF=18.6)!
- 2024.05.31: Latest update of this paper list.
- 2024.01.31: Update of this paper list.
If you find our work useful in your research, please consider citing:
@article{zou2025deep,
title={Deep learning for cross-domain data fusion in urban computing: Taxonomy, advances, and outlook},
author={Zou, Xingchen and Yan, Yibo and Hao, Xixuan and Hu, Yuehong and Wen, Haomin and Liu, Erdong and Zhang, Junbo and Li, Yong and Li, Tianrui and Zheng, Yu and others},
journal={Information Fusion},
volume={113},
pages={102606},
year={2025},
publisher={Elsevier}
}
- Related Surveys
- Taxonomy Framework
- Data Fusion Methods
- Taxonomy and summary of open-sourced dataset
- Highly Related Paper List
- Methodologies for cross-domain data fusion: An overview [paper]
IEEE Transactions on Big Data (2015)
- Deep learning for spatio-temporal data mining: A survey [paper]
IEEE Transactions on Knowledge and Data Engineering (2020)
- Urban big data fusion based on deep learning: An overview [paper]
Elsevier Information Fusion (2020)
- Urban flow prediction from spatiotemporal data using machine learning: A survey [paper]
Elsevier Information Fusion (2020)
- A survey of traffic prediction: from spatio-temporal data to intelligent transportation [paper]
Springer Data Science and Engineering (2021)
- Multi-feature, multi-modal, and multi-source social event detection: A comprehensive survey [paper]
Elsevier Information Fusion (2022)
- Generative adversarial networks for spatio-temporal data: A survey [paper]
ACM Transactions on Intelligent Systems and Technology (2022)
- Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data [paper]
arXiv preprint (2022)
- Urban Foundation Models: A Survey [paper]
KDD Tutorial Track Paper (2024)
This survey is structured along three dimensions:
- data in cross-domain fusion in urban computing
- modality fusion methods
- applications based on data fusion.
The summary of deep learning-based cross-domain data fusion models in urban computing.
Notice that method names are assigned based on original reference model names if available; otherwise, they are named after the first authors.
Below is a list of open source datasets categorized by their type and source.
Category | Content | Format | Dataset | Reference |
---|---|---|---|---|
Geographical Data | Satellite Image | Image | ArcGIS [Link], PlanetScope [Link], Google Earth [Link], OpenStreetMap [Link], Baidu Maps [Link] | [1] [1] [1] [1] [1][2] |
Geographical Data | Street View Image | Image | Baidu Map [Link], Google Street [Link], Tencent Map [Link] | [1][2] [1][2] [1] |
Geographical Data | POIs | Point Vector | Tencent Map Service [Link], WeChat POIs [Link], Baidu Map POIs [Link], NYC Open POIs [Link], Foursquare [Link], Wikipedia POIs [Link], AMap Service [Link], Yelp POIs [Link], Dianping.com POIs [Link], Weibo POIs [Link], Flickr POIs [Link], Bing Map POIs [Link] | [1][2] [1] [1][2][3][4][5] [1][2][3][4][5] [1][2][3][4][5][6] [1] [1] [1][2][3] [1][2] [1][2][3] [1] [1] |
Traffic Data | Traffic Trajectory | Spatio-temporal Trajectory | Shenzhen UCar [Link], Chicago Transportation [Link], VED [Link], Taxi Shenzhen [Link], NYC Open Taxi Data [Link], GeoLife [Link], T-Drive Taxi [Link], DiDi Traffic [Link], Xiamen Taxi [Link], Grab-Posisi [Link] | [1] [1][2][3] [2][3] [1][2] [1][2] [1][2][3][4][5] [1][2][3][4] [1][2][3][4][5] [1][2][3][4] [1][2] |
Traffic Data | Traffic Flow | Spatial-temporal Graph | California-PEMS [Link], METR-LA [Link], Large-ST [Link], MobileBJ [Link], DiDi (Traffic flows) [Link], TaxiBJ [Link], BikeNYC [Link] | [1][2] [1][2] [1] [1][2][3] [1][2][3][4][5][6] [1][2][3][4] |
Traffic Data | Road Network | Spatial Graph | OpenStreetMap [Link], US Census Bureau [Link] | [1][2][3][4][5] [1] |
Traffic Data | Logistics | Spatio-temporal Trajectory | LaDe [Link], JD Logistics [Link] | [1] [1] |
Social Network Data | Text | Text | Twitter [Link], Common Crawl [Link], Yelp Reviews [Link], Weibo Traffic Police [Link] | [1][2][3][4][5][6][7] [8][9][10][11][12][13] [1] [1][2] [1] |
Social Network Data | Geo-tagged Image & Video | Image & Video | YFCC100M [Link], NUS-WIDE [Link], GeoUGV [Link] | [1][2][3] [1][2] [1] |
Social Network Data | User Info. | Time Series | Jiepang User Check-in [Link], Gowalla User Location [Link], WeChat Mobility [Link] | [1] [1][2] [1] |
Demographic Data | Crime | Time series | NYC Crime [Link] | [1] |
Demographic Data | Land Use | Time series | Land Use SG [Link], Land Use NYC [Link] | [1] [1] |
Demographic Data | Population | Time series | WorldPop [Link] | [1][2][3] |
Environmental Data | Meteorology | Time series | TipDM China Weather [Link], DarkSky Weather [Link], WeatherNYC [Link], WeatherChicago [Link], Weather Underground [Link], DidiSY [Link], WD_BJ weather [Link], WD_USA weather [Link] | [1] [1] [1] [1] [1] [1] [1] [1] |
Environmental Data | Greenery | Time series | Google Earth [Link] | [1] |
Environmental Data | Air Quality | Time series | UrbanAir [Link], KnowAir [Link] | [1][2][3] [1][2][3][4] |
Below is a partial list of our laboratory's highly relevant projects in multimodal data fusion in urban computing. The list is not exhaustive and will be continuously updated.
- Will You Come Back / Check-in Again? Understanding Characteristics Leading to Urban Revisitation and Re-check-in
[paper]
In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2020
- Spatio-Temporal Vehicle Trajectory Recovery on Road Network Based on Traffic Camera Video Data
[paper]
In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022
- Beyond the First Law of Geography: Learning Representations of Satellite Imagery by Leveraging Point-of-Interests
[paper]
In ACM Web Conference, 2022
- Spatio-Temporal Urban Knowledge Graph Enabled Mobility Prediction
[paper]
In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2021
- Vehicle Trajectory Recovery on Road Network Based on Traffic Camera Video Data
[paper]
In Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021
- Predicting multi-level socioeconomic indicators from structural urban imagery
[paper]
In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022
- Knowledge-infused Contrastive Learning for Urban Imagery-based Socioeconomic Prediction
[paper]
In ACM Web Conference, 2023
- Multi-View Joint Graph Representation Learning for Urban Region Embedding
[paper]
In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, 2021
- DeepSTN+: Context-Aware Spatial-Temporal Neural Network for Crowd Flow Prediction in Metropolis
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2019
- An Effective Joint Prediction Model for Travel Demands and Traffic Flows.
[paper]
In Proceedings of the IEEE 37th International Conference on Data Engineering, 2021
- A force-directed approach to seeking route recommendation in ride-on-demand service using multi-source urban data
[paper]
In IEEE Transactions on Mobile Computing, 2020
- Rod-revenue: Seeking strategies analysis and revenue prediction in ride-on-demand service using multi-source urban data
[paper]
In IEEE Transactions on Mobile Computing, 2019
- GSNet: Learning Spatial-Temporal Correlations from Geographical and Semantic Aspects for Traffic Accident Risk Forecasting
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2021
- Modeling Spatial-Temporal Constraints and Spatial-Transfer Patterns for Couriers’ Package Pick-up Route Prediction
[paper]
In IEEE Transactions on Intelligent Transportation Systems, 2023
- Inferring region significance by using multi-source spatial data
[paper]
In Neural Computing and Applications, 2020
- Pre-Trained Semantic Embeddings for POI Categories Based on Multiple Contexts
[paper]
In IEEE Transactions on Knowledge and Data Engineering, 2022
- Photo2Trip: Exploiting Visual Contents in Geo-tagged Photos for Personalized Tour Recommendation
[paper]
In Proceedings of the 25th ACM International Conference on Multimedia, 2017
- TripPlanner: Personalized Trip Planning Leveraging Heterogeneous Crowdsourced Digital Footprints
[paper]
In IEEE Transactions on Intelligent Transportation Systems, 2014
- Forecasting Fine-Grained Air Quality Based on Big Data
[paper]
In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015
- Urban Sensing Based on Human Mobility
[paper]
In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2016
- Dual-grained human mobility learning for location-aware trip recommendation with spatial–temporal graph knowledge fusion
[paper]
In Information Fusion, 2023
- Symbolic aggregate approximation based data fusion model for dangerous driving behavior detection
[paper]
In Information Sciences, 2022
- Contextual spatio-temporal graph representation learning for reinforced human mobility mining
[paper]
In Information Sciences, 2022
- HiSTGNN: Hierarchical spatio-temporal graph neural network for weather forecasting
[paper]
In Information Sciences, 2023
- Modeling multi-regional temporal correlation with gated recurrent unit and multiple linear regression for urban traffic flow prediction
[paper]
In Knowledge-Based Systems, 2023
- Predicting citywide crowd flows using deep spatio-temporal residual networks
[paper]
In Elsevier, 2018
- DeepTransport: Prediction and Simulation of Human Mobility and Transportation Mode at a Citywide Level
[paper]
In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016
- Optimization of Causative Factors for Landslide Susceptibility Evaluation Using Remote Sensing and GIS Data in Parts of Niigata, Japan
[paper]
In PloS One, 2015
- DeepMob: Learning Deep Knowledge of Human Emergency Behavior and Mobility from Big and Heterogeneous Data
[paper]
In ACM Transactions on Information Systems, 2017
- DeepUrbanEvent: A System for Predicting Citywide Crowd Dynamics at Big Events
[paper]
In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019
- DeepMob: Integrating GPS trajectory and topics from Twitter stream for human mobility estimation
[paper]
In Frontiers of Computer Science, 2019
- Citywide traffic congestion estimation with social media
[paper]
In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2015
- Estimating Urban Traffic Congestions with Multi-sourced Data
[paper]
In Proceedings of the 17th IEEE International Conference on Mobile Data Management, 2016
- Enhancing Traffic Congestion Estimation with Social Media by Coupled Hidden Markov Model
[paper]
In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 2016
- Computing urban traffic congestions by incorporating sparse GPS probe data and social media data
[paper]
In ACM Transactions on Information Systems, 2017
- Forecasting Citywide Traffic Congestion Based on Social Media
[paper]
In Wireless Personal Communications, 2018
- Traffic Accident Risk Prediction via Multi-View Multi-Task Spatio-Temporal Networks
[paper]
In IEEE Transactions on Knowledge and Data Engineering, 2021
- When Urban Region Profiling Meets Large Language Models
[paper]
In Proceedings of the Web Conference, 2024
- Airformer: Predicting nationwide air quality in China with
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2023
- Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery
[paper]
In Proceedings of the 30th ACM International Conference on Multimedia, 2022
- Learning Multi-context Aware Location Representations from Large-scale Geotagged Images
[paper]
In Proceedings of the 29th ACM International Conference on Multimedia, 2021
- Fine-Grained Urban Flow Prediction
[paper]
In Proceedings of the Web Conference, 2021
- Geoman: Multi-level attention networks for geo-sensory time series prediction
[paper]
In Proceedings of the Web Conference, 2021
- Urbanfm: Inferring fine-grained urban flows
[paper]
In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery, 2019
- Diffstg: Probabilistic spatio-temporal graph forecasting with denoising diffusion models
[paper]
In Proceedings of the Joint Conference on Artificial Intelligence, 2023
- Spatio-Temporal Meta Contrastive Learning
[paper]
In Proceedings of the ACM International Conference on Information and Knowledge Management, 2023
- Exploiting spatial-temporal-social constraints for localness inference using online social media
[paper]
In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2016
- DeepCrime: Attentive hierarchical recurrent networks for crime prediction
[paper]
In Proceedings of the ACM international conference on information and knowledge management, 2018
- PANDA: predicting road risks after natural disasters leveraging heterogeneous urban data
[paper]
In CCF Transactions on Pervasive Computing and Interaction, 2022
- UVLens: Urban Village Boundary Identification and Population Estimation Leveraging Open Government Data
[paper]
In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2021
- iTV: Inferring Traffic Violation-Prone Locations With Vehicle Trajectories and Road Environment Data
[paper]
In IEEE Systems Journal, 2021
- RADAR: Road Obstacle Identification for Disaster Response Leveraging Cross-Domain Urban Data
[paper]
In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2018
- Unified route representation learning for multi-modal transportation recommendation with spatiotemporal pre-training
[paper]
In The VLDB Journal, 2023
- NodeSense2Vec: Spatiotemporal Context-Aware Network Embedding for Heterogeneous Urban Mobility Data
[paper]
In Proceedings of the IEEE International Conference on Big Data (Big Data), 2021
- Collective embedding with feature importance: A unified approach for spatiotemporal network embedding
[paper]
In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020
- Beyond Geo-First Law: Learning Spatial Representations via Integrated Autocorrelations and Complementarity
[paper]
In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2019
- Joint Representation Learning for Multi-Modal Transportation Recommendation
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2019
- Efficient Region Embedding with Multi-View Spatial Networks: A Perspective of Locality-Constrained Spatial Autocorrelations
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2019
- Human-instructed deep hierarchical generative learning for automated urban planning
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2023
- Deep human-guided conditional variational generative modeling for automated urban planning
[paper]
In IEEE international conference on data mining, 2021
- Spatiotemporal Activity Modeling via Hierarchical Cross-Modal Embedding
[paper]
In IEEE Transactions on Knowledge and Data Engineering, 2020
- Similar Trajectory Search with Spatio-Temporal Deep Representation Learning
[paper]
In ACM Transactions on Intelligent Systems and Technology, 2021
- A Joint Context-Aware Embedding for Trip Recommendations
[paper]
In Proceedings of the IEEE 35th International Conference on Data Engineering (ICDE), 2019
- Multimodal Trajectory Prediction: A Survey
[paper]
In arXiv, 2023
- Explainable spatiotemporal reasoning for geospatial intelligence applications
[paper]
In Transactions in GIS, 2022
- Event-Aware Multimodal Mobility Nowcasting
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2022
- Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
[paper]
In arXiv, 2022
- DuARE: Automatic Road Extraction with Aerial Images and Trajectory Data at Baidu Maps
[paper]
In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022
- ERNIE-GeoL: A Geography-and-Language Pre-trained Model and its Applications in Baidu Maps
[paper]
In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022
- DuTraffic: Live Traffic Condition Prediction with Trajectory Data and Street Views at Baidu Maps
[paper]
In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022
- A Contextual Master-Slave Framework on Urban Region Graph for Urban Village Detection
[paper]
In Proceedings of the IEEE 39th International Conference on Data Engineering (ICDE), 2023
- Jointly Contrastive Representation Learning on Road Network and Trajectory
[paper]
In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022
- Spatio-Temporal Graph Convolutional and Recurrent Networks for Citywide Passenger Demand Prediction
[paper]
In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019
- STG2Seq: Spatial-temporal Graph to Sequence Model for Multi-step Passenger Demand Forecasting
[paper]
In arXiv, 2019
- Joint predictions of multi-modal ride-hailing demands: A deep multi-task multi-graph learning-based approach
[paper]
In Transportation Research Part C: Emerging Technologies, 2021
- Multi-modal graph interaction for multi-graph convolution network in urban spatiotemporal forecasting
[paper]
In Sustainability, 2022
- Unsupervised Representation Learning of Spatial Data via Multimodal Embedding
[paper]
In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019
- Deep multi-view spatial-temporal network for taxi demand prediction
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2018
- Forecasting fine-grained urban flows via spatio-temporal contrastive self-supervision
[paper]
In IEEE Transactions on Knowledge and Data Engineering, 2022
- Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction
[paper]
In Proceedings of the AAAI Conference on Artificial Intelligence, 2023
- Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd Flow Inference
[paper]
In arXiv, 2023
- A Cross-City Federated Transfer Learning Framework: A Case Study on Urban Region Profiling
[paper]
In arXiv, 2022
- Predicting citywide crowd flows in irregular regions using multi-view graph convolutional networks
[paper]
In IEEE Transactions on Knowledge and Data Engineering, 2020
- Urban computing: concepts, methodologies, and applications
[paper]
In ACM Transactions on Intelligent Systems and Technology, 2014
- Traffic flow forecasting with spatial-temporal graph diffusion network
[paper]
In Proceedings of the AAAI conference on artificial intelligence, 2021
- Spatio-temporal meta learning for urban traffic prediction
[paper]
In IEEE Transactions on Knowledge and Data Engineering, 2020
- Deep distributed fusion network for air quality prediction
[paper]
In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, 2018
- Service Time Prediction for Delivery Tasks via Spatial Meta-Learning
[paper]
In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022
- Spatio-Temporal Graph Neural Networks for Predictive Learning in Urban Computing: A Survey
[paper]
In IEEE Transactions on Knowledge and Data Engineering, 2023
- SAInf: Stay Area Inference of Vehicles using Surveillance Camera Records
[paper]
In urban-computing.com, 2023