NO2 Prediction: Performance and Robustness Comparison between Random Forest and Graph Neural Network

RiSchmi/MLbased_spatial_pollution_prediction

This repository applies and compares two machine learning approaches for spatial pollution prediction (interpolation) based on point-of-interest attributes shared between NO2 measuring sites in Berlin. A traditional Random Forest Regressor is compared with a graph neural network that combines neighborhood aggregation with contrastive, unsupervised embedding for graph representation learning; the network is adapted from the implementation by Vu et al. (2024).
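To make the idea of neighborhood aggregation concrete, here is a minimal sketch in plain Python. It is not the network used in this repository and not the architecture of Vu et al. (2024): there are no learned weights and no contrastive embedding, only a single similarity-weighted averaging step. The function names, the `(feature_vector, no2_value)` data layout, the choice of cosine similarity, and the example numbers are all assumptions made for illustration; features are assumed to be non-negative counts.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def aggregate_from_neighbors(target_features, measured_sites, k=2):
    """Estimate NO2 at an unmeasured location from its k most similar sites.

    measured_sites is a list of (feature_vector, no2_value) pairs
    (a hypothetical layout, not the repository's data format).
    """
    scored = sorted(
        ((cosine_similarity(target_features, feats), no2)
         for feats, no2 in measured_sites),
        reverse=True,
    )[:k]
    total_weight = sum(weight for weight, _ in scored)
    if total_weight <= 0:
        # No similar site found: fall back to the plain mean of all sites.
        return sum(no2 for _, no2 in measured_sites) / len(measured_sites)
    # Similarity-weighted mean of the neighbors' NO2 values.
    return sum(weight * no2 for weight, no2 in scored) / total_weight


# Made-up example: each feature vector counts two POI types around a site.
sites = [([1.0, 0.0], 40.0), ([0.0, 1.0], 10.0), ([2.0, 0.0], 20.0)]
estimate = aggregate_from_neighbors([3.0, 0.0], sites, k=2)
```

In a trained graph neural network the fixed similarity weights would be replaced by learned transformations, and several aggregation steps would typically be stacked; the thesis describes the actual architecture.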

This work is part of my Master's thesis, which is included in the repository as Thesis_spatial_pollution_prediction.pdf and serves as the detailed source of explanation.

Structure and Notebooks:

  • RF_model.ipynb: RF Regressor for prediction
  • RF_model_wrapper_importance.ipynb: global feature importance for the RF Regressor through wrapper-based retraining
  • RF_model_hyperparameter_tuning.ipynb: incremental hyperparameter selection for the RF Regressor
  • graph_neural_network.ipynb + utilities: application of the graph neural network (GNN)
  • analysis_error_feature.ipynb: comprehensive model comparison, including performance metrics, feature-wise and temporal residual analysis, and feature importance through Shapley values
  • Thesis_spatial_pollution_prediction.pdf: detailed theoretical description of the architecture and data transformation
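
As a rough illustration of the performance metrics step in the model comparison, the sketch below computes MAE, RMSE, and R² for one set of predictions. The README does not state which metrics the notebooks report, so this selection is an assumption (these are common choices for regression), and the function name and the example values are made up; the thesis defines the metrics actually used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired observations and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all observations are identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up NO2 values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
```

Calling the same function once per model on the same held-out observations gives directly comparable numbers for the Random Forest and the GNN.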

Data source and Acknowledgement:

The dataset is constructed by intersecting multiple geological and meteorological datasets from the Berlin Geo Portal and the German Weather Service (Deutscher Wetterdienst). Dataset construction and feature engineering, including missing data imputation, multicollinearity, and EDA, are addressed separately in the repositories construction of berlin_land_use_dataset and MLbased_meteorological_data_imputation. Use of the geological and meteorological data is governed by the Creative Commons BY 4.0 license (CC BY 4.0), as detailed in License.

Vu, V., Nguyen, D., Nguyen, T., Nguyen, Q., P.L., N., & Huynh, T. (2024). Self-supervised air quality estimation with graph neural network assistance and attention enhancement. Neural Computing and Applications. https://doi.org/10.1007/s00521-024-09637-7

The original owner of the data and code used in this thesis retains ownership of the data and code.