- Introduction
- Project Structure
- Dependencies
- Data
- Prediction Model
- Assimilation
- Visualization
- Contributors
- Citation
- License
In recent years, combining data-driven machine learning models with Data Assimilation (DA) has attracted significant interest as a way to improve weather forecasting performance. This study follows that trend and details our approach and findings. We used ERA5 T850 (temperature at 850 hPa) data over the UK to retrain the global weather forecasting model, USTN12, so that it predicts temperature more accurately for the UK region. We acquired t2m (2 m temperature) data from ASOS ground observation stations across the UK and interpolated it to a uniform resolution using kriging with a polynomial drift term, an advanced geostatistical procedure. In addition, we generated random Gaussian noise based on the ERA5 T850 data as the basis for multi-time-step virtual observations. To investigate the assimilation effects, we assimilated the ASOS t2m data into the ERA5 T850 data. Our results indicate that the original global forecast model can be transferred to local regions, and that using atmospheric data for data assimilation notably improves model performance. However, assimilating surface temperature into atmospheric data reverses this improvement and reduces the model's predictive skill.
For a detailed overview of the project's directory structure, please refer to the tree_structure.txt file in the utils directory. This file gives a full breakdown of how files and directories are organized within the project.
To establish the required environment for this project using Conda, follow these steps:
- Clone the repository:
git clone https://github.com/acse-ww721/DA_ML_ERA5_ASOS_Weather_Forecasting_UK.git
Then set up the environment by following the instructions for your operating system:
- Create a Conda virtual environment on Windows or macOS:
conda env create -f config/env/env_win.yml
or
conda env create -f config/env/env_mac.yml
- Activate the virtual environment:
conda activate your_environment_name
- Install the required packages for your operating system:
pip install -r config/env/requirements_win.txt
or
conda install --file config/env/requirements_mac.txt
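After installation, a quick check like the one below can confirm that the main libraries are importable from the active environment. This is only a sketch: `check_packages` is a hypothetical helper, and the package list covers just the two libraries this README names (TensorFlow and Matplotlib), so extend it to match your environment file.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


# TensorFlow and Matplotlib are the libraries this README names;
# the list is an assumption and can be extended as needed.
status = check_packages(["tensorflow", "matplotlib"])
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```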
This project relies on several primary data sources for its analysis:
- ERA5 Hourly Pressure Level Data (1940 - Present) from CDS:
  - The project uses ERA5 hourly data on pressure levels, covering 1940 to the present. This dataset is available through the Copernicus Climate Data Store (CDS).
  - Data source: CDS - ERA5 Pressure Levels.
- ERA5 Hourly Single-Level Data (1940 - Present) from CDS:
  - The project also uses ERA5 hourly data on single levels, covering 1940 to the present, available through the Copernicus Climate Data Store (CDS).
  - Data source: CDS - ERA5 Single Levels.
- ASOS Hourly Observation Data (1979 - Present):
  - Hourly ASOS data, collected from 1979 to the present, is a vital component of this project. These observations are available for download from the Mesonet program.
  - Data source: Mesonet ASOS Data.
The code in the src/data_collection directory allows users to access, download, or crawl data from the corresponding websites. Although the primary focus is the UK, the code is designed to be adaptable to other regions.
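As an illustration of how a region-specific ERA5 download could be set up, the sketch below builds a request for hourly 850 hPa temperature and passes it to the `cdsapi` client. It is not the code in src/data_collection: the helper names and the UK bounding box are assumptions, and the exact request keys accepted by CDS may differ, so compare them against the request generated by the CDS web form.

```python
def build_era5_t850_request(year, area=(61.0, -8.0, 49.0, 2.0)):
    """Build a CDS request for hourly 850 hPa temperature in one year.

    `area` is (north, west, south, east) in degrees. The default is an
    assumed bounding box around the UK, not the one used by the project.
    """
    return {
        "product_type": "reanalysis",
        "variable": "temperature",
        "pressure_level": "850",
        "year": str(year),
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": list(area),
        "format": "netcdf",  # assumed key; check the current CDS request format
    }


def download_era5_t850(year, target):
    """Submit the request; needs the cdsapi package and a CDS API key."""
    import cdsapi  # imported lazily so building a request works without it

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-pressure-levels",
        build_era5_t850_request(year),
        target,
    )


request = build_era5_t850_request(2022)
```

Changing the `area` argument is all that is needed to target another region in this sketch; `download_era5_t850(2022, "t850_2022.nc")` would then fetch the file, where the target file name is only an example.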
The src/data_preprocessing folder contains details of the specific preprocessing steps applied to the data. These steps include handling missing data, interpolation, regridding, and data cleaning.
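To make the missing-data step concrete, here is a minimal sketch that fills gaps in an hourly station series by linear interpolation in time. It is an assumption about one reasonable approach, not the project's preprocessing code; `fill_gaps_linear` is a hypothetical helper and the values are toy data.

```python
import numpy as np


def fill_gaps_linear(values):
    """Fill NaN gaps in a 1-D hourly series by linear interpolation."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = ~np.isnan(values)
    if not valid.any():
        raise ValueError("series contains no valid observations")
    # np.interp holds the first/last valid value constant beyond the ends.
    return np.interp(idx, idx[valid], values[valid])


# Toy 2 m temperature series in kelvin with two missing hours.
t2m = [280.0, np.nan, 282.0, 283.0, np.nan]
filled = fill_gaps_linear(t2m)
```

Interior gaps are interpolated between their neighbours, while a gap at the end of the series repeats the last valid value. Long gaps may be better dropped than filled.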
You can access various data sets related to this project, including raw data, processed data for training models, and assimilated data, through the following Google Drive link:
The code for the model is available in the src/model directory. The model is implemented in Python and relies on the TensorFlow library.
The model is trained on ERA5 T850 data from 1979 to 2020 and validated on ERA5 T850 data from 2021.
The model's performance is then tested on ERA5 T850 data from 2022.
Finally, the trained model is used to predict temperature at ASOS stations and ERA5 grid points 12 hours ahead.
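The year-based split and the 12-hour lead time can be sketched as follows. This assumes hourly samples and reads the periods above as training on 1979 to 2020, validation on 2021, and testing on 2022; `make_lead_pairs` and `split_by_year` are hypothetical helpers rather than functions from src/model.

```python
import numpy as np

LEAD_HOURS = 12  # forecast lead time stated above


def make_lead_pairs(series, lead=LEAD_HOURS):
    """Pair each hourly sample with the sample `lead` hours later."""
    series = np.asarray(series)
    return series[:-lead], series[lead:]


def split_by_year(years):
    """Return boolean masks for the train, validation, and test periods."""
    years = np.asarray(years)
    train = (years >= 1979) & (years <= 2020)
    val = years == 2021
    test = years == 2022
    return train, val, test


# Toy hourly series: the value is just the hour index.
hours = np.arange(48)
x, y = make_lead_pairs(hours)

train, val, test = split_by_year([1979, 2020, 2021, 2022])
```

Each element of `y` is the value 12 hours after the matching element of `x`, which is the input/target relationship a 12-hour forecast model is trained on.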
To assimilate ASOS data, noisy model data, and virtually generated data into the ERA5 dataset, we employ the Sigma Point Ensemble Kalman Filter (SPEnKF) technique.
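For readers new to ensemble filters, the sketch below shows the analysis step of a standard stochastic (perturbed-observation) Ensemble Kalman Filter. It is not the sigma-point variant used in this project and is not taken from the project code; it only illustrates the general idea of updating a forecast ensemble with observations. The function name, the linear observation operator `H`, the scalar observation error, and the toy numbers are all assumptions.

```python
import numpy as np


def enkf_analysis(ensemble, obs, H, obs_std, seed=0):
    """Stochastic EnKF analysis step (illustrative, not the SPEnKF).

    ensemble: (n_state, n_members) forecast ensemble
    obs:      (n_obs,) observation vector
    H:        (n_obs, n_state) linear observation operator
    obs_std:  observation error standard deviation (scalar)
    """
    rng = np.random.default_rng(seed)
    n_obs, n_members = obs.size, ensemble.shape[1]

    # Ensemble anomalies and their projection into observation space.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    HA = H @ anomalies
    P_HT = anomalies @ HA.T / (n_members - 1)  # P H^T
    S = HA @ HA.T / (n_members - 1) + obs_std**2 * np.eye(n_obs)  # H P H^T + R

    # Kalman gain K = P H^T S^{-1}; S is symmetric, so solve and transpose.
    K = np.linalg.solve(S, P_HT.T).T

    # Perturb the observations with Gaussian noise, one draw per member.
    obs_pert = obs[:, None] + rng.normal(0.0, obs_std, size=(n_obs, n_members))
    return ensemble + K @ (obs_pert - H @ ensemble)


# Toy example: a 3-variable state with a warm-biased 50-member forecast.
rng = np.random.default_rng(1)
truth = np.array([270.0, 272.0, 274.0])
forecast = truth[:, None] + 3.0 + rng.normal(0.0, 2.0, size=(3, 50))
H = np.eye(3)[:2]  # observe only the first two state variables
obs = H @ truth
analysis = enkf_analysis(forecast, obs, H, obs_std=0.5)
```

In this toy setup the analysis ensemble mean moves toward the observations at the observed variables, while the unobserved variable is adjusted only through its sample covariance with the observed ones.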
The code for the assimilation process is in the src/assimilation directory and is implemented in Python.
Our assimilation methodology draws on the work of @ashesh6810.
All code for the visualization part is in the src/visualization directory. It is implemented in Python and relies on the Matplotlib library.
If you find this repository useful in your research, please consider citing the following paper:
@inproceedings{wang2023data,
title={Data Assimilation using ERA5, ASOS, and the U-STN model for Weather Forecasting over the UK},
author={WANG, WENQI and Bieker, Jacob and Arcucci, Rossella and Quilodran-Casas, Cesar},
booktitle={NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning},
url={https://www.climatechange.ai/papers/neurips2023/61},
year={2023}
}
For any inquiries or issues with the code, please don't hesitate to reach out to me:
This project is licensed under the terms of the Apache 2.0 license. See the LICENSE file for details.