Software to train climate reconstruction technology (image inpainting with partial convolutions) with numerical model output and to re-fill missing values in observational datasets (e.g., HadCRUT4) using trained models.
- python>=3.7
- pytorch>=1.8.0
- tqdm>=4.59.0
- torchvision>=0.2.1
- numpy>=1.20.1
- matplotlib>=3.4.3
- tensorboardX>=2.4.0
- tensorboard>=2.8.0
- xarray>=0.20.2
- netcdf4>=1.5.8
- setuptools==59.5.0
- xesmf>=0.6.2
- cartopy>=0.20.2
- numba>=0.55.1
An Anaconda environment with all the required dependencies can be created using environment.yml:
conda env create -f environment.yml
To activate the environment, use:
conda activate crai
The package can be installed using pip in the current directory:
pip install .
The software can be used to:
- train a model (training)
- infill climate datasets using a trained model (evaluation)
The directory containing the climate datasets should have the following sub-directories:
- [...] for training
- test_large for evaluation
The climate datasets should be in netCDF format and placed in the corresponding sub-directories.
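Before launching a run, it can help to confirm that the expected sub-directory exists and actually contains netCDF files. The following is a minimal sketch using only the standard library; the helper name `list_netcdf_files` and the `.nc` file extension are assumptions made for illustration, not part of the package.

```python
from pathlib import Path


def list_netcdf_files(data_root, subdir):
    """Return the sorted netCDF file names found in data_root/subdir.

    Hypothetical helper: it only inspects the directory layout and
    assumes that netCDF files carry the ".nc" extension.
    """
    folder = Path(data_root) / subdir
    if not folder.is_dir():
        raise FileNotFoundError(f"missing sub-directory: {folder}")
    # Only file names are collected; the files themselves are not opened.
    return sorted(p.name for p in folder.glob("*.nc"))
```

For example, `list_netcdf_files("data", "test_large")` would list the files available for evaluation under a data root directory named `data` (a hypothetical path).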
The missing values can be defined separately as masks. These masks should be in netCDF format and have the same dimension as the climate dataset.
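As an illustration of the "same dimension" requirement, the sketch below derives a mask from the NaN entries of a small array with NumPy. The convention used here (1 for an observed value, 0 for a missing one) is an assumption and should be checked against the package before use; `mask_from_missing` is a hypothetical helper, and writing the result to netCDF is left out.

```python
import numpy as np


def mask_from_missing(data):
    """Build a mask with the same shape as `data`.

    Assumed convention (not confirmed by this document):
    1.0 marks an observed value, 0.0 marks a missing (NaN) value.
    """
    data = np.asarray(data, dtype=float)
    return (~np.isnan(data)).astype(np.float32)


# Toy field with dimensions (time, lat, lon) = (1, 2, 2) and one missing value.
field = np.array([[[0.3, np.nan], [1.2, -0.7]]])
mask = mask_from_missing(field)

# The mask must have exactly the same dimensions as the climate data.
assert mask.shape == field.shape
```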
A PyTorch model is required for the evaluation.
Once installed, the package can be used as:
- a command line interface (CLI):
  - training: crai-train
  - evaluation: crai-evaluate
- a Python library:
  - training:
    from climatereconstructionai import train
    train()
  - evaluation:
    from climatereconstructionai import evaluate
    evaluate()
For more information about the arguments:
crai-train --help
usage: crai-train [-h] [--data-root-dir DATA_ROOT_DIR] [--mask-dir MASK_DIR] [--log-dir LOG_DIR] [--img-names IMG_NAMES] [--mask-names MASK_NAMES] [--data-types DATA_TYPES] [--device DEVICE] [--prev-next PREV_NEXT] [--lstm-steps LSTM_STEPS]
[--prev-next-steps PREV_NEXT_STEPS] [--encoding-layers ENCODING_LAYERS] [--pooling-layers POOLING_LAYERS] [--image-sizes IMAGE_SIZES] [--weights WEIGHTS] [--attention] [--channel-reduction-rate CHANNEL_REDUCTION_RATE]
[--disable-skip-layers] [--disable-first-last-bn] [--out-channels OUT_CHANNELS] [--snapshot-dir SNAPSHOT_DIR] [--resume-iter RESUME_ITER] [--batch-size BATCH_SIZE] [--n-threads N_THREADS] [--finetune] [--lr LR]
[--lr-finetune LR_FINETUNE] [--max-iter MAX_ITER] [--log-interval LOG_INTERVAL] [--save-snapshot-image] [--save-model-interval SAVE_MODEL_INTERVAL] [--loss-criterion LOSS_CRITERION] [--eval-timesteps EVAL_TIMESTEPS]
[--load-from-file LOAD_FROM_FILE]
optional arguments:
-h, --help show this help message and exit
--data-root-dir DATA_ROOT_DIR
Root directory containing the climate datasets
--mask-dir MASK_DIR Directory containing the mask datasets
--log-dir LOG_DIR Directory where the log files will be stored
--img-names IMG_NAMES
Comma separated list of netCDF files (climate dataset)
--mask-names MASK_NAMES
Comma separated list of netCDF files (mask dataset). If None, it extracts the masks from the climate dataset
--data-types DATA_TYPES
Comma separated list of variable types, in the same order as img-names and mask-names
--device DEVICE Device used by PyTorch (cuda or cpu)
--prev-next PREV_NEXT
--lstm-steps LSTM_STEPS
Number of considered sequences for lstm (0 = lstm module is disabled)
--prev-next-steps PREV_NEXT_STEPS
--encoding-layers ENCODING_LAYERS
Number of encoding layers in the CNN
--pooling-layers POOLING_LAYERS
Number of pooling layers in the CNN
--image-sizes IMAGE_SIZES
Spatial size of the datasets (latxlon must be of shape NxN)
--weights WEIGHTS Initialization weight
--attention Enable the attention module
--channel-reduction-rate CHANNEL_REDUCTION_RATE
Channel reduction rate for the attention module
--disable-skip-layers
Disable the skip layers
--disable-first-last-bn
Disable the batch normalization on the first and last layer
--out-channels OUT_CHANNELS
Number of channels for the output image
--snapshot-dir SNAPSHOT_DIR
Parent directory of the training checkpoints and the snapshot images
--resume-iter RESUME_ITER
Iteration step from which the training will be resumed
--batch-size BATCH_SIZE
Batch size
--n-threads N_THREADS
Number of threads
--finetune Enable the fine tuning mode (use fine tuning parameterization and disable batch normalization)
--lr LR Learning rate
--lr-finetune LR_FINETUNE
Learning rate for fine tuning
--max-iter MAX_ITER Maximum number of iterations
--log-interval LOG_INTERVAL
Iteration step interval at which a tensorboard summary log should be written
--save-snapshot-image
Save evaluation images for the iteration steps defined in --log-interval
--save-model-interval SAVE_MODEL_INTERVAL
Iteration step interval at which the model should be saved
--loss-criterion LOSS_CRITERION
Index defining the loss function (0=original from Liu et al., 1=MAE of the hole region)
--eval-timesteps EVAL_TIMESTEPS
Iteration steps for which an evaluation is performed
--load-from-file LOAD_FROM_FILE
Load all the arguments from a text file
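When training runs are launched from a script, the options listed above can be assembled programmatically. The sketch below builds a `crai-train` argument list from keyword options; the helper `build_train_command` and the file and variable names in the example are hypothetical, and only flags that appear in the help text are used.

```python
import shlex


def build_train_command(data_root_dir, img_names, data_types, **options):
    """Assemble a crai-train argument list (hypothetical helper).

    Keyword names use underscores and are mapped to the dashed flag
    names shown in the help text (e.g. max_iter -> --max-iter).
    A value of True emits a bare switch such as --attention;
    False and None are skipped.
    """
    cmd = [
        "crai-train",
        "--data-root-dir", str(data_root_dir),
        "--img-names", ",".join(img_names),    # comma separated list
        "--data-types", ",".join(data_types),  # same order as img-names
    ]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)
        elif value is not False and value is not None:
            cmd.extend([flag, str(value)])
    return cmd


# Hypothetical file and variable names, for illustration only.
cmd = build_train_command(
    "data", ["tas.nc"], ["tas"],
    device="cpu", max_iter=1000, attention=True,
)
# Print a shell-safe version of the command line.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the package is installed. Keeping the command as a list rather than a single string avoids quoting problems with paths that contain spaces.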
crai-evaluate --help
usage: crai-evaluate [-h] [--data-root-dir DATA_ROOT_DIR] [--mask-dir MASK_DIR] [--log-dir LOG_DIR] [--img-names IMG_NAMES] [--mask-names MASK_NAMES] [--data-types DATA_TYPES] [--device DEVICE] [--prev-next PREV_NEXT] [--lstm-steps LSTM_STEPS]
[--prev-next-steps PREV_NEXT_STEPS] [--encoding-layers ENCODING_LAYERS] [--pooling-layers POOLING_LAYERS] [--image-sizes IMAGE_SIZES] [--weights WEIGHTS] [--attention] [--channel-reduction-rate CHANNEL_REDUCTION_RATE]
[--disable-skip-layers] [--disable-first-last-bn] [--out-channels OUT_CHANNELS] [--model-dir MODEL_DIR] [--model-names MODEL_NAMES] [--dataset-name DATASET_NAME] [--evaluation-dirs EVALUATION_DIRS] [--eval-names EVAL_NAMES]
[--infill {infill,test}] [--create-graph] [--original-network] [--partitions PARTITIONS] [--maxmem MAXMEM] [--load-from-file LOAD_FROM_FILE]
optional arguments:
-h, --help show this help message and exit
--data-root-dir DATA_ROOT_DIR
Root directory containing the climate datasets
--mask-dir MASK_DIR Directory containing the mask datasets
--log-dir LOG_DIR Directory where the log files will be stored
--img-names IMG_NAMES
Comma separated list of netCDF files (climate dataset)
--mask-names MASK_NAMES
Comma separated list of netCDF files (mask dataset). If None, it extracts the masks from the climate dataset
--data-types DATA_TYPES
Comma separated list of variable types, in the same order as img-names and mask-names
--device DEVICE Device used by PyTorch (cuda or cpu)
--prev-next PREV_NEXT
--lstm-steps LSTM_STEPS
Number of considered sequences for lstm (0 = lstm module is disabled)
--prev-next-steps PREV_NEXT_STEPS
--encoding-layers ENCODING_LAYERS
Number of encoding layers in the CNN
--pooling-layers POOLING_LAYERS
Number of pooling layers in the CNN
--image-sizes IMAGE_SIZES
Spatial size of the datasets (latxlon must be of shape NxN)
--weights WEIGHTS Initialization weight
--attention Enable the attention module
--channel-reduction-rate CHANNEL_REDUCTION_RATE
Channel reduction rate for the attention module
--disable-skip-layers
Disable the skip layers
--disable-first-last-bn
Disable the batch normalization on the first and last layer
--out-channels OUT_CHANNELS
Number of channels for the output image
--model-dir MODEL_DIR
Directory of the trained models
--model-names MODEL_NAMES
Model names
--dataset-name DATASET_NAME
Name of the dataset for format checking
--evaluation-dirs EVALUATION_DIRS
Directory where the output files will be stored
--eval-names EVAL_NAMES
Prefix used for the output filenames
--infill {infill,test}
Infill the climate dataset ('test' if mask order is irrelevant, 'infill' if mask order is relevant)
--create-graph Create a Tensorboard graph of the NN
--original-network Use the original network architecture (from Kadow et al.)
--partitions PARTITIONS
Split the climate dataset into several partitions along the time coordinate
--maxmem MAXMEM Maximum available memory in MB (overwrite partitions parameter)
--load-from-file LOAD_FROM_FILE
Load all the arguments from a text file
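The idea behind `--partitions` (processing the dataset in several chunks along the time coordinate to limit memory use) can be illustrated with a few lines of plain Python. This is only a sketch of one possible splitting scheme, with a hypothetical function name; it is not the package's implementation, and the way `--maxmem` derives the number of partitions is not covered.

```python
def time_partitions(n_timesteps, n_partitions):
    """Split range(n_timesteps) into contiguous, near-equal slices.

    Returns a list of (start, stop) index pairs, stop being exclusive.
    """
    if n_timesteps < 1 or n_partitions < 1:
        raise ValueError("both arguments must be positive")
    # Never create more partitions than there are time steps.
    n_partitions = min(n_partitions, n_timesteps)
    base, extra = divmod(n_timesteps, n_partitions)
    bounds = []
    start = 0
    for i in range(n_partitions):
        # The first `extra` partitions receive one additional time step.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


# 10 time steps in 3 partitions -> [(0, 4), (4, 7), (7, 10)]
print(time_partitions(10, 3))
```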
An example can be found in the directory demo.
The instructions to run the example are given in the file.
This software is licensed under the terms of the BSD 3-Clause license.
It is maintained by the Climate Informatics and Technology group at DKRZ (Deutsches Klimarechenzentrum).
- Previous contributing authors: Naoto Inoue, Christopher Kadow, Stephan Seitz
- Current contributing authors: Johannes Meuer, Étienne Plésiat.