Feature/save intermediate ml diag data #200

AnnaKwa · 2020-03-25T21:04:21Z

Refactor for offline ML diagnostics workflow.

- single dataset input to metrics and diagnostics functions
- separation of "metrics" vs. "diagnostic" quantities
  - metrics: R^2 (global values for 2D quantities, pressure-level profiles for 3D) and RMSE
  - diagnostics: ML dQ vs. total maps, LTS, vertical dQ2 profiles in wet/dry columns, diurnal cycle, time averages and snapshots of net precipitation and heating compared across datasets
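
For readers unfamiliar with the two metrics named above, here is a minimal sketch of R^2 and RMSE in plain Python. It is not the code from this PR: the function names `rmse` and `r2_score` are hypothetical, the inputs are flat sequences rather than xarray objects, and no area weighting is applied (the real implementation may well weight by grid-cell area).

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(pred: Sequence[float], target: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SSE / SST (unweighted)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    mean_target = sum(target) / len(target)
    # Sum of squared errors of the prediction
    sse = sum((t - p) ** 2 for p, t in zip(pred, target))
    # Total sum of squares of the target about its mean
    sst = sum((t - mean_target) ** 2 for t in target)
    return 1.0 - sse / sst
```

A prediction equal to the target gives an R^2 of 1 and an RMSE of 0; a prediction that is everywhere the target mean gives an R^2 of 0.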

AnnaKwa · 2020-03-25T21:09:09Z

fv3net/diagnostics/sklearn_model_performance/_diagnostics.py

@@ -3,116 +3,75 @@
 import numpy as np


GitHub isn't showing this diff by default because it is large. This file originally contained all the plotting functions for metrics and diagnostics; the main changes are:

- some of the "metrics" plots were moved out
- plotting functions take the same single common dataset input

AnnaKwa · 2020-03-25T21:19:46Z

Sample format of the saved metrics netCDF:

<xarray.Dataset>
Dimensions:                                (grid_x: 49, grid_xt: 48, grid_y: 49, grid_yt: 48, initialization_time: 4, pressure: 37, tile: 6)
Coordinates:
    time                                   object ...
    dataset                                object ...
  * tile                                   (tile) int64 0 1 2 3 4 5
  * grid_xt                                (grid_xt) float64 1.0 2.0 ... 48.0
  * grid_yt                                (grid_yt) float64 1.0 2.0 ... 48.0
  * grid_x                                 (grid_x) float64 1.0 2.0 ... 49.0
  * grid_y                                 (grid_y) float64 1.0 2.0 ... 49.0
  * pressure                               (pressure) float64 1.0 2.0 ... 1e+03
  * initialization_time                    (initialization_time) object 2016-08-05 09:45:00 ... 2016-08-05 10:30:00
Data variables:
    R2_global_net_heating_vs_target        float64 ...
    R2_global_net_heating_vs_hires         float64 ...
    R2_sea_net_heating_vs_target           float64 ...
    R2_sea_net_heating_vs_hires            float64 ...
    R2_land_net_heating_vs_target          float64 ...
    R2_land_net_heating_vs_hires           float64 ...
    R2_global_net_precipitation_vs_target  float64 ...
    R2_global_net_precipitation_vs_hires   float64 ...
    R2_sea_net_precipitation_vs_target     float64 ...
    R2_sea_net_precipitation_vs_hires      float64 ...
    R2_land_net_precipitation_vs_target    float64 ...
    R2_land_net_precipitation_vs_hires     float64 ...
    lat                                    (tile, grid_yt, grid_xt) float32 ...
    latb                                   (tile, grid_y, grid_x) float32 ...
    lon                                    (tile, grid_yt, grid_xt) float32 ...
    lonb                                   (tile, grid_y, grid_x) float32 ...
    area                                   (tile, grid_yt, grid_xt) float32 ...
    r2_dQ1_pressure_levels_global          (pressure) float64 ...
    r2_dQ2_pressure_levels_global          (pressure) float64 ...
    r2_dQ1_pressure_levels_sea             (pressure) float64 ...
    r2_dQ2_pressure_levels_sea             (pressure) float64 ...
    r2_dQ1_pressure_levels_land            (pressure) float64 ...
    r2_dQ2_pressure_levels_land            (pressure) float64 ...
    mse_net_precipitation_vs_fv3_target    (grid_xt, grid_yt, initialization_time, tile) float64 ...
    mse_net_precipitation_vs_shield        (grid_xt, grid_yt, initialization_time, tile) float64 ...
    mse_net_heating_vs_fv3_target          (grid_xt, grid_yt, initialization_time, tile) float64 ...
    mse_net_heating_vs_shield              (grid_xt, grid_yt, initialization_time, tile) float64 ...
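
The scalar and profile R^2 variable names in the listing above appear to follow a regular pattern. The sketch below reproduces that pattern as inferred from the listing only; the helper names `scalar_r2_names` and `profile_r2_names` are hypothetical and are not taken from the PR's code.

```python
from itertools import product
from typing import List

# Name components read off the sample listing above.
VARIABLES = ["net_heating", "net_precipitation"]
DOMAINS = ["global", "sea", "land"]
REFERENCES = ["target", "hires"]
PROFILE_VARIABLES = ["dQ1", "dQ2"]


def scalar_r2_names() -> List[str]:
    """Names of the scalar R^2 variables, in the order shown in the listing."""
    return [
        f"R2_{domain}_{var}_vs_{ref}"
        for var, domain, ref in product(VARIABLES, DOMAINS, REFERENCES)
    ]


def profile_r2_names() -> List[str]:
    """Names of the pressure-level R^2 profiles, in the order shown in the listing."""
    return [
        f"r2_{var}_pressure_levels_{domain}"
        for domain, var in product(DOMAINS, PROFILE_VARIABLES)
    ]
```

Building the names from their components, rather than typing each one out, keeps the naming convention in one place if it changes later.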

nbren12

Thanks Anna, the main method is now nice and clean. I think some attention should be paid to removing the use of global constants in the metrics.py file. I think my suggested refactors will make this code both more reusable and more robust to future changes in variable names and conventions.

fv3net/diagnostics/__init__.py

nbren12 · 2020-03-26T16:49:11Z

fv3net/diagnostics/data.py

@@ -126,3 +130,46 @@ def net_heating_from_dataset(ds: xr.Dataset, suffix: str = None) -> xr.DataArray
        ds["PRATEsfc" + suffix],


Hard-coded here. I expect this name will change in future versions.
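
One possible reading of this comment is a concern about the literal `"PRATEsfc"` in the function body. Under that reading, a hedged sketch of one way to isolate the name is to make it a keyword argument with a default, so callers can override it when the convention changes. The function `select_precip` and the parameter `precip_name` are hypothetical, and a plain mapping stands in for the `xr.Dataset`.

```python
from typing import Any, Mapping, Optional

# "PRATEsfc" is the only name taken from the diff; everything else is illustrative.
DEFAULT_PRECIP_NAME = "PRATEsfc"


def select_precip(
    ds: Mapping[str, Any],
    suffix: Optional[str] = None,
    precip_name: str = DEFAULT_PRECIP_NAME,
) -> Any:
    """Look up the precipitation variable, with an overridable base name."""
    # Treat a missing suffix as an empty string so the concatenation is safe.
    suffix = suffix or ""
    return ds[precip_name + suffix]
```

For example, `select_precip(ds, "_coarse")` reads `ds["PRATEsfc_coarse"]`, while `select_precip(ds, precip_name="precip_rate")` reads a renamed variable without touching the function body.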

fv3net/diagnostics/sklearn_model_performance/__main__.py

fv3net/diagnostics/sklearn_model_performance/metrics.py

nbren12 · 2020-03-26T19:29:18Z

fv3net/diagnostics/sklearn_model_performance/__main__.py

+
+    # create and save metrics dataset
+    # metrics: r2 global values, r2 pressure level profiles, MSE at locations
+    ds_metrics = create_metrics_dataset(ds)


This high level structure is very good.

AnnaKwa · 2020-03-27T00:42:44Z

Thanks for the review @nbren12, ready for re-review.

nbren12

Great! Thanks for all the changes. I have some very minor comments below which you don't necessarily have to address if you don't want to.

external/vcm/vcm/select.py

nbren12 · 2020-03-27T20:50:30Z

fv3net/diagnostics/sklearn_model_performance/__main__.py

+    ds_test = ds.sel(dataset=DATASET_NAME_FV3_TARGET)
+    ds_hires = ds.sel(dataset=DATASET_NAME_SHIELD_HIRES)
+
+    ds_metrics = create_metrics_dataset(ds_pred, ds_test, ds_hires)


Great! This structure is pretty clear.

fv3net/diagnostics/sklearn_model_performance/metrics.py

nbren12 · 2020-03-27T21:01:04Z

fv3net/diagnostics/sklearn_model_performance/metrics.py

+    return ds_metrics
+
+
+def plot_metrics(ds_metrics, output_dir, dpi_figures):


I would consider moving these plotting routines to another module to separate the plotting and computation code even more.

Anna Kwa added 12 commits March 19, 2020 19:36

fix error

9bc7a2f

move metrics creation to separate .py

7ca7fd1

allow concat to fill in empty data arrays if missing

132fb22

fix merge datasets with missing data vars func

451d342

add rmse and fix pressure regridding

c8ef7a1

adjust diag plots for common input ds: dQ maps and LTS

5e6c76c

adjust remaining diag plots for common input ds

914efd1

metrics plots

6152679

linting

e19b97c

fix ds arg

60e9eb5

change module names and lint

b742116

clarify root mean square

7f5c6d4

AnnaKwa requested a review from nbren12 March 25, 2020 21:05

AnnaKwa commented Mar 25, 2020

rm underscore from private module names

dd9cc5c

nbren12 suggested changes Mar 26, 2020

Anna Kwa added 5 commits March 26, 2020 22:00

PR comments

5f411c8

fix bugs from last changes

ea390c2

fix more problems

2afac18

update test script

85947a5

Merge branch 'master' into feature/save-intermediate-ml-diag-data

9bf0dcc

AnnaKwa requested a review from nbren12 March 27, 2020 00:41

linting

6e97aaa

Anna Kwa added 2 commits March 27, 2020 00:45

linting

42d33fd

add explanatory comment

197a6fb

nbren12 approved these changes Mar 27, 2020

Anna Kwa added 2 commits March 30, 2020 16:40

PR comments

528b63a

update test script

a68d2d2

Anna Kwa added 3 commits March 30, 2020 16:49

add target dataset label coord back to metrics dataset

1e31068

linting

ca530fa

Merge branch 'master' into feature/save-intermediate-ml-diag-data

929fa10

AnnaKwa merged commit 2cceac3 into master Mar 30, 2020

AnnaKwa deleted the feature/save-intermediate-ml-diag-data branch March 30, 2020 18:55

spencerkclark pushed a commit that referenced this pull request May 7, 2021

Merge pull request #200 from TomAugspurger/cleanup

a97515a

CLN: Updates
