-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/save intermediate ml diag data #200
Conversation
@@ -3,116 +3,75 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github isn't showing this diff by default since it is large. This file originally contained all the plotting functions for metrics and diagnostics; the main changes are
- some of the "metrics" plots got moved out
- plotting functions take the same single common dataset input
sample format of saved metrics netcdf:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Anna, the main method is now nice and clean. I think some attention should be placed on removing the use of global constants in the metrics.py
file. I think my suggested refactors will make this code both more reusable and robust to future changes in variables names and conventions.
@@ -126,3 +130,46 @@ def net_heating_from_dataset(ds: xr.Dataset, suffix: str = None) -> xr.DataArray | |||
ds["PRATEsfc" + suffix], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hard code here. I expect this name will change in future versions.
|
||
# create and save metrics dataset | ||
# metrics: r2 global values, r2 pressure level profiles, MSE at locations | ||
ds_metrics = create_metrics_dataset(ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This high level structure is very good.
Thanks for the review @nbren12 , ready for re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks for all the changes. I have some very minor comments below which you don't necessarily have to address if you don't want to.
ds_test = ds.sel(dataset=DATASET_NAME_FV3_TARGET) | ||
ds_hires = ds.sel(dataset=DATASET_NAME_SHIELD_HIRES) | ||
|
||
ds_metrics = create_metrics_dataset(ds_pred, ds_test, ds_hires) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! this structure is pretty clear.
return ds_metrics | ||
|
||
|
||
def plot_metrics(ds_metrics, output_dir, dpi_figures): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider moving these plotting routines to another module to separate the plotting and computation code even more.
Refactor for offline ML diagnostics workflow.