Updates to gcm-driven single column calibrations #3336
d686f67 to eb05766
I've taken a look, and LGTM!
- Could you add a note to the PR message about the additional normalization / data processing added for LES?
- Could you check the minibatching? Setting FixedSizeMinibatcher[1:k] will (I think) create only one minibatch, i.e. it is the same as calibrating over only cases 1:k. Was this the intent, or did you want something like collect(1:k), collect(k+1:2k), ...? If so, this needs to be changed.
Otherwise my comments are really just small questions.
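To make the distinction in the minibatching question concrete, here is a minimal Python sketch (not the Julia code in this PR, and not the EnsembleKalmanProcesses API). The helper `fixed_size_minibatches` is a hypothetical name, and the case count of 12 with k = 4 is an assumed example. It contrasts a single batch holding cases 1:k with a partition of all cases into consecutive batches of size k.

```python
def fixed_size_minibatches(n_cases, batch_size):
    """Partition case indices 1..n_cases into consecutive batches.

    Hypothetical helper for illustration only; the last batch may be
    shorter when n_cases is not a multiple of batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    cases = list(range(1, n_cases + 1))  # 1-based labels, as in the review comment
    return [cases[i:i + batch_size] for i in range(0, n_cases, batch_size)]


k = 4          # assumed batch size
n_cases = 12   # assumed number of cases

# One minibatch only: every iteration would see cases 1:k and nothing else.
single_batch = [list(range(1, k + 1))]

# Partitioned: collect(1:k), collect(k+1:2k), ... in the reviewer's notation.
partitioned = fixed_size_minibatches(n_cases, k)

print(single_batch)
print(partitioned)
```

If the first form is what the configuration produces, cases beyond k never enter the calibration, which is the concern raised above.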
@@ -0,0 +1,49 @@
[entr_param_vec]
prior = "VectorOfParameterized([Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.25, 0.15), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.6, 0.3)])"
Do you think this input format is sustainable? Is there something you would have preferred to provide as input in an ideal world? If so, we could add something to the EKP TOML parsing.
I think it can get unwieldy with, e.g., hundreds of NN parameters. As is, I believe it allows for "repeat" logic to define the prior vector, which I could use to make it more concise.
I also have a use case where I need to load pretrained NN weights as prior means, so I will probably need to load those from a file and set a constant spread. This may be a niche use case, though.
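One way to keep long priors manageable, sketched here in Python rather than Julia, is to generate the prior string from lists of means and spreads instead of typing it by hand. This does not use the "repeat" logic mentioned above; it simply emits the explicit string in the same format as the [entr_param_vec] entry in the diff. `build_prior_string` is a hypothetical helper, and the pretrained values are made-up placeholders standing in for weights loaded from a file.

```python
def build_prior_string(means, stds):
    """Format per-parameter Normal priors as a VectorOfParameterized string.

    Hypothetical helper: it only builds text in the format shown in the
    diff and does not call any EKP parsing code.
    """
    if len(means) != len(stds):
        raise ValueError("means and stds must have the same length")
    terms = ", ".join(
        f"Normal({float(m)}, {float(s)})" for m, s in zip(means, stds)
    )
    return f"VectorOfParameterized([{terms}])"


# Reproduce the 12-element prior from the diff using list repetition.
means = [0.0] * 5 + [0.25] + [0.0] * 5 + [0.6]
stds = [1.0] * 5 + [0.15] + [1.0] * 5 + [0.3]
prior = build_prior_string(means, stds)

# Pretrained-weights use case: prior means from existing weights, with a
# constant spread. The values below are placeholders, not real weights.
pretrained_means = [0.12, -0.4, 0.03]
constant_spread = 0.5  # assumed value
nn_prior = build_prior_string(
    pretrained_means, [constant_spread] * len(pretrained_means)
)

print(prior)
print(nn_prior)
```

The resulting string could then be written into the TOML file by a small script, so the file stays machine-generated even with hundreds of parameters.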
… plotting scripts, add rmse metrics, and parallelize cases over cpu cores
eb05766 to 71ce077
Improve gcm-driven single column calibrations:
A summary of the
The observation vector for a single configuration is formed by concatenating profiles across calibration variables, where each geophysical variable is normalized to have approximately zero mean and unit variance. These variable-by-variable normalization factors are precomputed (norm_factors_dict) and applied to all observations. Following this operation, the spatiotemporal calibration window is applied and temporal means are computed to form the observation vector y. Because variables are normalized to have zero mean and unit variance, a constant diagonal noise matrix is used (configurable as const_noise).
Observation Map
- … [y_t_start_sec, y_t_end_sec] for y and [g_t_start_sec, g_t_end_sec] for G.
- … z_cal_grid.
- … norm_factors_by_var. Optionally, take log of variables using log_vars before normalization.
- … y, G.
Prognostic EDMF results after 11 iterations with default calibration configurations (defined in experiment_config.yml)
Figures: cfSite 23, cfSite 17, RMSE Plot
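The observation-vector construction described in the PR summary can be sketched in plain Python as follows. This is an illustration under assumptions, not the code in this PR: the function names, the data layout (a dict of per-time profiles), the (mean, std) form of the normalization factors, and the variable names "theta" and "q" are all hypothetical. Interpolation to z_cal_grid is omitted, and when log_vars is used the normalization factors are assumed to describe the log-transformed variable.

```python
import math


def build_observation_vector(profiles, times_sec, variables, norm_factors,
                             t_start_sec, t_end_sec, log_vars=()):
    """Concatenate normalized, time-averaged profiles across variables.

    profiles[var] is a list of vertical profiles, one per entry of times_sec.
    norm_factors[var] is an assumed (mean, std) pair for that variable.
    """
    y = []
    for var in variables:
        mean, std = norm_factors[var]
        # Keep only the profiles inside the calibration time window.
        in_window = [p for t, p in zip(times_sec, profiles[var])
                     if t_start_sec <= t <= t_end_sec]
        if not in_window:
            raise ValueError(f"no samples for {var} in the time window")
        normalized = []
        for profile in in_window:
            # Optional log transform, applied before normalization.
            values = [math.log(v) for v in profile] if var in log_vars else profile
            normalized.append([(v - mean) / std for v in values])
        # Temporal mean at each vertical level.
        n_levels = len(normalized[0])
        y.extend(sum(p[k] for p in normalized) / len(normalized)
                 for k in range(n_levels))
    return y


def constant_diagonal_noise(n, const_noise):
    """Build an n-by-n diagonal matrix with const_noise on the diagonal."""
    return [[const_noise if i == j else 0.0 for j in range(n)] for i in range(n)]


# Toy data: two variables, two vertical levels, four output times.
times_sec = [0, 3600, 7200, 10800]
profiles = {
    "theta": [[300.0, 302.0], [301.0, 303.0], [302.0, 304.0], [303.0, 305.0]],
    "q": [[0.0, 2.0], [1.0, 3.0], [2.0, 4.0], [3.0, 5.0]],
}
norm_factors = {"theta": (300.0, 2.0), "q": (1.0, 2.0)}

y = build_observation_vector(profiles, times_sec, ["theta", "q"], norm_factors,
                             t_start_sec=3600, t_end_sec=10800)
noise = constant_diagonal_noise(len(y), const_noise=0.1)
print(y)
```

Because the normalization is affine, applying it before or after the time mean gives the same result for variables that are not log-transformed; the sketch normalizes each profile first, matching the order in the summary.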