Updates to gcm-driven single column calibrations #3336

costachris · 2024-09-24T18:30:42Z

Improve gcm-driven single column calibrations:

modify priors
add entrainment pi group calibration
improve plotting scripts
add rmse metrics and plotting
add eki metrics and plotting
parallelize CA single column runs (cases) over CPU cores
easier specification of normalization for LES/EDMF profiles (a mean and std per variable, used to transform it to approximately N(0, 1)), with an option to take the log first
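The normalization in the last item can be sketched roughly as follows. This is written in Python for illustration only; the calibration scripts themselves are Julia. The variable names in `norm_factors`, the `(mean, std)` values, and the helper names are hypothetical. The sketch also assumes that, for log-transformed variables, the stored mean and std describe the log values.

```python
import math


def normalize_profile(values, mean, std, take_log=False):
    """Map a profile toward N(0, 1) with precomputed (mean, std).

    When take_log is True the log is applied first, so mean and std are
    assumed to describe log(values) rather than values.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    if take_log:
        values = [math.log(v) for v in values]  # requires values > 0
    return [(v - mean) / std for v in values]


def normalize_vars(profiles, norm_factors, log_vars=()):
    """Apply per-variable (mean, std) factors; log-transform names in log_vars."""
    return {
        name: normalize_profile(p, *norm_factors[name], take_log=name in log_vars)
        for name, p in profiles.items()
    }


# Hypothetical factors: variable name -> (mean, std).
norm_factors = {"thetal": (300.0, 5.0), "qt": (math.log(0.01), 0.5)}
normalized = normalize_vars(
    {"thetal": [295.0, 300.0, 305.0], "qt": [0.01]},
    norm_factors,
    log_vars={"qt"},
)
# normalized["thetal"] == [-1.0, 0.0, 1.0]; normalized["qt"] == [0.0]
```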

A summary of the observation vector and the observation map:

The observation vector for a single configuration is formed by concatenating profiles across calibration variables. Each geophysical variable is normalized to have approximately zero mean and unit variance. These per-variable normalization factors are precomputed (norm_factors_dict) and applied to all observations. The spatiotemporal calibration window is then applied, and temporal means are computed to form the observation vector y. After normalization, all variables share a common, approximately unit-variance scale, so a constant diagonal noise covariance is used (configurable as const_noise).
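As a rough Python illustration of the paragraph above, the sketch below forms y and the noise covariance from already-normalized profiles. Only the temporal part of the calibration window is shown; the spatial restriction is omitted. Profiles are nested lists indexed as time × level. Apart from `const_noise`, all names here are hypothetical. The covariance is built as a dense matrix only for readability; in practice only its diagonal needs storing.

```python
def time_mean(times, profiles, t_start, t_end):
    """Average the profiles whose time stamp lies in [t_start, t_end]."""
    selected = [p for t, p in zip(times, profiles) if t_start <= t <= t_end]
    if not selected:
        raise ValueError("no samples inside the calibration window")
    return [sum(level) / len(selected) for level in zip(*selected)]


def build_y_and_noise(times, norm_profiles, var_order, t_start, t_end, const_noise):
    """Form y by concatenating time-mean normalized profiles in var_order,
    and return a constant diagonal noise covariance const_noise * I."""
    y = []
    for name in var_order:
        y.extend(time_mean(times, norm_profiles[name], t_start, t_end))
    n = len(y)
    noise_cov = [[const_noise if i == j else 0.0 for j in range(n)] for i in range(n)]
    return y, noise_cov


times = [0.0, 10.0, 20.0]
norm_profiles = {
    "a": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],  # 3 times x 2 levels
    "b": [[4.0], [6.0], [8.0]],  # 3 times x 1 level
}
y, noise_cov = build_y_and_noise(times, norm_profiles, ["a", "b"], 10.0, 20.0, 0.1)
# y == [1.5, 1.5, 7.0]; noise_cov is 3 x 3 with 0.1 on the diagonal
```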

Observation Map

Time-mean: profiles are time-averaged over [y_t_start_sec, y_t_end_sec] for y and over [g_t_start_sec, g_t_end_sec] for G.
Interpolation: profiles are interpolated to a case-specific (i.e., "shallow" or "deep") calibration grid, whose stretched-grid parameters are defined in z_cal_grid.
Normalization: each variable is normalized with the mean and standard deviation defined in norm_factors_by_var. Variables listed in log_vars are optionally log-transformed before normalization.
Concatenation: profiles are stacked across the cases in a batch to form y and G.
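The four steps can be chained into a forward-map sketch along these lines. It is Python for illustration, and several parts are assumptions:

- The calibration heights `z_cal` are taken as already computed. The stretched-grid construction from z_cal_grid is not reproduced.
- The keys of the `case` dictionary are hypothetical.
- The log is applied after time-averaging and interpolation, which simply follows the order of the list above.

```python
import bisect
import math


def interp_linear(z_src, v_src, z_dst):
    """Piecewise-linear interpolation of v_src(z_src) onto z_dst.

    z_src must be strictly increasing; targets outside its range take
    the nearest end value.
    """
    out = []
    for z in z_dst:
        if z <= z_src[0]:
            out.append(v_src[0])
        elif z >= z_src[-1]:
            out.append(v_src[-1])
        else:
            i = bisect.bisect_right(z_src, z)  # z_src[i - 1] <= z < z_src[i]
            w = (z - z_src[i - 1]) / (z_src[i] - z_src[i - 1])
            out.append((1.0 - w) * v_src[i - 1] + w * v_src[i])
    return out


def observation_map(cases, var_order, norm_factors, log_vars, t_start, t_end):
    """Time-mean -> interpolate -> (optional log) -> normalize -> stack over cases."""
    g = []
    for case in cases:
        for name in var_order:
            window = [
                p for t, p in zip(case["times"], case["profiles"][name])
                if t_start <= t <= t_end
            ]
            if not window:
                raise ValueError("no samples inside the time window")
            mean_prof = [sum(level) / len(window) for level in zip(*window)]
            prof = interp_linear(case["z"], mean_prof, case["z_cal"])
            if name in log_vars:
                prof = [math.log(v) for v in prof]
            mu, sigma = norm_factors[name]
            g.extend((v - mu) / sigma for v in prof)
    return g


case = {
    "times": [0.0, 1.0],
    "z": [0.0, 100.0],
    "z_cal": [50.0],  # stands in for a case-specific calibration grid
    "profiles": {"a": [[0.0, 10.0], [2.0, 12.0]]},
}
g = observation_map([case, case], ["a"], {"a": (1.0, 5.0)}, set(), 0.0, 1.0)
# g == [1.0, 1.0]: one calibration level per case, two cases stacked
```

The same function would produce y from LES output by passing the y time window instead of the G window.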

Prognostic EDMF results after 11 iterations with default calibration configurations (defined in `experiment_config.yml`)

cfSite 23

cfSite 17

RMSE Plot
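The PR does not spell out how the RMSE metric is defined. One plausible reading is a plain root-mean-square error between EDMF and LES profiles for each variable, for example on the calibration grid in normalized units. The Python sketch below assumes that reading, and its helper names are hypothetical.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equal-length profiles."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("profiles must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def rmse_by_var(edmf, les):
    """Per-variable RMSE for dicts mapping variable name -> profile."""
    return {name: rmse(edmf[name], les[name]) for name in les}
```

For example, `rmse([0.0] * 4, [2.0] * 4)` evaluates to `2.0`.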

odunbar

I've taken a look, and LGTM!

Could you add a note to the PR message about the additional normalization / data processing added for LES?
Could you check the minibatching? Setting FixedSizeMinibatcher[1:k] will (I think) create only one minibatch, i.e., it is the same as calibrating over only cases 1:k. Was this the intent, or did you want something like collect(1:k), collect(k+1:2k), ...? If the latter, this needs to be changed.
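The difference the reviewer describes can be illustrated with plain index lists in Python. This is not the EnsembleKalmanProcesses minibatcher API, and the helper names are hypothetical. The first helper builds the single batch that the reviewer suspects is being produced. The second builds consecutive fixed-size batches with 1-based indices, matching `collect(1:k), collect(k+1:2k), ...` in Julia.

```python
def single_batch(k):
    """What FixedSizeMinibatcher[1:k] is suspected to produce: one batch."""
    return [list(range(1, k + 1))]


def consecutive_batches(n_cases, k):
    """Batches [1..k], [k+1..2k], ... over case indices 1..n_cases.

    A trailing partial batch is dropped; keeping it instead is an equally
    valid choice.
    """
    if k <= 0:
        raise ValueError("batch size must be positive")
    return [list(range(s, s + k)) for s in range(1, n_cases - k + 2, k)]
```

With 6 cases and `k = 2`, the first helper yields only `[[1, 2]]`. The second yields `[[1, 2], [3, 4], [5, 6]]`, so every case is eventually visited.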

Otherwise my comments are really just small questions.

odunbar · 2024-09-27T18:00:00Z

calibration/experiments/gcm_driven_scm/prior_prognostic_pi_entr.toml

```diff
@@ -0,0 +1,49 @@
+[entr_param_vec]
+prior = "VectorOfParameterized([Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.25, 0.15), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.6, 0.3)])"
```

Do you think this input format is sustainable? Is there something you would have preferred to provide as input in an ideal world? We could add something to the EKP TOML parsing if so.

I think it can get unwieldy with, e.g., hundreds of NN parameters. As is, I believe it allows for "repeat" logic when defining the prior vector, which I could use to make it more concise.
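Another way to keep long prior vectors manageable is to generate the string outside the TOML file from runs of identical entries. This Python sketch does that instead of relying on EKP's own "repeat" syntax, whose exact form is not shown in this thread. The function name is hypothetical. The four runs reproduce the 12-entry `entr_param_vec` prior quoted in the review comment.

```python
def normal_prior_string(runs):
    """Build a VectorOfParameterized([...]) string from (count, mean, std) runs."""
    terms = []
    for count, mean, std in runs:
        terms.extend([f"Normal({float(mean)}, {float(std)})"] * count)
    return "VectorOfParameterized([" + ", ".join(terms) + "])"


# The entr_param_vec prior from prior_prognostic_pi_entr.toml,
# written as four runs instead of twelve explicit entries.
runs = [(5, 0.0, 1.0), (1, 0.25, 0.15), (5, 0.0, 1.0), (1, 0.6, 0.3)]
prior = normal_prior_string(runs)
```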

Otherwise, I have a use case where I need pretrained NN weights as prior means, so I will probably need to load them from a file and set a constant spread. This may be a niche use case, though.
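The pretrained-weights case could look roughly like the Python sketch below. It assumes the weights are stored as a flat JSON list of numbers. Each weight becomes the mean of a Normal prior, and all priors share one spread. The output mirrors the string format of the TOML prior entry. The file format and function names are hypothetical.

```python
import json


def load_weights(path):
    """Read pretrained weights from a JSON file holding a flat list of numbers."""
    with open(path) as f:
        return [float(w) for w in json.load(f)]


def prior_from_weights(weights, spread):
    """Centre one Normal prior on each weight, all with the same spread."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    terms = [f"Normal({float(w)}, {float(spread)})" for w in weights]
    return "VectorOfParameterized([" + ", ".join(terms) + "])"
```

For example, `prior_from_weights([0.5, -1.0], 0.1)` returns `"VectorOfParameterized([Normal(0.5, 0.1), Normal(-1.0, 0.1)])"`.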

calibration/experiments/gcm_driven_scm/run_calibration.jl


costachris force-pushed the cc/gcm_cal_updates branch 2 times, most recently from d686f67 to eb05766 on September 25, 2024 03:34

costachris requested a review from odunbar September 26, 2024 19:16

odunbar approved these changes Sep 27, 2024


costachris marked this pull request as ready for review October 4, 2024 22:12

Improve gcm driven single column calibrations: modify priors, improve…

71ce077

… plotting scripts, add rmse metrics, and parallelize cases over cpu cores

costachris force-pushed the cc/gcm_cal_updates branch from eb05766 to 71ce077 on October 5, 2024 00:51

costachris enabled auto-merge October 5, 2024 00:54

costachris added this pull request to the merge queue Oct 5, 2024

Merged via the queue into main with commit 9ee1c74 Oct 5, 2024
16 checks passed

costachris deleted the cc/gcm_cal_updates branch October 5, 2024 02:45
