Updates to gcm-driven single column calibrations #3336

Merged: 1 commit merged into main from cc/gcm_cal_updates on Oct 5, 2024

@costachris costachris commented Sep 24, 2024

Improve gcm-driven single column calibrations:

  1. modify priors
  2. add entrainment pi group calibration
  3. improve plotting scripts
  4. add rmse metrics and plotting
  5. add eki metrics and plotting
  6. parallelize CA single column runs (cases) over cpu cores
  7. easier specification of normalization for LES/EDMF profiles (mean, std to transform vars -> N(0, 1)); option for taking log

A summary of the

The observation vector for a single configuration is formed by concatenating profiles across calibration variables, where each geophysical variable is normalized to have approximately unit variance and zero mean. These variable-by-variable normalization factors are precomputed (norm_factors_dict) and applied to all observations. Following this operation, the spatiotemporal calibration window is applied and temporal means are computed to form the observation vector y. Because variables are normalized to have 0 mean and unit variance, a constant diagonal noise matrix is used (configurable as const_noise).

Observation Map

  1. Time-mean: Time-mean of profiles taken between [y_t_start_sec, y_t_end_sec] for y and [g_t_start_sec, g_t_end_sec] for G.
  2. Interpolation: Case-specific (i.e., "shallow", "deep") interpolation to the calibration grid, defined with stretch-grid parameters in z_cal_grid.
  3. Normalization: Variable-specific normalization using the mean and standard deviation defined in norm_factors_by_var. Optionally, take log of variables using log_vars before normalization.
  4. Concatenation: Stack across cases in a batch, forming y, G.
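
The four steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from this PR: the function names, the `cases` dictionary layout, and the use of explicit target heights in place of stretch-grid parameters are all assumptions made for the example. Only `norm_factors_by_var`, `log_vars`, and the time-window idea come from the description.

```python
import math

def time_mean(profiles, times, t_start, t_end):
    """Average the profiles whose timestamps fall inside [t_start, t_end]."""
    selected = [p for p, t in zip(profiles, times) if t_start <= t <= t_end]
    if not selected:
        raise ValueError("no samples inside the averaging window")
    return [sum(level) / len(selected) for level in zip(*selected)]

def interp_linear(z_src, values, z_target):
    """Piecewise-linear interpolation onto z_target (z_src ascending, ends clamped)."""
    out = []
    for z in z_target:
        if z <= z_src[0]:
            out.append(values[0])
        elif z >= z_src[-1]:
            out.append(values[-1])
        else:
            for i in range(len(z_src) - 1):
                if z_src[i] <= z <= z_src[i + 1]:
                    w = (z - z_src[i]) / (z_src[i + 1] - z_src[i])
                    out.append((1 - w) * values[i] + w * values[i + 1])
                    break
    return out

def normalize(values, mean, std, take_log=False):
    """Optionally take the log, then shift and scale with precomputed factors."""
    if take_log:
        values = [math.log(v) for v in values]
    return [(v - mean) / std for v in values]

def observation_vector(cases, norm_factors_by_var, log_vars, t_start, t_end):
    """Stack normalized, time-averaged profiles across variables and cases."""
    y = []
    for case in cases:  # hypothetical layout: one dict per case in the batch
        for var, (mean, std) in norm_factors_by_var.items():
            prof = time_mean(case["profiles"][var], case["times"], t_start, t_end)
            prof = interp_linear(case["z"], prof, case["z_cal"])
            y.extend(normalize(prof, mean, std, take_log=var in log_vars))
    return y
```

Under these assumptions the same function would be applied to the LES data with the `y` window and to the model output with the `g` window, so that `y` and `G` end up with matching layouts.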

Prognostic EDMF results after 11 iterations with default calibration configurations (defined in experiment_config.yml)

cfSite 23: [figure: ensemble_plot_clw_i_11_cu]

cfSite 17: [figure: ensemble_plot_clw_i_11]

RMSE plot: [attachment: rmse_vs_iteration.pdf]

@costachris costachris force-pushed the cc/gcm_cal_updates branch 2 times, most recently from d686f67 to eb05766 Compare September 25, 2024 03:34
@costachris costachris requested a review from odunbar September 26, 2024 19:16

@odunbar odunbar left a comment

I've taken a look, and LGTM!

  • Could you add a note to the PR message about the additional normalization / data processing added for LES?
  • Could you check the minibatching? Setting FixedSizeMinibatcher[1:k] will (I think) create only one minibatch, i.e. it is the same as calibrating over cases 1:k only. Was this the intent, or did you want something like collect(1:k), collect(k+1:2k), ...? If so, this needs to be changed.

Otherwise my comments are really just small questions.
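
The distinction raised in the minibatching question can be shown with a small sketch. It is written in plain Python and does not use or reproduce the EKP minibatcher API. The two helper names are hypothetical, and whether FixedSizeMinibatcher actually behaves like the first function is the reviewer's tentative reading, not something confirmed here.

```python
def single_batch(k):
    """One minibatch holding cases 1..k; every iteration sees the same cases."""
    return [list(range(1, k + 1))]

def partition_batches(n_cases, k):
    """Split cases 1..n_cases into consecutive minibatches of at most k cases."""
    return [
        list(range(start, min(start + k, n_cases + 1)))
        for start in range(1, n_cases + 1, k)
    ]

# With 7 cases and k = 3:
#   single_batch(3)         -> cases 4..7 never enter the calibration
#   partition_batches(7, 3) -> every case appears in exactly one minibatch
```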

@@ -0,0 +1,49 @@
[entr_param_vec]
prior = "VectorOfParameterized([Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.25, 0.15), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.0, 1.0), Normal(0.6, 0.3)])"

Do you think this input format is sustainable? Is there something you would have preferred to provide as input in an ideal world? We could add something to the EKP TOML parsing if so.

Member Author

I think it can get unwieldy with e.g. hundreds of NN parameters. As is, I believe it allows for "repeat" logic to define the prior vector, which I could use to make it more concise.

Otherwise I have a use case where I need to load pretrained NN weights as prior means, so I will probably need to load that from a file and set a constant spread. This may be a niche use case though.
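
One way to keep such a prior manageable for the pretrained-weights use case is to generate the string instead of writing it by hand. The sketch below is plain Python and only assembles text in the same `VectorOfParameterized([Normal(...), ...])` form that appears in the diff above. The helper names are hypothetical, reading the weights from a file is left out, and it is an assumption that a generated entry would be parsed the same way as the hand-written one.

```python
def prior_string(means, spread):
    """Build a VectorOfParameterized prior with one Normal(mean, spread) per weight."""
    terms = ", ".join(f"Normal({m}, {spread})" for m in means)
    return f"VectorOfParameterized([{terms}])"

def toml_entry(name, means, spread):
    """Format a TOML table with a single `prior` key, mirroring the diff layout."""
    return f'[{name}]\nprior = "{prior_string(means, spread)}"'

# Hypothetical usage: pretrained weights as prior means, constant spread of 1.0
print(toml_entry("entr_param_vec", [0.0, 0.25, 0.6], 1.0))
```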

@costachris costachris marked this pull request as ready for review October 4, 2024 22:12
… plotting scripts, add rmse metrics, and parallelize cases over cpu cores
@costachris costachris enabled auto-merge October 5, 2024 00:54
@costachris costachris added this pull request to the merge queue Oct 5, 2024
Merged via the queue into main with commit 9ee1c74 Oct 5, 2024
16 checks passed
@costachris costachris deleted the cc/gcm_cal_updates branch October 5, 2024 02:45