Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Ensemble-Stat to compute probabilistic statistics for user-defined or climatology-based thresholds. #1259

Closed
20 of 21 tasks
JohnHalleyGotway opened this issue Feb 18, 2020 · 17 comments · Fixed by #2080 or #2082
Closed
20 of 21 tasks
Assignees
Labels
MET: Ensemble Verification MET: Probability Verification reporting: DTC NOAA R2O NOAA Research to Operations DTC Project requestor: NOAA/EMC NOAA Environmental Modeling Center required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone type: new feature Make it do something new
Milestone

Comments

@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Feb 18, 2020

Describe the Enhancement

Enhance Ensemble-Stat to compute probabilistic statistics on the fly for each threshold. This is needed by the NOAA/EMC global group for ensemble verification. Update the Ensemble-Stat configuration file to write output for PCT, PSTD, PRC, PRJ, and ECON line types. When climo_cdf thresholds are used, also write the mean of the stats across those bins (just like we do in Point-Stat and Grid-Stat).

Note that a related issue #1583 already adds probabilistic output to Ensemble-Stat for MET version 10.1.0. As of MET version 10.0.0, the ensemble relative frequency probabilities are computed for each categorical threshold (cat_thresh) entry listed in the ensemble (ens) dictionary. And those gridded relative frequencies (i.e. probabilities) are written to the NetCDF output file.

For this issue, we'll...

  • Leave the ens logic in place as is for MET version 10.1.0. But note that the "ens" dictionary will be completely removed in MET version 11.0.0 by Remove ensemble post-processing from the Ensemble-Stat tool. #1908.
  • Add support for "cat_thresh" being defined in the "fcst" dictionary. For each "cat_thresh", ensemble_stat will compute the relative frequencies on the fly, write them to the gridded NetCDF output file, and also verify them against the provided gridded or point observations. When writing the NetCDF output, make sure to NOT write the same variable name twice in case the user has requested relative frequencies for the same field in both the "ens" and "fcst" dictionaries.

This change makes Ensemble-Stat slightly more confusing in MET version 10.1.0, but it'll be simplified when we remove the "ens" dictionary entirely in MET version 11.0.0.

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the enhancement down into sub-issues.
No sub-issues needed.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2793541

Define the Metadata

Assignee

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added component: application code requestor: NOAA/EMC NOAA Environmental Modeling Center type: new feature Make it do something new priority: blocker Blocker labels Feb 18, 2020
@JohnHalleyGotway JohnHalleyGotway added this to the MET 9.1 milestone Feb 18, 2020
@JohnHalleyGotway JohnHalleyGotway self-assigned this Feb 18, 2020
@ghost
Copy link

ghost commented Feb 18, 2020

BSS and ROC are currently required by EMC global ensemble group. They are also interested in Reliability diagram.

@JohnHalleyGotway JohnHalleyGotway modified the milestones: MET 9.1, MET 10.0 Jun 24, 2020
@JohnHalleyGotway JohnHalleyGotway added the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Sep 10, 2020
@JohnHalleyGotway JohnHalleyGotway removed the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Nov 5, 2020
@JohnHalleyGotway
Copy link
Collaborator Author

On 11/19/2020, we had a telecon with NOAA/EMC staff to discuss ensemble-related verification metrics. The key point here is that the ensemble probabilities need to be defined relative to climatology bins rather than absolute thresholds.
The following statistics need to be supported like this:

  • RPSS
  • CRPSS
  • Brier Skill Score
  • Reliability diagram
  • ROC curve
  • Economic value curve

@j-opatz
Copy link
Contributor

j-opatz commented Feb 2, 2021

Email chain of 1/29/21 resulted in confirmation of scores/output requested by EMC. These are given in priority order, highest first:

  • CRPSS
  • RPSS
  • BSS
  • ROC
  • Reliability
  • Brier Score and Resolution (potentially done if BSS is already created)
  • Economic Value

JohnHalleyGotway added a commit that referenced this issue Mar 1, 2022
JohnHalleyGotway added a commit that referenced this issue Mar 1, 2022
…ether the total column should be summed or averaged. Previously, they were always summed since the climo bins were used to SUBSET the matched pairs. In Ensemble-Stat, the full set of pairs can now thresholded multiple times based on the climo bins. As such, the TOTAL value for each input should remain constant. Rather then summing those totals, they should now be averaged (but this is the average of a constant value).
@JohnHalleyGotway JohnHalleyGotway linked a pull request Mar 1, 2022 that will close this issue
15 tasks
@JohnHalleyGotway JohnHalleyGotway changed the title Enhance Ensemble-Stat to compute probabilistic statistics on the fly for each threshold and report the mean of stats across those thresholds. Enhance Ensemble-Stat to compute probabilistic statistics for user-defined or climatology-based thresholds. Mar 2, 2022
@JohnHalleyGotway JohnHalleyGotway linked a pull request Mar 2, 2022 that will close this issue
@BinbinZhou-NOAA
Copy link

BinbinZhou-NOAA commented Mar 2, 2022 via email

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Mar 2, 2022

@BinbinZhou-NOAA, yes, this should finally be doing what you were looking for a few years ago.

You can run Ensemble-Stat, providing a climo mean and standard deviation in the config file, and request probabilistic output. If you provide explicit thresholds using the prob_cat_thresh option, then those thresholds will be applied. If you leave prob_cat_thresh empty, but do request climo bins, then it'll do what you want.

Both of these examples are now tested in the unit tests, and I'll list some sample PSTD output files below to demonstrate:

  1. For APCP/24 with prob_cat_thresh = [ >0, >0.1, >0.25, >0.5, >1.0, >1.5, >2.0 ]; we get this output file:
    ensemble_stat_20100101_120000V_pstd.txt
    Note in that output file, the FCST_VAR columns has strings like this: PROB(APCP_24>1.0)

  2. For TMP/Z2 with prob_cat_thresh = []; and climo_cdf.cdf_bins = 10; we get this output file:
    ensemble_stat_NCEP_1.0DEG_20120410_120000V_pstd.txt
    Note in that output file, the VX_MASK column includes the "BIN#" or "BIN_MEAN" string. Also, note that FCST_VAR and FCST_LEV just state "TMP" and "Z2". So this is the result of thresholding relative to the climo distribution 9 times (>=CDP10, >=CDP20, ..., >=CDP90).

And its the "BIN_MEAN" output that you're really looking for. Once @jprestop has installed met-10.1.0-beta6 on WCOSS (released today), please test out this functionality to confirm that it makes sense.

I'm just realizing that for the BIN_MEAN line, the OBS_THRESH column is currently set to ">=CDP10.00000". I'm realizing that should probably be set to something else. Perhaps just set it to "NA"... or perhaps set it in some way to indicate the number of bins used? What do you think?

@BinbinZhou-NOAA
Copy link

BinbinZhou-NOAA commented Mar 2, 2022 via email

JohnHalleyGotway added a commit that referenced this issue Mar 3, 2022
…ut for climo distribution percentile thresholds. Instead of writing OBS_THRESH using the value from the first bin, explicitly write it as '==CDP####' where #### is 100/climo_cdf.n_bin. That better indicates the climo bins used to compute the mean statistics.
@BinbinZhou-NOAA
Copy link

BinbinZhou-NOAA commented Mar 3, 2022 via email

@JohnHalleyGotway JohnHalleyGotway removed a link to a pull request Mar 3, 2022
15 tasks
@JohnHalleyGotway JohnHalleyGotway linked a pull request Mar 3, 2022 that will close this issue
15 tasks
@JohnHalleyGotway
Copy link
Collaborator Author

@BinbinZhou-NOAA I wanted to mention that @j-opatz and I discussed this more internally and decided that we'd prefer to report the OBS_THRESH for the PSTD BIN_MEAN line as "==CDP10" instead of ">=CDP10". I realize it's a very small difference, but I'd rather make it now while I remember all the details than have to remember how to explain those details later!

I committed this change today via #2082. And that small change will be included in the official release (not beta6). Figured you wouldn't have any objections.

As for testing, it seems to me you could do the following:

  1. Presumably, you've already setup Ensemble-Stat previously to generate probability forecasts for >=CDP10, >=CDP20, >=CDP30, ..., >=CDP90.
  2. Then you ran those probability forecasts through Point-Stat to generate PSTD output.
  3. You loaded that PSTD output into METviewer and then plotted the mean across those thresholds.

You could run that same data directly through Ensemble-Stat now and look at the PSTD line with BIN_MEAN in the VX_MASK column.

Those BIN_MEAN PSTD stats should match the mean of the stats as computed by METviewer, at least for a single case. And this assumes you're using a consistent masking region and interpolation method in Point-Stat and Ensemble-Stat.

It seems to me that that'd be a good way to validate the new logic in Ensemble-Stat. If you have the time to run that comparison by next Monday or Tuesday, that'd be really helpful for us.

@BinbinZhou-NOAA
Copy link

BinbinZhou-NOAA commented Mar 3, 2022 via email

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Mar 3, 2022

Binbin,

Yes, you'd configure Ensemble-Stat (or Gen-Ens-Prod) to run with:

ensemble_flag = {
...
   frequency = TRUE;
...
};

And in the ensemble dictionary, you'd reference the CDP thresholds, something like this:

ens = {
   cat_thresh = [ >=CDP10, >=CDP20, >=CDP30, >=CDP40, >=CDP50, >=CDP60, >=CD70, >=CDP80, >=CDP90 ]; 
   field = [ 
     { name = "TMP"; level = "P500"; } // or whatever field you're testing
   ];
}

In the Point-Stat config file, you'll verify the NetCDF output from Ensemble-Stat. It'll look something like this:

fcst = {
   level = "(*,*)"; // same level setting for all fields
   prob = TRUE; // same prob flag for all fields
   cat_thresh = ==0.10; // same PCT thresholds for all fields

   field = [
      { name = "TMP_P500_ENS_FREQ_geCDP10"; },
      { name = "TMP_P500_ENS_FREQ_geCDP20"; },
      { name = "TMP_P500_ENS_FREQ_geCDP30"; },
      { name = "TMP_P500_ENS_FREQ_geCDP40"; },
      { name = "TMP_P500_ENS_FREQ_geCDP50"; },
      { name = "TMP_P500_ENS_FREQ_geCDP60"; },
      { name = "TMP_P500_ENS_FREQ_geCDP70"; },
      { name = "TMP_P500_ENS_FREQ_geCDP80"; },
      { name = "TMP_P500_ENS_FREQ_geCDP90"; }
   ];
}

@BinbinZhou-NOAA
Copy link

BinbinZhou-NOAA commented Mar 4, 2022 via email

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Mar 4, 2022

@BinbinZhou-NOAA yes, that error message indicates that you're using a climo-based threshold without sufficient climatology data (mean and standard deviation) provided in the config file.

This could happen for one of two reasons:

  1. Your MET configuration file does not include the "climo_mean" and "climo_stdev" settings needed to derive this.
  2. You did provide that data in the config file, but there is some bug in the MET code for how/when to set/check it.

Once you confirmed that the climo_mean and climo_stdev data has been provided, please point me to where/what command you're running so I can debug further.

@BinbinZhou-NOAA
Copy link

BinbinZhou-NOAA commented Mar 4, 2022 via email

@JohnHalleyGotway
Copy link
Collaborator Author

Binbin, my apologies for sending your bad directions. You should be able to use the gen_ens_prod tool instead of the ensemble_stat to compute those climo-based probabilities.

Here's why...
The Ensemble-Stat config file contains settings for the "ens", "fcst", "obs", "climo_mean", and "climo_stdev" dictionaries. Note that the "ens" dictionary will be removed in MET version 11.0.0 since it is replaced by Gen-Ens-Prod. The climatology data corresponds to the "fcst" and "obs" dictionaries. It does NOT correspond to the "ens" dictionary. That's why using ">=CDP10" did not work inside the "ens" dictionary... Ensemble-Stat isn't reading any climo data for those fields.

However, the Gen-Ens-Prod config file contains settings for the "ens", "climo_mean", and "climo_stdev" dictionaries. And the climatology data read there really does correspond to the "ens" dictionaries.

In run_way2.sh, please try swapping out your call to Ensemble-Stat with a call to Gen-Ens-Prod instead.

I did some work on setting this up more. Please take a look in /u/John.H.Gotway/test_way2.

@BinbinZhou-NOAA
Copy link

BinbinZhou-NOAA commented Mar 7, 2022 via email

@BinbinZhou-NOAA
Copy link

The comparion between the 2ways shows that the results are verify closed.
Thanks!

Binbin

@JohnHalleyGotway
Copy link
Collaborator Author

Whew! That's excellent news and a relief. Thank you very much for testing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MET: Ensemble Verification MET: Probability Verification reporting: DTC NOAA R2O NOAA Research to Operations DTC Project requestor: NOAA/EMC NOAA Environmental Modeling Center required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone type: new feature Make it do something new
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants