
EnsembleStat: no output reported when more than one ensemble member has no data (i.e., field filled with -9999) #1475

Closed
j-opatz opened this issue Aug 27, 2020 · 2 comments
Labels
MET: Ensemble Verification priority: blocker Blocker reporting: DTC NOAA BASE NOAA Office of Atmospheric Research DTC Project requestor: NCAR National Center for Atmospheric Research type: bug Fix something that is not working

Comments

@j-opatz
Contributor

j-opatz commented Aug 27, 2020

Describe the Problem

EnsembleStat was run on a seven-member ensemble, using Python embedding through a METplus wrapper to extract the data from the user's netCDF file, and the output files were empty. Three of the members had no values for the selected analysis hour, so the built-in ens_thresh and vld_thresh options were set to 0.1, which should have been low enough to work around the missing members, but the output was still empty. The tool produced output only after those three members were removed from the list of members to process.

Expected Behavior

Lowering the ens_thresh and vld_thresh thresholds appropriately (~0) should have allowed the ensemble members that do contain field data to still be processed against the observation field.

Environment

Describe your runtime environment:
1. Machine: Eyewall server
2. Version: METplus 3.1, MET 9.1

To Reproduce

Describe the steps to reproduce the behavior:
1. The dataset is located in /d1/projects/nrl_aerosol/input: icap_2016081500_aod.nc (forecast file, 7 members) and AGGR_HOURLY_20160815T1200_1deg_global_archive.nc (observation file).
2. The Python processing scripts are located in /d1/projects/nrl_aerosol: forecast_embedded.py (for the forecast file) and analysis_embedded.py (for the observation file).
3. Use the standard wrapped EnsembleStat config files for MET and the METplus wrapper, with the following program arguments: /d1/projects/nrl_aerosol/input/icap_{init?fmt=%Y%m%d%H}_aod.nc:total_aod:{valid?fmt=%Y%m%d%H%M}:0 passed to forecast_embedded.py (this argument is repeated 7 times, with the final digit increasing by 1 from 0 up to 6), and /d1/projects/nrl_aerosol/input/AGGR_HOURLY_{valid?fmt=%Y%m%d}T{valid?fmt=%H%M}_1deg_global_archive.nc:aod_nrl_total:Mean passed to analysis_embedded.py.
4. Set the number of ensemble members to 7 (ENS_STAT_N_MEMBERS=7) and set ens_thresh and vld_thresh to 0.1.
5. The error is the absence of any data in the output files.
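The argument convention in step 3 can be illustrated with a short Python sketch that expands the templates for this case. It assumes the `{init?fmt=...}` and `{valid?fmt=...}` tags behave like `strftime` format strings, and it takes the valid time (2016-08-15 12:00) from the observation file name. The functions `member_args` and `obs_arg` are hypothetical helpers, not part of METplus or the user's scripts.

```python
from datetime import datetime

INPUT_DIR = "/d1/projects/nrl_aerosol/input"


def member_args(init: datetime, valid: datetime, n_members: int = 7) -> list[str]:
    """Build one forecast_embedded.py argument per member (index 0..n_members-1)."""
    init_str = init.strftime("%Y%m%d%H")      # {init?fmt=%Y%m%d%H}
    valid_str = valid.strftime("%Y%m%d%H%M")  # {valid?fmt=%Y%m%d%H%M}
    return [
        f"{INPUT_DIR}/icap_{init_str}_aod.nc:total_aod:{valid_str}:{member}"
        for member in range(n_members)
    ]


def obs_arg(valid: datetime) -> str:
    """Build the analysis_embedded.py argument for the observation file."""
    day = valid.strftime("%Y%m%d")  # {valid?fmt=%Y%m%d}
    hhmm = valid.strftime("%H%M")   # {valid?fmt=%H%M}
    return f"{INPUT_DIR}/AGGR_HOURLY_{day}T{hhmm}_1deg_global_archive.nc:aod_nrl_total:Mean"


init = datetime(2016, 8, 15, 0)
valid = datetime(2016, 8, 15, 12)
for arg in member_args(init, valid):
    print(arg)
print(obs_arg(valid))
```

Expanding the templates this way makes it easy to confirm that every member argument points at an existing file before EnsembleStat is run.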

Relevant Deadlines

October 2020 would be helpful - blocking progress if needed

Funding Source

2700021

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Review projects and select relevant Repository and Organization ones
  • Select milestone

Define Related Issue(s)

Consider the impact to the other METplus components.

Bugfix Checklist

See the METplus Workflow for details.

  • Complete the issue definition above.
  • Fork this repository or create a branch of master_<Version>.
    Branch name: bugfix_<Issue Number>_master_<Version>_<Description>
  • Fix the bug and test your changes.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into master_<Version> and link the pull request to this issue.
    Pull request: bugfix <Issue Number> master_<Version> <Description>
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Complete the steps above to fix the bug on the develop branch and link the pull request to this issue.
    Branch name: bugfix_<Issue Number>_develop_<Description>
    Pull request: bugfix <Issue Number> develop <Description>
  • Close this issue.
@j-opatz j-opatz added the type: bug Fix something that is not working label Aug 27, 2020
@j-opatz j-opatz added this to the MET Future Versions milestone Aug 27, 2020
@j-opatz j-opatz added component: ensemble vx priority: low Low Priority requestor: NCAR National Center for Atmospheric Research labels Aug 27, 2020
@JohnHalleyGotway
Collaborator

Listed below is an excerpt from the user's guide chapter about Ensemble-Stat:
https://dtcenter.github.io/MET/Users_Guide/ensemble-stat.html

Note that the ens_thresh and vld_thresh options are defined in the "ens" dictionary and apply to the derivation of ensemble products, like mean and spread. They are NOT defined in the "fcst" or "obs" dictionaries. For each observation (whether point or gridded data), if any of the ensemble members contain missing data, that point is excluded from the verification. This can be seen here:

For gridded data: https://github.com/dtcenter/MET/blob/master_v9.1/met/src/tools/core/ensemble_stat/ensemble_stat.cc#L1956

For point data:

So this is not a bug. The code is doing what it was designed to do... exclude from the verification any points where one of the ensemble members contains bad data.
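The exclusion rule described above explains why three empty members produce empty output. Here is a simplified Python sketch of that rule, not MET's C++ implementation. It assumes missing values are stored as -9999 (as in the issue title) and uses a hypothetical `usable_points` helper. As in the comment, no ens_thresh or vld_thresh argument is involved.

```python
import numpy as np

BAD_DATA = -9999.0  # assumed bad-data value, as used for the empty members in this issue


def usable_points(ens: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Boolean mask of points kept for verification.

    ens has shape (n_members, n_points) and obs has shape (n_points,).
    A point is kept only if the observation AND every member are valid.
    """
    members_ok = np.all(ens != BAD_DATA, axis=0)
    obs_ok = obs != BAD_DATA
    return members_ok & obs_ok


rng = np.random.default_rng(0)
ens = rng.uniform(0.0, 1.0, size=(7, 5))  # 7 members, 5 points
obs = rng.uniform(0.0, 1.0, size=5)
print(usable_points(ens, obs).sum())  # all members valid: 5 points kept

ens[4:] = BAD_DATA  # members 4-6 have no data for this hour
print(usable_points(ens, obs).sum())  # 0 points kept, so the output is empty
```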

If we would like the code to do something else in this case, I recommend rewriting this issue to define exactly what it should do. Missing data for ensemble members causes real problems. For example, there's no obvious way to compute an observation rank when some of the members contain missing data.
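The rank problem can be made concrete with a small Python sketch. It assumes the rank is 1 plus the number of valid members below the observation, ignores ties for simplicity, and uses a hypothetical `obs_rank` helper. The same rank value lands in a middle bin for a full ensemble but in the top bin when only four members are valid, so the two cases cannot share one rank histogram.

```python
BAD_DATA = -9999.0  # assumed bad-data value


def obs_rank(members: list[float], obs: float) -> tuple[int, int]:
    """Return (rank, n_valid): the 1-based rank of obs among the valid members.

    Ties are ignored. The rank ranges from 1 to n_valid + 1, so the number of
    possible ranks depends on how many members are valid at this point.
    """
    valid = [m for m in members if m != BAD_DATA]
    rank = 1 + sum(1 for m in valid if m < obs)
    return rank, len(valid)


full = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
partial = [0.10, 0.20, 0.30, 0.40, BAD_DATA, BAD_DATA, BAD_DATA]

print(obs_rank(full, 0.45))     # (5, 7): a middle bin out of 8
print(obs_rank(partial, 0.45))  # (5, 4): the top bin out of 5
```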

Excerpt from documentation:

The ens dictionary defines which ensemble fields should be processed.

When summarizing the ensemble, compute a ratio of the number of valid ensemble fields to the total number of ensemble members. If this ratio is less than the ens_thresh, then quit with an error. This threshold must be between 0 and 1. Setting this threshold to 1 will require that all ensemble members be present to be processed.
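A minimal Python sketch of the ens_thresh check described above follows. The excerpt does not say how a "valid ensemble field" is determined (for example, whether a field that is read successfully but filled entirely with -9999 counts as valid), so the hypothetical `check_ens_thresh` helper takes the count of valid fields as an input rather than guessing.

```python
def check_ens_thresh(n_valid: int, n_members: int, ens_thresh: float) -> float:
    """Return the valid-field ratio, or raise if it falls below ens_thresh."""
    if not 0.0 <= ens_thresh <= 1.0:
        raise ValueError("ens_thresh must be between 0 and 1")
    ratio = n_valid / n_members
    if ratio < ens_thresh:
        # The documentation says the tool quits with an error in this case.
        raise RuntimeError(f"valid ratio {ratio:.3f} < ens_thresh {ens_thresh}")
    return ratio


# 4 of the 7 members in this issue had data for the analysis hour.
print(round(check_ens_thresh(4, 7, 0.1), 3))  # 0.571, well above 0.1
```

With ens_thresh = 1.0 the same call raises, which matches the documented behavior of requiring all members to be present.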

When summarizing the ensemble, for each grid point compute a ratio of the number of valid data values to the number of ensemble members. If that ratio is less than vld_thresh, write out bad data. This threshold must be between 0 and 1. Setting this threshold to 1 will require each grid point to contain valid data for all ensemble members.
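The vld_thresh rule can be sketched in Python for one ensemble product, the mean. This is an illustration under stated assumptions, not MET's code: missing values are taken to be -9999, and `ens_mean` is a hypothetical helper. It shows that vld_thresh controls whether a product is derived at a point, which is separate from the verification step that drops any point with a missing member.

```python
import numpy as np

BAD_DATA = -9999.0  # assumed bad-data value


def ens_mean(ens: np.ndarray, vld_thresh: float) -> np.ndarray:
    """Per-point ensemble mean that honors vld_thresh.

    ens has shape (n_members, n_points). Points whose fraction of valid
    members is below vld_thresh are written as BAD_DATA.
    """
    if not 0.0 <= vld_thresh <= 1.0:
        raise ValueError("vld_thresh must be between 0 and 1")
    valid = ens != BAD_DATA
    counts = valid.sum(axis=0)
    ratio = counts / ens.shape[0]
    sums = np.where(valid, ens, 0.0).sum(axis=0)
    mean = np.full(ens.shape[1], BAD_DATA)
    ok = (ratio >= vld_thresh) & (counts > 0)  # counts > 0 avoids 0/0 when vld_thresh == 0
    mean[ok] = sums[ok] / counts[ok]
    return mean


ens = np.array([[1.0, 2.0, 3.0]] * 4 + [[BAD_DATA] * 3] * 3)  # 4 valid + 3 empty members
print(ens_mean(ens, 0.1))  # mean of the 4 valid members: [1. 2. 3.]
print(ens_mean(ens, 1.0))  # all members required: every point is bad data
```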

@TaraJensen TaraJensen added priority: blocker Blocker alert: NEED MORE DEFINITION Not yet actionable, additional definition required and removed priority: low Low Priority labels Sep 15, 2020
@j-opatz j-opatz closed this as completed Sep 16, 2020
@JohnHalleyGotway
Collaborator

After debugging this issue more, we created #1494 to address it.

@TaraJensen TaraJensen removed the alert: NEED MORE DEFINITION Not yet actionable, additional definition required label Sep 22, 2020
@TaraJensen TaraJensen added the reporting: DTC NOAA BASE NOAA Office of Atmospheric Research DTC Project label Dec 21, 2020