Enhance Series-Analysis to compute the BRIERCL statistic from the PSTD line type #2003
Comments
@j-opatz regarding Series-Analysis, I took a look starting at this line of code. And I see that "BRIERCL" is NOT included as one of the output stat types. However, on this line of code, I see that Series-Analysis is storing the climo information, assuming it has been provided. So adding it to the output of Series-Analysis should be relatively straightforward... assuming there isn't some logical issue that arises in the computation of that statistic over the time series of data. But I don't anticipate that. I don't think it can be added to the NetCDF matched pairs output from Grid-Stat. BRIERCL is just a statistic derived from an Nx2 contingency table, just like the BRIER score is. So it's computed by aggregating over "something". In Grid-Stat, it's a spatial aggregation over multiple grid points. In Series-Analysis, it's (typically) a temporal aggregation, separately for each grid point. That being said... I suppose it's possible that there's some climo-related field that already is (or could be) computed during the processing logic... and that could be added to the NetCDF output. But I'd need you to clarify exactly what that is.
Thanks for looking into this @JohnHalleyGotway, and it seems like you've grasped the situation perfectly. An output from series-analysis would be sufficient, and I understand that a similar output from grid-stat isn't doable, given the aggregation need. As this is using a gen-ens-prod --> series-analysis route, it's a good outcome that BRIERCL be accessible via series-analysis.
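To illustrate the "temporal aggregation, separately for each grid point" described in the comments above, here is a minimal Python sketch. It is not MET code: MET derives BRIERCL from an Nx2 contingency table of binned probabilities, whereas this sketch simply averages squared errors of the raw climatological probabilities. The function names and the nested-list data layout (`values[time][point]`) are assumptions made for the example.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("need equal-length, non-empty series")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def briercl_per_grid_point(climo_probs, outcomes):
    """Aggregate over time, separately for each grid point.

    climo_probs[t][i]: climatological event probability at time t, point i
    outcomes[t][i]:    1 if the event was observed at time t, point i, else 0
    Returns one reference Brier score per grid point (a flattened 2D field).
    """
    n_points = len(outcomes[0])
    scores = []
    for i in range(n_points):
        # Pull out the time series for this single grid point.
        p_series = [step[i] for step in climo_probs]
        o_series = [step[i] for step in outcomes]
        scores.append(brier_score(p_series, o_series))
    return scores


# Two time steps, two grid points (illustrative values only).
field = briercl_per_grid_point([[0.5, 0.2], [0.5, 0.2]], [[1, 0], [0, 0]])
```

A spatial aggregation in the Grid-Stat sense would instead pool all grid points of one time step into a single score, which is why it yields one number rather than a field.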
…imo mean and standard deviation fields found. Also add the following to the list of PSTD stats: BRIERCL, BRIERCL_NCL, BRIERCL_NCU, and BSS_SMPL.
…ml to test running Series-Analysis with probability data and climo.
…at we need it for the derivation of climo probs.
… output. But I have 2 concerns... doing a deep copy of cdf_info for each grid point seems like a lot of wasted space. Consider changing PairBase::cdf_info into an unallocated pointer instead. Also, deriving the probability by sampling some number of times from the climo distribution seems unnecessarily computationally expensive. We do this to mimic existing NOAA/EMC logic. But it sure does seem like computing the inverse of the CDF would be much simpler.
…h a PairBase::climo_cdf_ptr pointer to one. This is needed to avoid creating separate ClimoCDFInfo objects for each grid point in Series-Analysis... since we have a PairDataPoint object for each. Using pointers should consume much less memory.
…Analysis application code to handle the ClimoCDFInfo pointer.
… config files to indicate the logic I'm adding to the tool. If block_size <= 0, automatically set it to the full dimension of the verification grid.
… config files, including the direct_prob boolean option. Still need to actually add the code for the latter.
…nalysis, need to store the aggregation object in the map BEFORE storing the pointer to the CDF thresh array. The opposite doesn't work and caused stat_analysis in unit_climatology_1.0deg to fail.
…'t allocate PairDataPoint objects until the grid is actually defined.
…rDataPoint objects. ci-run-unit
… it to 0. That way it'll automatically resize to the dimension of the grid. Doing this makes it run about 20 seconds faster, which offsets the additional run of Series-Analysis.
…from Series-Analysis config files since it has no impact on the output. That only applies to Grid-Stat and Point-Stat. But add entries for climo_cdf.direct_prob to all Point-Stat and Grid-Stat config files since those tools do derive climo probabilities. Do not add it to Ensemble-Stat config files since Ensemble-Stat does not derive climo probs.
…Tweaking a PB2NC log message to clarify that the reported time range is the input observation timestamps.
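The commit notes above contrast deriving a climo probability by sampling from the climo distribution with computing it directly (the idea behind the climo_cdf.direct_prob option). The Python sketch below shows the difference for one grid point. It assumes a normal climatology defined by the climo mean and standard deviation, and it evaluates the CDF at the threshold rather than inverting it. The function names, the sample count, and the seed are hypothetical; this is not the NOAA/EMC or MET logic.

```python
from statistics import NormalDist


def climo_prob_below_direct(climo_mean, climo_stdev, threshold):
    """P(value < threshold), read straight off the normal CDF.

    climo_stdev must be positive.
    """
    return NormalDist(climo_mean, climo_stdev).cdf(threshold)


def climo_prob_below_sampled(climo_mean, climo_stdev, threshold,
                             n_samples=10000, seed=1):
    """Estimate the same probability by drawing from the distribution."""
    draws = NormalDist(climo_mean, climo_stdev).samples(n_samples, seed=seed)
    return sum(1 for x in draws if x < threshold) / n_samples


direct = climo_prob_below_direct(10.0, 2.0, 10.0)    # threshold at the mean
sampled = climo_prob_below_sampled(10.0, 2.0, 10.0)  # noisy estimate
```

The direct form is a single function evaluation per grid point, while the sampled form costs `n_samples` draws per grid point and only approximates the same value.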
Describe the New Feature
In working with CPC on NMME data and Probability of Exceedance (POE) calculations, there have been numerous times where a score (e.g., Brier) will line up correctly, but the skill score (e.g., BSS) will not. In these situations, having access to a 2D field for the reference value, BRIERCL, would be incredibly useful for diagnosing where the discrepancies between CPC and MET output are coming from.
This was discussed with Tara, and both parties agreed that while it would be a useful diagnosis tool, it may not necessarily belong in the full release (i.e., this may be a feature in beta only). More input will be needed from the C++ engineering members.
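The situation described above, where the Brier score matches but the skill score does not, follows from the skill score's dependence on the reference score. The Python sketch below uses the commonly cited definition BSS = 1 - BS / BS_ref, with BRIERCL playing the role of BS_ref. Whether the CPC and MET calculations both follow exactly this form is an assumption, and the numbers are illustrative rather than taken from the NMME data.

```python
def brier_skill_score(brier, brier_climo):
    """Skill relative to a climatological reference: 1 - BS / BS_ref."""
    if brier_climo <= 0:
        raise ValueError("reference Brier score must be positive")
    return 1.0 - brier / brier_climo


# Identical forecast Brier score, two different reference values.
bss_a = brier_skill_score(0.20, 0.25)  # about 0.2
bss_b = brier_skill_score(0.20, 0.40)  # about 0.5
```

With the forecast score held fixed, any disagreement in the skill score must come from the reference term, which is why a per-grid-point BRIERCL field helps localize the discrepancy.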
Acceptance Testing
NMME data from CPC can be used. Located on Kiowa.
METplus GenEnsProd config file: /d1/projects//CPC_data/scripts/GenEnsProd-SA_NMME.conf
Python scripts to ingest files: /d1/projects/CPC_data/scripts/forecast_read-in_CFSv2.py and /d1/projects//CPC_data/scripts/preprocessFun.py
Ensemble input file: /d1/projects/CPC_data/input/NMME/new_data/raw_fcst/.fcst.nc
GenEnsProd output files: /d1/projects/CPC_data/output/NMME_out/GenEnsProd-SA/.nc
METplus SeriesAnalysis config file: /d1/projects/CPC_data/scripts/SA_testing_for_CFSv2_GeorgeFix.conf
Climo mean field input file: /d1/projects/CPC_data/input/NMME/new_data/ghcn_cams.1x1.1982-2010.mon.clim.nc
Climo Stddev field input file: /d1/projects/CPC_data/input/NMME/new_data/ghcn_cams.1x1.1982-2010.mon.stddev.nc
Obs field input file: /d1/projects/CPC_data/input/NMME/new_data/ghcn_cams.1x1.1982-2020.mon.nc
Where to request this output is open for suggestion: because the workflow is GenEnsProd > SeriesAnalysis, it would be ideal if the PSTD line type could provide support for a BRIERCL column request in series-analysis output. I can also see a path forward in grid-stat, using an nc_pairs_flag option.
Time Estimate
Will require engineer input
Issues should represent approximately 1 to 3 days of work.
Sub-Issues
Consider breaking the new feature down into sub-issues.
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
2702691
Define the Metadata
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
New Feature Checklist
See the METplus Workflow for details.
Branch name:
feature_<Issue Number>_<Description>
Pull request:
feature <Issue Number> <Description>
Select: Reviewer(s) and Linked issues
Select: Repository level development cycle Project for the next official release
Select: Milestone as the next official version