Modify the interpretation of the message_type_group_map values to support the use of regular expressions. #1974

Closed
9 of 21 tasks
JohnHalleyGotway opened this issue Nov 17, 2021 · 5 comments · Fixed by #1999
Labels
MET: PreProcessing Tools (Point); reporting: DTC NCAR Base (NCAR Base DTC Project); requestor: Community (General Community); required: FOR OFFICIAL RELEASE (Required to be completed in the official release for the assigned milestone); type: enhancement (Improve something that it is currently doing)

@JohnHalleyGotway
Collaborator

JohnHalleyGotway commented Nov 17, 2021

Describe the Enhancement

This issue arose via METplus Discussions dtcenter/METplus#1232. While the user was able to run madis2nc to compute time summaries, he was NOT able to get Point-Stat to read them to verify forecasts of daily temperature min/max.

I was able to replicate the problem using the sample data he provided in this comment. Close inspection reveals that madis2nc is writing the output level values as bad data. Next I inspected the output from the nightly build and found the same to be true there.

# on kiowa
ncdump -v obs_lvl NB20211116/MET-develop/test_output/madis2nc/metar_20120409_time_summary.nc
 obs_lvl = _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,

In general, Point/Ensemble-Stat have no way of processing observations with a bad level value.

However, non-time-summary output from madis2nc does work in Point/Ensemble-Stat because of special handling for "surface" message types. The non-time-summary madis2nc output for METAR inputs has message_type = ADPSFC. However, the time-summary output has message_type = ADPSFC_MIN_030000 (for example). Since that string is NOT included in the SURFACE entry of the message_type_group_map, Point/Ensemble-Stat cannot process those observations.

message_type_group_map = [
   { key = "SURFACE"; val = "ADPSFC,SFCSHP,MSONET,ADPSFC_MIN_030000,ADPSFC_MAX_030000"; },

This task is to modify the processing of each entry in the comma-separated "val" string. Interpret each entry as a regular expression instead of just doing string matching. Care must be taken to differentiate between commas inside regular expressions and those that separate the list items.
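
To illustrate the comma problem described above, here is a minimal sketch in standard C++. It is not MET code: split_patterns() and in_group() are hypothetical names, std::regex (ECMAScript grammar) stands in for whatever regular expression library MET uses, and the rule "a comma separates items only when it is outside (), [], {} and is not escaped" is one assumed way to tell the two kinds of commas apart.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated list of patterns. A comma is a
// separator only when it is outside (), [], {} and not preceded by a backslash.
// Simplification: bracket characters inside a [...] class are not special-cased.
std::vector<std::string> split_patterns(const std::string &val) {
    std::vector<std::string> items;
    std::string current;
    int depth = 0;         // nesting depth of (), [], {}
    bool escaped = false;  // true for the character right after a backslash
    for (char c : val) {
        if (escaped) {
            current += c;
            escaped = false;
            continue;
        }
        if (c == '\\') {
            current += c;
            escaped = true;
            continue;
        }
        if (c == '(' || c == '[' || c == '{') {
            ++depth;
        } else if ((c == ')' || c == ']' || c == '}') && depth > 0) {
            --depth;
        }
        if (c == ',' && depth == 0) {
            // Top-level comma: finish the current item, skipping empty ones.
            if (!current.empty()) items.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) items.push_back(current);
    return items;
}

// Hypothetical helper: true if message_type fully matches any pattern in val.
// std::regex throws std::regex_error if a pattern is malformed.
bool in_group(const std::string &message_type, const std::string &val) {
    for (const std::string &pattern : split_patterns(val)) {
        if (std::regex_match(message_type, std::regex(pattern))) return true;
    }
    return false;
}
```

With this sketch, the string "ADPSFC(_(MIN|MAX)_[0-9]{1,6})?,SFCSHP" splits into two patterns, because the comma inside {1,6} is nested, and in_group("ADPSFC_MIN_030000", ...) returns true for it. Plain names such as "ADPSFC" still behave like exact string matches, since std::regex_match requires the whole message type to match.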

Once that works, consider updating the message_type_group_map settings in the default config file to match any message_type that begins with the specified string.

Time Estimate

1 day?

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2702691

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required: @hsoh-u
  • Select scientist(s) or no scientist required: none required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Organization level Project for support of the current coordinated release
  • Select Repository level Project for development toward the next official release or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next bugfix version

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
JohnHalleyGotway added the labels type: bug, priority: high, requestor: Community, alert: NEED ACCOUNT KEY, required: FOR OFFICIAL RELEASE, and MET: PreProcessing Tools (Point) on Nov 17, 2021
JohnHalleyGotway added this to the MET 10.1.0 milestone on Nov 17, 2021
@hsoh-u
Collaborator

hsoh-u commented Nov 18, 2021

The problem is that the pressure or height for the obs data is missing from the input data. Please try it without the time summary. Most obs_lvl and obs_hgt values are missing.

@JohnHalleyGotway
Collaborator Author

@hsoh-u thanks for pointing that out. Yes, I see in the METAR output from the madis2nc unit tests, that obs_lvl = obs_hgt = NA. I tested using these outputs as input to Point-Stat, using both the raw output (metar_2012040912_F000.nc) and the time summary output (metar_20120409_time_summary.nc).

The first run does produce matched pairs because when verifying against ADPSFC message types, the code DOES NOT actually check the observation level value.

The second run does NOT produce matched pairs because the message type is NOT "ADPSFC"... it's "ADPSFC_MIN_030000" or "ADPSFC_MAX_030000". So the question is, how should we handle this situation?

Instead of checking for message_type = ADPSFC, should we check that it BEGINS with "ADPSFC" instead?
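
For comparison, the "begins with" check asked about above can be sketched in a few lines of standard C++. The function name begins_with() is hypothetical and this is not how MET performs the check; it only shows the difference from an exact comparison.

```cpp
#include <string>

// Hypothetical helper: true if message_type begins with prefix, e.g. "ADPSFC".
// compare() looks only at the first prefix.size() characters of message_type,
// so a message type shorter than the prefix never matches.
bool begins_with(const std::string &message_type, const std::string &prefix) {
    return message_type.compare(0, prefix.size(), prefix) == 0;
}
```

Under this check, both "ADPSFC" and "ADPSFC_MIN_030000" would be accepted for the prefix "ADPSFC", as would any other message type that happens to start with the same characters.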

@JohnHalleyGotway
Collaborator Author

@hsoh-u please take a look at this comment. Perhaps I should rewrite this issue to support regular expressions when processing the message_type_group_map entries. What do you think?

@hsoh-u
Collaborator

hsoh-u commented Nov 18, 2021

Yes, please rewrite the issue. Would the regular expression (changing ADPSFC to ADPSFC*) apply only to the time summary or to all cases?

JohnHalleyGotway added the label type: enhancement and removed the label type: bug on Nov 19, 2021
JohnHalleyGotway changed the title from "Madis2NC time summary output writes bad data for the output level value." to "Modify the interpretation of the message_type_group_map values to support the use of regular expressions." on Nov 19, 2021
@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Dec 14, 2021

@hsoh-u I was thinking of the "-pcprx" command line option for pcp_combine. The default regular expression to match everything is defined on this line. The "-pcprx" command line option can override that default. But on this line, we check whether or not each string matches that specified regular expression.

You could consider defining this function:

   StringArray::has_reg_exp(const std::string, bool forward=true) const;

It'd be very similar to the existing StringArray::has(...) function. However, rather than just checking whether the input string occurs in the StringArray elements, it would check whether the input string matches any element when that element is processed as a regular expression by calling the "check_reg_exp()" function.
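
A rough sketch of that idea, written as a free function in standard C++ rather than as a StringArray member: std::vector<std::string> stands in for StringArray, std::regex_match stands in for check_reg_exp() (whose actual signature is not shown in this thread), and the meaning given to the forward flag (search from the first element or from the last) is an assumption.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed StringArray::has_reg_exp().
// Each element of patterns is treated as a regular expression; the function
// reports whether text fully matches any of them.
// Assumption: forward == false means the elements are checked last to first.
bool has_reg_exp(const std::vector<std::string> &patterns,
                 const std::string &text, bool forward = true) {
    const std::size_t n = patterns.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::string &p = forward ? patterns[i] : patterns[n - 1 - i];
        // std::regex throws std::regex_error if p is not a valid pattern.
        if (std::regex_match(text, std::regex(p))) return true;
    }
    return false;
}
```

Note the direction of the test: the array elements are the patterns and the input string is the candidate, which is the reverse of a plain substring or equality search.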

However my concern is speed. We may find the check_reg_exp() function to be slow... I'm not sure. But hopefully this would provide a robust, general purpose solution.
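
If matching does turn out to be slow, one common mitigation is to compile each pattern once and reuse it for every observation. The sketch below shows that approach with std::regex; the PatternSet class is hypothetical, it is not based on check_reg_exp(), and whether it helps in MET would need to be measured.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: compiles each pattern once at construction, so that
// repeated lookups pay only for matching, not for regex construction.
class PatternSet {
public:
    explicit PatternSet(const std::vector<std::string> &patterns) {
        compiled_.reserve(patterns.size());
        for (const std::string &p : patterns) {
            // The optimize flag asks the library to favor matching speed
            // over construction speed.
            compiled_.emplace_back(p, std::regex::ECMAScript | std::regex::optimize);
        }
    }

    // True if text fully matches any of the precompiled patterns.
    bool matches(const std::string &text) const {
        for (const std::regex &re : compiled_) {
            if (std::regex_match(text, re)) return true;
        }
        return false;
    }

private:
    std::vector<std::regex> compiled_;
};
```

A caller would build one PatternSet per message type group when the configuration is read and then call matches() for each observation's message type.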

TaraJensen added the label reporting: DTC NCAR Base and removed the label alert: NEED ACCOUNT KEY on Dec 16, 2021
hsoh-u linked a pull request that will close this issue on Jan 6, 2022
14 tasks