Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ascii file list parsing logic. #1484

Closed
19 tasks done
JohnHalleyGotway opened this issue Sep 8, 2020 · 4 comments · Fixed by #1486, #1487 or #1501
Closed
19 tasks done

Fix ascii file list parsing logic. #1484

JohnHalleyGotway opened this issue Sep 8, 2020 · 4 comments · Fixed by #1486, #1487 or #1501
Assignees
Labels
requestor: NCAR National Center for Atmospheric Research type: bug Fix something that is not working

Comments

@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Sep 8, 2020

Describe the Problem

Tina Kalb ran into unexpected behavior when running pcp_combine in the following way:

pcp_combine -add bad.grib bad.nc -field 'name="Z"; level="P500";'

DEBUG 2: Since the "-field" command line option was used, parsing the command line arguments as a list of files.
DEBUG 2: Performing derivation command (sum) for 17032 files.
DEBUG 1: Reading data (name="Z"; level="P500";) from input file: GRIB
ERROR  : 
ERROR  : grd_file_type() -> file does not exist "GRIB"
ERROR  : 

bad.grib.txt

The input "bad.grib" file is processed by the parse_ascii_file_list() function and one of the first 10 words is "/" which is counted as a valid path. So it's entire contents is treated as a file list and pcp_combine attempts to process the first word ("GRIB") as the path to a file.

Expected Behavior

We should have better logic to distinguish between file lists and data files. An input GRIB file should not be processed as being a file list.

Environment

Describe your runtime environment:

  1. Repeatable on multiple environments. Originally found on eyewall but replicated on Mac.

To Reproduce

Describe the steps to reproduce the behavior:

  1. Save the attached "bad.grib.txt" file as "bad.grib".
  2. Use met-9.1/bin/pcp_combine to run:
pcp_combine -add bad.grib bad.nc -field 'name="Z"; level="P500";'
  1. Observe runtime error:
ERROR  : grd_file_type() -> file does not exist "GRIB"

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required: John Halley Gotway
  • Select scientist(s) or no scientist required: None required but Tina Kalb can review the PR.

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Review projects and select relevant Repository and Organization ones
  • Select milestone

Define Related Issue(s)

Consider the impact to the other METplus components.

Bugfix Checklist

See the METplus Workflow for details.

  • Complete the issue definition above.
  • Fork this repository or create a branch of master_<Version>.
    Branch name: bugfix_<Issue Number>_master_<Version>_<Description>
  • Fix the bug and test your changes.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into master_<Version> and link the pull request to this issue.
    Pull request: bugfix <Issue Number> master_<Version> <Description>
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Complete the steps above to fix the bug on the develop branch and link the pull request to this issue.
    Branch name: bugfix_<Issue Number>_develop_<Description>
    Pull request: bugfix <Issue Number> develop <Description>
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added the type: bug Fix something that is not working label Sep 8, 2020
@JohnHalleyGotway JohnHalleyGotway added this to the MET 9.1.1 (bugfix) milestone Sep 8, 2020
@JohnHalleyGotway JohnHalleyGotway self-assigned this Sep 8, 2020
@JohnHalleyGotway JohnHalleyGotway added component: application code requestor: NCAR National Center for Atmospheric Research labels Sep 9, 2020
JohnHalleyGotway added a commit that referenced this issue Sep 9, 2020
…le(). The former returns true for '/' while the latter does not. This technically fixes the current issue, but not in a particularly robust way. Recommend that a more complete solution be used for develop.
JohnHalleyGotway added a commit that referenced this issue Sep 9, 2020
…op branch. This is to sync them up. However, I'd like to find a more robust solution for distinguishing file lists.
JohnHalleyGotway added a commit that referenced this issue Sep 9, 2020
… list parsing functionality into vx_data2d_factory/parse_file_list.cc. Had to update includes for tc_gen and gsidens2orank to handle this change. Next, I need to tweak the parse_file_list logic slightly.
@JohnHalleyGotway
Copy link
Collaborator Author

Fixed this for master_v9.1 with a very small one-line change.
Fixed this for develop using a larger but better set of changes to make sure that we don't try to interpret a GRIB1/2, NetCDF, or HDF file as being an ASCII file list.

Added a DEBUG(5) log message to confirm the parsing logic being applied.
DEBUG 5: parse_ascii_file_list() -> File "bad.grib" of type FileType_Gb1 is not an ASCII file list.

JohnHalleyGotway added a commit that referenced this issue Sep 9, 2020
…ibraries need to be added to the gsi_tools linker lists.
JohnHalleyGotway added a commit that referenced this issue Sep 9, 2020
This was linked to pull requests Sep 9, 2020
JohnHalleyGotway added a commit that referenced this issue Sep 9, 2020
…le(). The former returns true for '/' while the latter does not. This technically fixes the current issue, but not in a particularly robust way. Recommend that a more complete solution be used for develop. (#1487)
JohnHalleyGotway added a commit that referenced this issue Sep 9, 2020
* Per #1484, porting the same bugfix over from master_v9.1 to the develop branch. This is to sync them up. However, I'd like to find a more robust solution for distinguishing file lists.

* Per #1484, split vx_util/get_filenames.cc into 2 pieces and move file list parsing functionality into vx_data2d_factory/parse_file_list.cc. Had to update includes for tc_gen and gsidens2orank to handle this change. Next, I need to tweak the parse_file_list logic slightly.

* Per #1484, after moving the parse_file_list() functions, the NetCDF libraries need to be added to the gsi_tools linker lists.

* Per #1484, update parse_ascii_file_list() to avoid parsing gridded data files as being ASCII file lists.
@JohnHalleyGotway
Copy link
Collaborator Author

The nightly build revealed that the fixes for this issue were not sufficient.

We had a few seemingly "random" NB failures recently. But looking at the results over the last week, I actually think there's a deeper problem caused by the changes I made to main_v9.1 and develop for:
#1484

On 9/17 and 9/15 in main_v3.1 and on 9/12 in develop, the tc_gen unit test failed because it interprets the input ATCF files as being file lists consisting of 4,327,693 input files to be processed!

I don't know why some days it runs fine while other it does not? But suffice it to say that the file list processing logic is not yet sufficient! Need to make sure that ASCII ATCF files are never accidentally processed as being files lists.

@JohnHalleyGotway
Copy link
Collaborator Author

This issue also presented itself in the nightly build on 9/26, when the main_v9.1 unit tests failed on Series-Analysis with:

/d3/projects/MET/MET_regression/main_v9.1/NB20200926/MET-main_v9.1/met/share/met/../../bin/series_analysis \
      -fcst   /d3/projects/MET/MET_test_data/unit_test/model_data/grib1/arw-fer-gep1/arw-fer-gep1_2012040900_F012.grib \
      -obs    /d3/projects/MET/MET_test_data/unit_test/model_data/grib1/arw-tom-gep0/arw-tom-gep0_2012040900_F012.grib \
      -out    /d3/projects/MET/MET_regression/main_v9.1/NB20200926/MET-main_v9.1/test_output/series_analysis/series_analysis_UPPER_AIR_TMP_120000L_20120409_120000V.nc \
      -config /d3/projects/MET/MET_regression/main_v9.1/NB20200926/MET-main_v9.1/test/config/SeriesAnalysisConfig \
      -v 1
DEBUG 1: Default Config File: /d3/projects/MET/MET_regression/main_v9.1/NB20200926/MET-main_v9.1/met/share/met/config/SeriesAnalysisConfig_default
DEBUG 1: User Config File: /d3/projects/MET/MET_regression/main_v9.1/NB20200926/MET-main_v9.1/test/config/SeriesAnalysisConfig
ERROR  : 
ERROR  : check_grib() -> unable to read header from input file "/"
ERROR  : 

I suspect that Series-Analysis tried to process the input GRIB file as if it were a file list and tried to process "/" as an input file.

@JohnHalleyGotway
Copy link
Collaborator Author

PR #1487 merged a limited fix for this issue into main_v9.1 on 9/9/20.
PR #1486 merged a better fix for this issue into develop branch on 9/9/20.
Since 9/9 the NB on main_v9.1 has failed on 9/17, 19, 23, 24, and 26 with errors related to this.
The NB did fail on develop on 9/13 but I don't know the reason. It's possible that error was unrelated to this issue.
For these reasons, I will port the changes for PR #1486 into the main_v9.1 branch. Might as well make the code consistent, especially since it's very likely that'll resolve these sporadic failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
requestor: NCAR National Center for Atmospheric Research type: bug Fix something that is not working
Projects
None yet
1 participant