Refine tc_gen logic to compare forecast genesis events to all BEST track points. #1448

Closed
19 tasks
JohnHalleyGotway opened this issue Aug 7, 2020 · 18 comments · Fixed by #1633
Assignees
Labels
priority: blocker Blocker requestor: DTC/PAS DTC Physics Across Scales T&E type: enhancement Improve something that it is currently doing
Milestone

Comments

@JohnHalleyGotway
Collaborator

JohnHalleyGotway commented Aug 7, 2020

Describe the Enhancement

While testing tc_gen in the met-9.1_beta3 development release, Dan Halperin identified a shortcoming in its logic. Currently tc_gen identifies genesis events in the forecast, BEST, and operational tracks. Each event is stored as a single location and time. The current matching logic compares only those points.

This task is to collaborate with Dan to refine this approach. Replace the current point-to-point comparison with a point-to-track comparison. Compare the forecast genesis event points to the entire set of BEST tracks. This will enable us to...

(1) Check for genesis forecasts from initializations AFTER the actual genesis event in the BEST track. These were previously counted as FALSE ALARMS but should instead be ignored.
(2) Implement NHC's operational logic for verifying Tropical Weather Outlooks.
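
Item (1) can be illustrated with a minimal sketch. The helper name `should_discard` is hypothetical and is not part of tc_gen; the BEST genesis time is the AL142016 value from the log output later in this thread, and the second initialization time is made up for illustration.

```python
from datetime import datetime


def should_discard(fcst_init: datetime, best_genesis: datetime) -> bool:
    """Return True when the forecast was initialized AFTER the BEST genesis.

    Such a forecast cannot predict a genesis that already happened, so it
    is ignored rather than counted as a false alarm.
    """
    return fcst_init > best_genesis


# BEST track AL142016 genesis time, taken from the log output in this thread.
best_genesis = datetime(2016, 9, 28, 12)

# Initialized before the genesis event: keep it for verification.
print(should_discard(datetime(2016, 9, 27, 0), best_genesis))

# Hypothetical initialization after the genesis event: ignore it.
print(should_discard(datetime(2016, 9, 29, 0), best_genesis))
```

Whether a forecast initialized exactly at the genesis time should be kept is a design choice; this sketch keeps it.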

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

7790901 - TC-Gen key

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Review projects and select relevant Repository and Organization ones
  • Select milestone

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added type: enhancement Improve something that it is currently doing component: application code labels Aug 7, 2020
@JohnHalleyGotway JohnHalleyGotway added this to the MET 10.0 milestone Aug 7, 2020
@JohnHalleyGotway JohnHalleyGotway added the alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle label Sep 10, 2020
@TaraJensen TaraJensen added priority: blocker Blocker and removed alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle priority: high labels Sep 22, 2020
@JohnHalleyGotway
Collaborator Author

On 20201204, the team had a telecon:
Kathryn, Dan A, Dan H, John HG, and Tara

In met-9.1, TC-Gen defines genesis event points from the ADECK and BDECK data sources. Then it compares those points to populate the contingency table. This task changes that logic. Instead of comparing point to point, we'll now compare forecast genesis event points to all BEST track points.

A PowerPoint from Dan A illustrating the requested refinements is attached. This task is to enhance TC-Gen to support 2 different methodologies for populating contingency tables. Add a configuration file option to control what logic is applied... one or both of them.

tc_gen_algorithm_details.pptx

Need to decide how to distinguish between them in the output.
FCST_VAR = OBS_VAR = GENESIS
Recommend choosing a different name for the second set of logic.

Kathryn: Make the "discarding" of genesis events based on initialization time be a configurable option with the default being "discard".

Also need to correct the use of the operational CARQ model in the tc-gen logic. The logic should be this...

  • for a forecast genesis event, look for a BEST track match.
  • if none is found, then search the CARQ for a match.
  • if a CARQ match is found, store the CARQ storm id.
  • look up that storm id in the BEST tracks, and compare the forecast genesis to BEST track genesis to determine how to categorize that pair (hit, miss, or false alarm).
  • so the pair is ALWAYS between the forecast and BEST track, never with CARQ.

In this issue, we're refining how to "pair" the track data first, and then "classify" those pairs second.
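
The pairing steps listed above could be sketched roughly as follows. This is only an illustration, not the tc_gen implementation: `Point`, `first_match`, and `pair_forecast_genesis` are hypothetical names, the rule for deciding whether two points match is passed in as a callable because it is still under discussion here, and the BEST point in the usage example uses made-up values.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Point(NamedTuple):
    storm_id: str   # empty for forecast genesis, which has no meaningful id
    valid: str      # YYYYMMDD_HHMMSS
    lat: float
    lon: float


Matcher = Callable[[Point, Point], bool]


def first_match(gen: Point, points: Iterable[Point], matches: Matcher) -> Optional[str]:
    """Return the storm id of the first track point matching the genesis."""
    for p in points:
        if matches(gen, p):
            return p.storm_id
    return None


def pair_forecast_genesis(gen: Point, best_points: Iterable[Point],
                          carq_points: Iterable[Point],
                          matches: Matcher) -> Optional[str]:
    """Return the BEST storm id to pair with, or None if there is no pair."""
    best_points = list(best_points)
    # Step 1: look for a BEST track match.
    storm_id = first_match(gen, best_points, matches)
    # Step 2: if none is found, search the 0-hour CARQ points.
    if storm_id is None:
        storm_id = first_match(gen, carq_points, matches)
    if storm_id is None:
        return None
    # Step 3: the pair is always with the BEST track, never with CARQ,
    # so the storm id must exist in the BEST data.
    if any(p.storm_id == storm_id for p in best_points):
        return storm_id
    return None


# Stand-in matching rule for the example: same valid time only.
same_time = lambda g, p: g.valid == p.valid

gen = Point("", "20160815_000000", 8.5, -23.5)
best = [Point("AL062016", "20160816_180000", 12.0, -30.0)]  # made-up values
carq = [Point("AL062016", "20160815_000000", 9.7, -20.4)]
print(pair_forecast_genesis(gen, best, carq, same_time))
```

Classifying the resulting pair as a hit, miss, or false alarm is a separate second step and is left out of this sketch.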

@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Dec 17, 2020

Email chain from 12/17/20...

John HG's Question:

The new logic is...

  • identify forecast genesis points
  • read BEST tracks and 0-hour CARQ track points
  • filter the forecast genesis points based on config file entries
    ... WHAT NEXT? ...
    Should I also filter the BEST and CARQ data prior to determining matching?
    Or should I define matches using the unfiltered data?
    But if so, it seems like I should filter down the BEST tracks before calculating the number of misses right?

Dan H's answer:

My first thought is to match using the unfiltered BEST and CARQ data, then filter out the events based on the config file. If we filter the BEST and CARQ data before matching, then we may end up with unwarranted false alarms.

For example, let's say there's a GFS forecast genesis event that matches to BEST storm AL032020. If the config file options result in filtering out AL032020 and the BEST and CARQ filtering occurs first, then the GFS forecast would end up incorrectly being classified as a false alarm because no BEST or CARQ match was found.

If the BEST and CARQ filtering occurs after the matching, then the forecast would be discarded.

I think we should filter the BEST data before calculating the misses.
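
A toy sketch of why the order matters, using the AL032020 example above. The function names and the stand-in matcher are hypothetical, and AL042020 is included only as filler data; real matching compares locations and times rather than storm ids.

```python
from typing import Callable, Optional, Set


def match(target: str, best_ids: Set[str]) -> Optional[str]:
    """Stand-in matcher: the forecast matches its target storm if present."""
    return target if target in best_ids else None


def filter_then_match(target: str, best_ids: Set[str],
                      keep: Callable[[str], bool]) -> str:
    """Filter the BEST data first, then match (the problematic order)."""
    filtered = {s for s in best_ids if keep(s)}
    return "PAIRED" if match(target, filtered) else "FALSE_ALARM"


def match_then_filter(target: str, best_ids: Set[str],
                      keep: Callable[[str], bool]) -> str:
    """Match against unfiltered BEST data, then apply the filter."""
    storm = match(target, best_ids)
    if storm is None:
        return "FALSE_ALARM"
    return "PAIRED" if keep(storm) else "DISCARD"


best_ids = {"AL032020", "AL042020"}
keep = lambda storm_id: storm_id != "AL032020"  # config filters out AL032020

print(filter_then_match("AL032020", best_ids, keep))  # unwarranted false alarm
print(match_then_filter("AL032020", best_ids, keep))  # forecast is discarded
```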

@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Dec 18, 2020

We no longer need operational genesis definition criteria in the TC-Gen config file. So DELETE...

oper_genesis = {
   technique   = "CARQ";
   category    = [ "TD", "TS" ];
   vmax_thresh = NA;
   mslp_thresh = NA;
}

But we do still need to know the name of the operational technique. So ADD...

oper_technique = "CARQ";

@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Dec 18, 2020

Let's say we've found a matching BEST track. But when we search that BEST track, no points actually meet the genesis criteria because the user has set up the TC-Gen config file in a stringent way. Is that just a false alarm... a genesis forecast for which there is no qualifying BEST track genesis event? That would be the simplest way of handling it in the current logic.

@halperin-erau

halperin-erau commented Dec 18, 2020 via email

@halperin-erau

halperin-erau commented Dec 28, 2020 via email

JohnHalleyGotway added a commit that referenced this issue Dec 28, 2020
JohnHalleyGotway added a commit that referenced this issue Dec 29, 2020
@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Dec 29, 2020

Well, I have all the code changes I think are needed for this updated logic, but I'm getting pretty different results! Running the old and new logic on 2016 AL storms using default config file settings, here are the results:
In met-9.1...

DEBUG 3: For SUI1 model, contingency table hits = 10, false alarms = 493, and misses = 286.
# Here are the 10 HITS:
DEBUG 4: SUI1 20160525_120000 initialization, 51 lead, 20160527_150000 genesis at (28.4, -73.6) is a HIT for BEST 20160527_180000 genesis at (28.3, -74.4).
DEBUG 4: SUI1 20160813_120000 initialization, 36 lead, 20160815_000000 genesis at (8.5, -23.5) is a HIT for CARQ 20160815_120000 genesis at (9.2, -23.7).
DEBUG 4: SUI1 20160818_000000 initialization, 66 lead, 20160820_180000 genesis at (11.6, -16.3) is a HIT for CARQ 20160820_180000 genesis at (12.4, -15.8).
DEBUG 4: SUI1 20160912_120000 initialization, 33 lead, 20160913_210000 genesis at (12.2, -21.1) is a HIT for CARQ 20160913_060000 genesis at (13, -20).
DEBUG 4: SUI1 20160912_120000 initialization, 117 lead, 20160917_090000 genesis at (14.4, -20) is a HIT for CARQ 20160918_000000 genesis at (12.5, -20.9).
DEBUG 4: SUI1 20160917_000000 initialization, 33 lead, 20160918_090000 genesis at (11.9, -23.6) is a HIT for CARQ 20160918_120000 genesis at (12.7, -22.9).
DEBUG 4: SUI1 20160922_120000 initialization, 93 lead, 20160926_090000 genesis at (10.6, -39.6) is a HIT for CARQ 20160925_180000 genesis at (8.1, -39.4).
DEBUG 4: SUI1 20160927_000000 initialization, 30 lead, 20160928_060000 genesis at (13.4, -57.3) is a HIT for BEST 20160928_120000 genesis at (13.4, -59.8).
DEBUG 4: SUI1 20161002_000000 initialization, 36 lead, 20161003_120000 genesis at (22.9, -59.9) is a HIT for BEST 20161004_060000 genesis at (23.2, -59.8).
DEBUG 4: SUI1 20161111_120000 initialization, 93 lead, 20161115_090000 genesis at (10.9, -77.9) is a HIT for CARQ 20161115_120000 genesis at (11.6, -77.6).

Updated version...

DEBUG 3: For filter 1 (AL_BASIN) SUI1 model, technique 1 contingency table hits = 2, false alarms = 501, and misses = 269.
# Here are the 2 HITS:
DEBUG 4: SUI1 20160927_000000 initialization, 36 lead, 20160928_120000 BEST track AL142016 genesis at (13.40000, -59.80000) and 20160928_060000 forecast genesis at (13.40000, -57.30000) is a technique 1 HIT with a genesis time offset of 6 hours and location offset of 270.72126 km.
DEBUG 4: SUI1 20161002_000000 initialization, 54 lead, 20161004_060000 BEST track AL152016 genesis at (23.20000, -59.80000) and 20161003_120000 forecast genesis at (22.90000, -59.90000) is a technique 1 HIT with a genesis time offset of 18 hours and location offset of 34.93146 km.

So 10 hits went down to 2. Of the 10 HITS, 4 had genesis times of 00, 06, 12, or 18 while 6 had genesis times of 03, 09, 15, or 21. Those non-6 hourly times have 0 chance of matching since the BEST and CARQ track points are only 6-hourly!

So what should we do? Should we only analyze 6-hourly forecast track points instead of including the 3-hourly ones? So any forecast genesis at 03, 09, 15, or 21 would most likely be shifted 3 hours forward.
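
The 4-versus-6 split can be checked against the ten forecast genesis times in the met-9.1 log above. This is a small sketch; `is_on_cycle` is a hypothetical helper, and the 6-hour default simply reflects the BEST and CARQ spacing described here.

```python
from datetime import datetime

# Forecast genesis times of the 10 met-9.1 HITS listed above.
GENESIS_TIMES = [
    "20160527_150000", "20160815_000000", "20160820_180000",
    "20160913_210000", "20160917_090000", "20160918_090000",
    "20160926_090000", "20160928_060000", "20161003_120000",
    "20161115_090000",
]


def is_on_cycle(valid: str, freq_hours: int = 6) -> bool:
    """True when the valid time falls exactly on a freq_hours boundary."""
    t = datetime.strptime(valid, "%Y%m%d_%H%M%S")
    return t.hour % freq_hours == 0 and t.minute == 0 and t.second == 0


on_cycle = [t for t in GENESIS_TIMES if is_on_cycle(t)]
off_cycle = [t for t in GENESIS_TIMES if not is_on_cycle(t)]

print(len(on_cycle), "at 00/06/12/18;", len(off_cycle), "at 03/09/15/21")
```

Applying the same test to every forecast track point before defining genesis would restrict the analysis to 6-hourly points.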

@halperin-erau

halperin-erau commented Dec 29, 2020 via email

@JohnHalleyGotway
Collaborator Author

Definitely easier to only include track points for valid hours 00, 06, 12, and 18 rather than interpolating. So I'll go that route for now.
OTTO in 2016 is a pesky little storm. It started in the Atlantic as AL162016 but eventually moved to the East Pac as EP222016. Since the genesis occurred in the Atlantic, I'd like to make tc_gen smart enough to figure that out and discard the EP222016 track and corresponding genesis event entirely. When a storm moves from one basin to another, it seems like we really only care about the track/genesis event for the original basin.
Will work on that today.

JohnHalleyGotway added a commit that referenced this issue Dec 30, 2020
… the memory myself. This makes the implementation of TrackInfoArray::erase_storm_id() very easy. Replace n_tracks() function with n() in several places.
JohnHalleyGotway added a commit that referenced this issue Dec 30, 2020
…load_dland.h/.cc to load_tc_data.h/.cc and add code to read the basin file.
@JohnHalleyGotway
Collaborator Author

Making progress, but still trying to reconcile why 10 previous hits are reduced to 3.
For example, in the previous version, we had:

DEBUG 4: SUI1 20160813_120000 initialization, 36 lead, 20160815_000000 genesis at (8.5, -23.5) is a HIT for CARQ 20160815_120000 genesis at (9.2, -23.7).

But the new logic looks for a BEST track match for this time (20160815_000000), but none is present. So it looks for a 0-hour CARQ track point and does find this one:

AL, 06, 2016081500, 01, CARQ,   0,  97N,  204W,  20, 1009, DB,  34, NEQ,    0,    0,    0,    0, 1010,  120,  60,   0,   0,   L,   0,   X, 270,  11,     INVEST, S,

But those locations are 366 km apart, which is >300, so they do NOT MATCH.
Increasing the radius to 500km, and then they do match.
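
The distance check can be reproduced with a haversine sketch. The Earth radius used here is an assumption (the mean radius), so the result may differ slightly from what tc_gen reports; the CARQ position 97N, 204W is read as 9.7 N, 20.4 W because ATCF records store tenths of degrees. The function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, may differ from MET


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two lat/lon points in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(dist_km: float, radius_km: float) -> bool:
    """True when the separation does not exceed the search radius."""
    return dist_km <= radius_km


fcst = (8.5, -23.5)   # SUI1 forecast genesis at 20160815_000000
carq = (9.7, -20.4)   # 0-hour CARQ point at 2016081500

dist = great_circle_km(*fcst, *carq)
print(f"{dist:.1f} km")
print("match at 300 km:", within_radius(dist, 300.0))
print("match at 500 km:", within_radius(dist, 500.0))
```

Under these assumptions the separation comes out close to the 366 km quoted above, so the pair fails at a 300 km radius and passes at 500 km.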

@JohnHalleyGotway
Copy link
Collaborator Author

John, consider the following changes:

  • Since you reimplemented TrackInfoArray to use a vector, consider doing the same for TrackInfo, storing the TrackPoints in a vector.
  • TC-Gen supports vx_mask, but consider supporting a basin_mask = threshold as well. For example, basin_mask = ==1|==2 (to match AL or EP). We are now reading in the basin data anyway, might as well use it to make life easier for users.
  • When reading the basin data, the mapping of ints to basin abbrev is hard-coded. Would probably be better to update the basin data file and use that to define the mapping instead.

@halperin-erau

halperin-erau commented Dec 30, 2020 via email

@halperin-erau

halperin-erau commented Dec 30, 2020 via email

JohnHalleyGotway added a commit that referenced this issue Dec 30, 2020
…sages and add lots of details to the tc_gen documentation.
@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Jan 6, 2021

See comments from 1/6/21 meeting here:
#1430 (comment)

@JohnHalleyGotway JohnHalleyGotway linked a pull request Jan 22, 2021 that will close this issue
10 tasks
JohnHalleyGotway added a commit that referenced this issue Jan 23, 2021
* Per #1448, many changes for TC-Gen. Replace the oper_genesis dictionary with the oper_technique string. Add genesis_init_diff config entry. Update config_constants.h accordingly and the tc_gen_conf_info.h/.cc to parse the updated config entries.

* Per #1448, large overhaul of the tc_gen matching logic. This work is not yet complete. Still need to compute categorical MISSES but the current version does compile.

* Per #1448, add GenesisInfoArray::has_storm_id() function and remove the unused set_dland() function.

* Per #1448, more updates. Define the best genesis events while parsing the best tracks. We need to know the best genesis events in order to count up the forecast misses.

* Per #1448, lots more changes for tc_gen. Create a PairDataGenesis class to store genesis pairs. This will be needed to write a matched pair line type.

* Per #1448, minor tweaks to log messages.

* Per #1448, update PairDataGenesis class to store the BEST track Storm ID since the forecast genesis do not have meaningful Storm ID's.

* Per #1448, in GenesisInfoArray::add(), do NOT store multiple genesis events for the same storm, but do print a useful Debug(3) log message about it.

* Per #1448, update PairDataGenesis::has_case() logic to check the storm id and initialization time but NOT require an exact forecast hour match.

* Per #1448, update the tc_gen log messages to more concisely and consistently report the storm id.

* Per #1448, update the PairDataGenesis logic a bit to have all the misses and hits in chronological order.

* Per #1448, add genesis_init_diff entry.

* Per #1448, set the default genesis_init_diff entry to 48 hours since that's what Dan H used in his examples.

* Per #1448, work on comments and log messages.

* Per #1448, reimplement TrackInfoArray as a vector instead of managing the memory myself. This makes the implementation of TrackInfoArray::erase_storm_id() very easy. Replace n_tracks() function with n() in several places.

* Per #1448, add valid_freq and basin_file config entries. Also rename load_dland.h/.cc to load_tc_data.h/.cc and add code to read the basin file.

* Per #1448, add GenesisInfoArray::erase_storm_id().

* Per #1448, update tc_gen code to handle new config options.

* Per #1448, had my units wrong. Was processing seconds when I thought it was hours!

* Per #1448, making test TC-Gen config file consistent with the default.

* Per #1448, also track the obs valid times.

* Per #1448, switch from tech1/tech2 to dev/ops methods. Update log messages and add lots of details to the tc_gen documentation.

* Per #1430, in tc_gen enable dev_method_flag, ops_method_flag, ci_alpha, and output_flag to be specified separately for each filter. Also add nc_pairs_flag and genesis_track_points_window config options. Add config constants entries for these options and update tc_gen to handle all of these changes.

* Per #1430, consolidate the parse_grid_mask() code a bit to avoid redundancy.

* Per #1430, just cleaning up some messy comments.

* Per #1430, adding hooks for writing NetCDF output file.

* Per #1430, update DataPlane::set_size() function to take a 3rd argument to specify how the DataPlane should be initialized.

* Per #1430, update the nc_pairs_flag options and update the code to parse them.

* Per #1430, update the TrackInfo class to track and report the min/max warm core information.

* Per #1430, current state of development. Still a work in progress. I'm getting runtime segfaults when testing and I still need to NOT overcount the BEST track hits.

* Per #1430, committing changes described by #1430 (comment)

* Per #1430, forgot to rename genesis_match_window to genesis_hit_window as it is in the code.

* Per #1430, changing GenesisInfo to just inherit directly from TrackInfo. Frankly, I should have thought of this a LONG time ago.

* Per #1430, change the default desc setting from NA to ALL and add the best_unique_flag option.

* Per #1430, simplify the logic now that GenesisInfo is derived from TrackInfo. Also support the best_unique_flag config option.

* Per #1430, instead of storing 12 individual DataPlane objects, store them in a map to make writing their output more convenient.

* Per #1430, updating documentation and comments.

* Per #1430, more doc updates.

* Per #1430, update unit test to only write NetCDF counts for the AL_BASIN and not the other filters.

* Per #1430, fix parsing logic for nc_pairs_flag = TRUE.

* Per #1430, fix bug. Check the VxOpt.NcInfo before calling write_nc(), not the top-level one.

* Per #1430, the docker build of tc_gen failed.

* Per #1430, working on DockerHub compilation.

* Per #1430, getting DockerHub build working.

* One more try.

* Per #1597, add hooks for new GENMPR stat line type.

* Per #1597, add config file option and column definitions for the GENMPR line type.

* Per #1597, finish writing the GENMPR line type.

* Per #1597, change the default output grid from a global 5 degree to global 1 degree grid.

* Per #1597, change GENMPR output columns to GEN_TDIFF and INIT_TDIFF since they're reported in HHMMSS format instead of seconds. Also, tweak the config file for the tc-gen unit test.

* Per #1597, have to add GENMPR header columns for Stat-Analysis and test scripts to handle it.

* Per #1597, update Stat-Analysis to handle the GENMPR line type.

* Per #1597, user's guide updates for the GENMPR and NetCDF output file.

* Per #1597, add AGEN_INIT and AGEN_FHR columns.

* Per #1597, add AGEN_INIT and AGEN_FHR columns.

* Per #1597, remove the AGEN_TIME and BGEN_TIME columns from the GENMPR line type and instead write the genesis times to the FCST_VALID_BEG/END and OBS_VALID_BEG/END header columns.

* Remove some unused output column name definitions. These are remnants from very early versions of MET which included the CTP, CFP, and COP line types.

* Per #1597, update config file options to use dev_hit_radius, dev_hit_window, and opt_hit_tdiff. Also update log message to switch from 'lead' to 'forecast hour'.

* Per #1626, add met_regrid_nearest() utility function since I'm calling it twice.

* Per #1626, update the basin_global_tenth_degree.nc basin definition file to include basin name abbreviations.

* Per #1626, update load_tc_data.h/.cc to also read the basin abbreviations from the NetCDF basin file.

* Per #1626, add TC-Gen config file options for init_inc, init_exc, and basin_mask. Updated the library and application code, and updated the user's guide.
JohnHalleyGotway added a commit that referenced this issue Jan 24, 2021
* Getting rid of compiler warnings in PB2NC by replacing several instances of the NULL pointer with the nul character (\0) instead.

* Fix typo in config_options.rst.

* Feature 1408 var_name_for_grib_code (#1617)

* #1408 Added get_var_id

* #1408 Check variable name in the configuration to use the variable name instead of grib code

* #1408 Added point2grid_ascii2nc_surfrad_DW_PSP_by_name

* Feature 1580 2d time (#1616)

* #1580 Added get_grid_from_lat_lon_vars

* #1580 Added get_grid_from_lat_lon_vars and support 2D time variable

* #1580 Support int type variable without scale_factor and add_offset attributes

* #1580 Support 2D time variable. Implemented filtering by valid_time

* #1580 Bug fix: read time with dimension 0

* #1580 Support time variable with no dimension

* #1580 Initial release

* #1580 Added point2grid_2D_time

* #1580 Check project attribute for GOES

* #1580 Changed NULL to 0 to avoid compilation warning

* #1580 Added point2grid_2D_time

* #1580 Added "point2grid configuration file" section

* #1580 Changed to_grid for point2grid_NCCF_UK & point2grid_2D_time

Co-authored-by: Howard Soh
Co-authored-by: John Halley Gotway

* feature 1580 nccf (#1619)

* #1580 Correct the precision at _apply_scale_factor

* #1580 Added unit test plot_data_plane_NCCF_time

* #1580 Changed argument type to double at _apply_scale_factor(double)

* Bugfix 1618 develop pb2nc (#1623)

Co-authored-by: Howard Soh

* Feature 1624 OBS_COMMAND (#1625)

* Per #1627, add grid_data.regrid config option for PlotPointObs and update the tool to do the requested regridding. Still need to update the docs.

* Per #1627, update docs about grid_data.regrid config option for PlotPointObs.

* Per #1627, add another call to plot_point_obs to exercise the new regrid functionality.

* Feature 1624 obs_command second try (#1629)

* Per #1624, define OBS_COMMAND.

* Per #1624, unset the test-specific environment variables after completing the run.

* Per #1624, after PR #1625 merged these changes into develop, they caused 2 unexpected diffs in the NB output. These were caused by environment variables being unset after each test. Updating unit_netcdf.xml and unit_point2grid.xml to define more test-specific environment variables to reproduce previous NB output.

* Organizing NB climatology and point2grid output files into the appropriate directories rather than having them at the top-level directory.

* Update pull_request_template.md

* Update the point2grid unit tests to write their temp files to the point2grid subdirectory instead of the top-level test output directory.

* Update appendixC.rst

Split the definition of H_RATE and POD

* Feature 1626 tc_gen (#1633)

Co-authored-by: hsoh-u
Co-authored-by: Howard Soh
Co-authored-by: John Halley Gotway
Co-authored-by: j-opatz
JohnHalleyGotway added a commit that referenced this issue Jan 26, 2021
* Getting rid of compiler warnings in PB2NC by replacing several instances of the NULL pointer with the nul character (\0) instead.

* Fix typo in config_options.rst.

* Feature 1408 var_name_for_grib_code (#1617)

* #1408 Added get_var_id

* #1408 Check variable name in the configuration to use the variable name instewad of grib code

* #1408 Added point2grid_ascii2nc_surfrad_DW_PSP_by_name

* Feature 1580 2d time (#1616)

* #1580 Added get_grid_from_lat_lon_vars

* #1580 Added get_grid_from_lat_lon_vars and support 2D time variable

* #1580 Support int type variable without scale_factor and add_offset attributes

* #1580 Support 2D time variable. Implemented filtering by valid_time

* #1580 Bug fix: read time with dimension 0

* #1580 Support time variable with no dimension

* #1580 Initial release

* #1580 Added point2grid_2D_time

* #1580 Check project attribute for GOES

* #1580 Changed NULL to 0 to avoid co,pilation warning

* #1580 Added point2grid_2D_time

* #1580 Added "point2grid configuration file" section

* #1580 Changed to_grid for point2grid_NCCF_UK & point2grid_2D_time

Co-authored-by: Howard Soh <[email protected]>
Co-authored-by: John Halley Gotway <[email protected]>

* feature 1580 nccf (#1619)

* #1580 Correct the precision at _apply_scale_factor

* #1580 Added unit test plot_data_plane_NCCF_time

* #1580 Changed argument type to double at _apply_scale_factor(double)

* Bugfix 1618 develop pb2nc (#1623)

Co-authored-by: Howard Soh <[email protected]>

* Feature 1624 OBS_COMMAND (#1625)

* Per #1627, add grid_data.regrid config option for PlotPointObs and update the tool to do the requested regridding. Still need to update the docs.

* Per #1627, update docs about grid_data.regrid config option for PlotPointObs.

* Per #1627, add another call to plot_point_obs to exercise the new regrid functionality.

* Feature 1624 obs_command second try (#1629)

* Per #1624, define OBS_COMMAND.

* Per #1624, unset the test-specific environment variables after completing the run.

* Per #1624, after PR #1625 merged these changes into develop, they caused 2 unexpected diffs in the NB output. These were caused by environment variables being unset after each test. Updating unit_netcdf.xml and unit_point2grid.xml to define more test-specific environment variables to reproduce previous NB output.

* Organizing NB climatology and point2grid output files into the appropriate directories rather than having them at the top-level directory.

* Update pull_request_template.md

* Update the point2grid unit tests to write their temp files to the point2grid subdirectory instead of the top-level test output directory.

* Update appendixC.rst

Split the definition of H_RATE and POD

* Feature 1626 tc_gen (#1633)

* Per #1448, many changes for TC-Gen. Replace the oper_genesis dictionary with the oper_technique string. Add genesis_init_diff config entry. Update config_constants.h accordingly and the tc_gen_conf_info.h/.cc to parse the updated config entries.

* Per #1448, large overhaul of the tc_gen matching logic. This work is not yet complete. Still need to compute categorical MISSES but the current version does compile.

* Per #1448, add GenesisInfoArray::has_storm_id() function and remove the unused set_dland() function.

* Per #1448, more updates. Define the best genesis events while parsing the best tracks. We need to know the best genesis events in order to count up the forecast misses.

* Per #1448, lots more changes for tc_gen. Create a PairDataGenesis class to store genesis pairs. This will be needed to write a matched pair line type.

* Per #1448, minor tweaks to log messages.

* Per #1448, update PairDataGenesis class to store the BEST track Storm ID since the forecast genesis events do not have meaningful Storm IDs.

* Per #1448, in GenesisInfoArray::add(), do NOT store multiple genesis events for the same storm, but do print a useful Debug(3) log message about it.

* Per #1448, update PairDataGenesis::has_case() logic to check the storm id and initialization time but NOT require an exact forecast hour match.

* Per #1448, update the tc_gen log messages to more concisely and consistently report the storm id.

* Per #1448, update the PairDataGenesis logic a bit to have all the misses and hits in chronological order.

* Per #1448, add genesis_init_diff entry.

* Per #1448, set the default genesis_init_diff entry to 48 hours since that's what Dan H used in his examples.

* Per #1448, work on comments and log messages.

* Per #1448, reimplement TrackInfoArray as a vector instead of managing the memory myself. This makes the implementation of TrackInfoArray::erase_storm_id() very easy. Replace n_tracks() function with n() in several places.

* Per #1448, add valid_freq and basin_file config entries. Also rename load_dland.h/.cc to load_tc_data.h/.cc and add code to read the basin file.

* Per #1448, add GenesisInfoArray::erase_storm_id().

* Per #1448, update tc_gen code to handle new config options.

* Per #1448, had my units wrong. Was processing seconds when I thought it was hours!

* Per #1448, making test TC-Gen config file consistent with the default.

* Per #1448, also track the obs valid times.

* Per #1448, switch from tech1/tech2 to dev/ops methods. Update log messages and add lots of details to the tc_gen documentation.

* Per #1430, in tc_gen enable dev_method_flag, ops_method_flag, ci_alpha, and output_flag to be specified separately for each filter. Also add nc_pairs_flag and genesis_track_points_window config options. Add config constants entries for these options and update tc_gen to handle all of these changes.

* Per #1430, consolidate the parse_grid_mask() code a bit to avoid redundancy.

* Per #1430, just cleaning up some messy comments.

* Per #1430, adding hooks for writing NetCDF output file.

* Per #1430, update DataPlane::set_size() function to take a 3rd argument to specify how the DataPlane should be initialized.

* Per #1430, update the nc_pairs_flag options and update the code to parse them.

* Per #1430, update the TrackInfo class to track and report the min/max warm core information.

* Per #1430, current state of development. Still a work in progress. I'm getting runtime segfaults when testing and I still need to NOT overcount the BEST track hits.

* Per #1430, committing changes described by #1430 (comment)

* Per #1430, forgot to rename genesis_match_window to genesis_hit_window as it is in the code.

* Per #1430, changing GenesisInfo to just inherit directly from TrackInfo. Frankly, I should have thought of this a LONG time ago.

* Per #1430, change the default desc setting from NA to ALL and add the best_unique_flag option.

* Per #1430, simplify the logic now that GenesisInfo is derived from TrackInfo. Also support the best_unique_flag config option.

* Per #1430, instead of storing 12 individual DataPlane objects, store them in a map to make writing their output more convenient.

* Per #1430, updating documentation and comments.

* Per #1430, more doc updates.

* Per #1430, update unit test to only write NetCDF counts for the AL_BASIN and not the other filters.

* Per #1430, fix parsing logic for nc_pairs_flag = TRUE.

* Per #1430, fix bug. Check the VxOpt.NcInfo before calling write_nc(), not the top-level one.

* Per #1430, the docker build of tc_gen failed.

* Per #1430, working on DockerHub compilation.

* Per #1430, getting DockerHub build working.

* One more try.

* Per #1597, add hooks for new GENMPR stat line type.

* Per #1597, add config file option and column definitions for the GENMPR line type.

* Per #1597, finish writing the GENMPR line type.

* Per #1597, change the default output grid from a global 5 degree to global 1 degree grid.

* Per #1597, change GENMPR output columns to GEN_TDIFF and INIT_TDIFF since they're reported in HHMMSS format instead of seconds. Also, tweak the config file for the tc-gen unit test.

* Per #1597, have to add GENMPR header columns for Stat-Analysis and test scripts to handle it.

* Per #1597, update Stat-Analysis to handle the GENMPR line type.

* Per #1597, user's guide updates for the GENMPR and NetCDF output file.

* Per #1597, add AGEN_INIT and AGEN_FHR columns.

* Per #1597, add AGEN_INIT and AGEN_FHR columns.

* Per #1597, remove the AGEN_TIME and BGEN_TIME columns from the GENMPR line type and instead write the genesis times to the FCST_VALID_BEG/END and OBS_VALID_BEG/END header columns.

* Remove some unused output column name definitions. These are remnants from very early versions of MET which included the CTP, CFP, and COP line types.

* Per #1597, update config file options to use dev_hit_radius, dev_hit_window, and opt_hit_tdiff. Also update log message to switch from 'lead' to 'forecast hour'.

* Per #1626, add met_regrid_nearest() utility function since I'm calling it twice.

* Per #1626, update the basin_global_tenth_degree.nc basin definition file to include basin name abbreviations.

* Per #1626, update load_tc_data.h/.cc to also read the basin abbreviations from the NetCDF basin file.

* Per #1626, add TC-Gen config file options for init_inc, init_exc, and basin_mask. Updated the library and application code, and updated the user's guide.

* Fixing Fortify warnings for 'Poor Style: Variable Never Used' in 6 files.

* Fix Fortify warnings for 'Uninitialized variable' in tc_gen.cc and point2grid.cc.

* Fix Fortify warnings for 'Poor Style: Redundant Initialization' in plot_point_obs.cc and point2grid.cc.
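Two of the tc_gen commits above deal with time units: one fixes a seconds-versus-hours mix-up, and another reports GEN_TDIFF and INIT_TDIFF in HHMMSS format instead of seconds. A minimal Python sketch of that conversion follows. The function name `seconds_to_hhmmss`, the constant `SEC_PER_HOUR`, and the leading-minus convention for negative offsets are assumptions made for illustration, not MET's actual code.

```python
SEC_PER_HOUR = 3600  # hypothetical constant, for illustration only


def seconds_to_hhmmss(total_seconds: int) -> str:
    """Format a signed offset in seconds as an HHMMSS string."""
    # Assumption: negative offsets are written with a leading minus sign.
    sign = "-" if total_seconds < 0 else ""
    hours, remainder = divmod(abs(total_seconds), SEC_PER_HOUR)
    minutes, seconds = divmod(remainder, 60)
    return f"{sign}{hours:02d}{minutes:02d}{seconds:02d}"


# A 48-hour window must be converted to seconds before it is compared
# against time differences that are stored in seconds.
window_seconds = 48 * SEC_PER_HOUR
print(seconds_to_hhmmss(window_seconds))
print(seconds_to_hhmmss(-5400))
```

Keeping the hour-to-second conversion in one named constant makes it harder to compare a value in hours against a value in seconds by accident.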

* Feature 1346 valid time attr (#1634)

* #1346 get_att_value_unixtime supports yyyymmdd_hhmmss, too

* #1346 Check valid_time & init_time attributes, too

* #1346 Check valid_time & init_time attributes, too

Co-authored-by: Howard Soh <[email protected]>
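The #1346 commits above mention accepting time attributes written as yyyymmdd_hhmmss. As a rough illustration only, such a string can be turned into a Unix time as sketched below in Python. The function name `parse_yyyymmdd_hhmmss` is hypothetical, and treating the timestamp as UTC is an assumption; this is not the body of get_att_value_unixtime.

```python
import calendar
from datetime import datetime


def parse_yyyymmdd_hhmmss(text: str) -> int:
    """Convert a 'yyyymmdd_hhmmss' string to seconds since the Unix epoch."""
    # strptime raises ValueError if the text does not match the pattern.
    parsed = datetime.strptime(text, "%Y%m%d_%H%M%S")
    # Assumption: the timestamp is in UTC, so timegm is used rather than
    # a local-time conversion.
    return calendar.timegm(parsed.timetuple())


print(parse_yyyymmdd_hhmmss("19700102_000001"))
```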

* Feature 1473 python errors (#1615)

* Added sample script to read ascii data and create an xarray.

* Disabled use_xarray exit for testing.

* Get attrs from DataArray if using xarray.

* Removed some comments.

* Revised error messages for use with both numpy and xarray.

* Removing commented out code.

Co-authored-by: David Fillmore <[email protected]>
Co-authored-by: johnhg <[email protected]>

* Feature 1630 zero obs (#1637)

* Per #1630, update ascii2nc to change zero observations from an error (which returns bad status) to a warning message.

* Per #1630, update point2grid to read an empty input file and write fields of 0's or bad data to the output. Change previous error message to warning. Also, update LOTS of warning and error log messages to make them consistent.

* Per #1630, need to initialize the dataplanes before the loop (for when there are no obs) and within each loop iteration (for when there are multiple fields to process).

* Bugfix 1638 develop climo cdf (#1639)

* Per #1638, correct the order of arguments in the call to the normal_cdf() utility function.

* Per #1638, update the logic in derive_climo_prob(). For CDP thresholds, the constant climo probability should be based on the inequality type where less-than-types match the threshold percentile value while greater-than-types are 1.0 minus the threshold percentile.

* Per #1638, update normal_cdf() to initialize the output CDF field using the climo mean field instead of the observation data field. This makes the timestamps consistent for the climo mean, stdev, and cdf variables in the Grid-Stat NetCDF matched pairs output file.
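The CDP threshold rule described in the #1638 commits above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in derive_climo_prob(): the function name `cdp_climo_prob` is hypothetical, the percentile is assumed to be given on a 0 to 100 scale, and equality-type thresholds are left out because the commit message does not describe them.

```python
def cdp_climo_prob(inequality: str, percentile: float) -> float:
    """Constant climatological probability for a CDP threshold.

    Assumption: percentile is on a 0-100 scale, e.g. 75 for the
    75th percentile of the climatological distribution.
    """
    fraction = percentile / 100.0
    if inequality in ("<", "<="):
        # Less-than types: probability equals the percentile itself.
        return fraction
    if inequality in (">", ">="):
        # Greater-than types: probability is 1.0 minus the percentile.
        return 1.0 - fraction
    raise ValueError(f"unsupported inequality type: {inequality!r}")


print(cdp_climo_prob("<", 25))
print(cdp_climo_prob(">", 75))
```

Under these assumptions, a threshold below the 25th percentile and a threshold above the 75th percentile both have a constant climatological probability of 0.25.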

* Update tc_gen.cc

Co-authored-by: hsoh-u <[email protected]>
Co-authored-by: Howard Soh <[email protected]>
Co-authored-by: John Halley Gotway <[email protected]>
Co-authored-by: j-opatz <[email protected]>
Co-authored-by: David Fillmore <[email protected]>
Co-authored-by: David Fillmore <[email protected]>