Feature 2654 ascii2nc polar buoy support (#2846)
* Added iabp data type, and modified file_handler to filter based on time range, which was added as a command line option

* handle time using input year, hour, min, and doy

* cleanup and switch to position day of year for time computations

* Added an ascii2nc unit test for iabp data

* Added utility scripts to pull iabp data from the web and find files in a time range

* Modified iabp_handler to always output a placeholder 'location' observation with value 1

* added description of IABP data python utility scripts

* Fixed syntax error

* Fixed another syntax error

* Slight reformat of documentation

* Per #2654, update the Makefiles in scripts/python/utility to include all the python scripts that should be installed.

* Per #2654, remove unused code from get_iabp_from_web.py that is getting flagged as a bug by SonarQube.

* Per #2654, fix typo in docs

---------

Co-authored-by: John Halley Gotway <[email protected]>
Co-authored-by: MET Tools Test Account <[email protected]>
3 people authored Apr 10, 2024
1 parent 3d543fc commit 2a26d59
Showing 14 changed files with 1,105 additions and 15 deletions.
53 changes: 46 additions & 7 deletions docs/Users_Guide/reformat_point.rst
@@ -458,6 +458,8 @@ While initial versions of the ASCII2NC tool only supported a simple 11 column AS

• `International Soil Moisture Network (ISMN) Data format <https://ismn.bafg.de/en/>`_.

• `International Arctic Buoy Programme (IABP) Data format <https://iabp.apl.uw.edu/>`_.

• `AErosol RObotic NEtwork (AERONET) versions 2 and 3 format <http://aeronet.gsfc.nasa.gov/>`_

• Python embedding of point observations, as described in :numref:`pyembed-point-obs-data`. See example below in :numref:`ascii2nc-pyembed`.
@@ -522,6 +524,8 @@ Once the ASCII point observations have been formatted as expected, the ASCII fil
netcdf_file
[-format ASCII_format]
[-config file]
[-valid_beg time]
[-valid_end time]
[-mask_grid string]
[-mask_poly file]
[-mask_sid file|list]
@@ -541,21 +545,25 @@ Required Arguments for ascii2nc
Optional Arguments for ascii2nc
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-3. The **-format ASCII_format** option may be set to "met_point", "little_r", "surfrad", "wwsis", "airnowhourlyaqobs", "airnowhourly", "airnowdaily_v2", "ndbc_standard", "ismn", "aeronet", "aeronetv2", "aeronetv3", or "python". If passing in ISIS data, use the "surfrad" format flag.
+3. The **-format ASCII_format** option may be set to "met_point", "little_r", "surfrad", "wwsis", "airnowhourlyaqobs", "airnowhourly", "airnowdaily_v2", "ndbc_standard", "ismn", "iabp", "aeronet", "aeronetv2", "aeronetv3", or "python". If passing in ISIS data, use the "surfrad" format flag.

4. The **-config file** option is the configuration file for generating time summaries.

-5. The **-mask_grid** string option is a named grid or a gridded data file to filter the point observations spatially.
+5. The **-valid_beg** time option in YYYYMMDD[_HH[MMSS]] format sets the beginning of the retention time window.

-6. The **-mask_poly** file option is a polyline masking file to filter the point observations spatially.
+6. The **-valid_end** time option in YYYYMMDD[_HH[MMSS]] format sets the end of the retention time window.

-7. The **-mask_sid** file|list option is a station ID masking file or a comma-separated list of station ID's to filter the point observations spatially. See the description of the "sid" entry in :numref:`config_options`.
+7. The **-mask_grid** string option is a named grid or a gridded data file to filter the point observations spatially.

-8. The **-log file** option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file.
+8. The **-mask_poly** file option is a polyline masking file to filter the point observations spatially.

-9. The **-v level** option indicates the desired level of verbosity. The value of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity above 1 will increase the amount of logging.
+9. The **-mask_sid** file|list option is a station ID masking file or a comma-separated list of station ID's to filter the point observations spatially. See the description of the "sid" entry in :numref:`config_options`.

-10. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
+10. The **-log file** option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file.
+
+11. The **-v level** option indicates the desired level of verbosity. The value of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity above 1 will increase the amount of logging.
+
+12. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
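
The **-valid_beg** and **-valid_end** options take timestamps in YYYYMMDD[_HH[MMSS]] form. As a minimal sketch of how such strings resolve to datetimes (the `parse_met_time` helper below is illustrative, not part of MET):

```python
from datetime import datetime

def parse_met_time(s):
    """Interpret a YYYYMMDD[_HH[MMSS]] timestamp (illustrative helper)."""
    # Try the most specific layout first so a shorter pattern does not
    # stop partway through a longer string.
    for fmt in ("%Y%m%d_%H%M%S", "%Y%m%d_%H", "%Y%m%d"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: " + s)

print(parse_met_time("20140101"))        # start of 2014-01-01
print(parse_met_time("20140101_123045"))
```

Omitted fields default to zero, so `20140101` marks the very start of that day; a window of `-valid_beg 20140101 -valid_end 20140201` therefore spans all of January plus the first instant of February.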

An example of the ascii2nc calling sequence is shown below:

@@ -1203,3 +1211,34 @@ For how to use the script, issue the command:
.. code-block:: none

    python3 MET_BASE/python/utility/print_pointnc2ascii.py -h

IABP Retrieval Python Utilities
===============================

`International Arctic Buoy Programme (IABP) Data <https://iabp.apl.uw.edu/>`_ is one of the data types supported by ascii2nc. A utility script named get_iabp_from_web.py is included to pull all of this data from the web and store it locally. The script accesses the appropriate IABP webpage and downloads the ASCII files for all buoys. It is straightforward to run, but it can be time intensive because the archive of this data is extensive and the files are downloaded one at a time.
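
The download loop in get_iabp_from_web.py follows the shape sketched below. The base URL and buoy IDs here are placeholders for illustration only; the real buoy list and file locations are discovered from the IABP site by the script itself:

```python
import urllib.request
from pathlib import Path

# Hypothetical base URL for illustration; get_iabp_from_web.py determines
# the actual download locations from the IABP webpage.
BASE_URL = "https://iabp.apl.uw.edu/WebData"

def buoy_url(buoy_id):
    """Build the download URL for one buoy's ASCII (.dat) file."""
    return BASE_URL + "/" + buoy_id + ".dat"

def download_buoys(buoy_ids, out_dir="iabp_files"):
    """Fetch each buoy file sequentially; slow for the full archive."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for buoy_id in buoy_ids:
        dest = Path(out_dir) / (buoy_id + ".dat")
        urllib.request.urlretrieve(buoy_url(buoy_id), dest)

print(buoy_url("090629"))
```

Because each file is fetched one at a time over HTTP, the total runtime scales with the size of the archive, which is why the documentation warns the process can be time intensive.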

The script can be found at:

.. code-block:: none

    MET_BASE/python/utility/get_iabp_from_web.py

For how to use the script, issue the command:

.. code-block:: none

    python3 MET_BASE/python/utility/get_iabp_from_web.py -h

Another IABP utility script, find_iabp_in_timerange.py, is included for users and is intended to be run after all files have been downloaded using get_iabp_from_web.py. It examines the downloaded files and lists those containing entries that fall within a user-specified range of days.

The script can be found at:

.. code-block:: none

    MET_BASE/python/utility/find_iabp_in_timerange.py

For how to use the script, issue the command:

.. code-block:: none

    python3 MET_BASE/python/utility/find_iabp_in_timerange.py -h
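
The heart of the time filtering is converting each report's year and (possibly fractional) day-of-year into a calendar date, then testing it against the requested window. Mirroring the logic used by find_iabp_in_timerange.py:

```python
import datetime
from datetime import date

def doy_to_date(year, doy):
    """Convert a possibly fractional day-of-year to a calendar date."""
    d = datetime.datetime(year, 1, 1) + datetime.timedelta(days=float(doy) - 1.0)
    return date(d.year, d.month, d.day)

def report_in_range(year, doy, start_date, end_date):
    """True when the report's date falls inside the inclusive window."""
    return start_date <= doy_to_date(year, doy) <= end_date

print(doy_to_date(2014, 32.5))  # noon on day 32 of 2014 -> 2014-02-01
```

A file is listed by the script as soon as any one of its reports satisfies this check for the requested start and end dates.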
15 changes: 15 additions & 0 deletions internal/test_unit/xml/unit_ascii2nc.xml
@@ -211,4 +211,19 @@
</output>
</test>

<test name="ascii2nc_iabp">
<exec>&MET_BIN;/ascii2nc</exec>
<param> \
-format iabp \
-valid_beg 20140101 -valid_end 20140201 \
&DATA_DIR_OBS;/iabp/090629.dat \
&DATA_DIR_OBS;/iabp/109320.dat \
&DATA_DIR_OBS;/iabp/109499.dat \
&OUTPUT_DIR;/ascii2nc/iabp_20140101_20140201.nc
</param>
<output>
<point_nc>&OUTPUT_DIR;/ascii2nc/iabp_20140101_20140201.nc</point_nc>
</output>
</test>

</met_test>
5 changes: 4 additions & 1 deletion scripts/python/utility/Makefile.am
@@ -26,8 +26,11 @@
pythonutilitydir = $(pkgdatadir)/python/utility

pythonutility_DATA = \
+	build_ndbc_stations_from_web.py \
+	find_iabp_in_timerange.py \
+	get_iabp_from_web.py \
	print_pointnc2ascii.py \
-	build_ndbc_stations_from_web.py
+	rgb2ctable.py

EXTRA_DIST = ${pythonutility_DATA}

5 changes: 4 additions & 1 deletion scripts/python/utility/Makefile.in
@@ -311,8 +311,11 @@ top_builddir = @top_builddir@
top_srcdir = @top_srcdir@
pythonutilitydir = $(pkgdatadir)/python/utility
pythonutility_DATA = \
+	build_ndbc_stations_from_web.py \
+	find_iabp_in_timerange.py \
+	get_iabp_from_web.py \
	print_pointnc2ascii.py \
-	build_ndbc_stations_from_web.py
+	rgb2ctable.py

EXTRA_DIST = ${pythonutility_DATA}
MAINTAINERCLEANFILES = Makefile.in
241 changes: 241 additions & 0 deletions scripts/python/utility/find_iabp_in_timerange.py
@@ -0,0 +1,241 @@
#!/usr/bin/env python3

from optparse import OptionParser
import datetime
from datetime import date
import os
import shlex
from subprocess import Popen, PIPE



#----------------------------------------------
def usage():
print("Usage: find_iabp_in_timerange.py -s yyyymmdd -e yyyymmdd [-d PATH]")

#----------------------------------------------
def is_date_in_range(input_date, start_date, end_date):
return start_date <= input_date <= end_date

#----------------------------------------------
def lookFor(name, inlist, filename, printWarning=False):
rval = -1
try:
rval = inlist.index(name)
    except ValueError:
if printWarning:
print(name, " not in header line, file=", filename)

return rval

#----------------------------------------------
def pointToInt(index, tokens, filename):
if index < 0 or index >= len(tokens):
print("ERROR index out of range ", index)
return -1
return int(tokens[index])

#----------------------------------------------
def pointToFloat(index, tokens, filename):
if index < 0 or index >= len(tokens):
print("ERROR index out of range ", index)
return -99.99
return float(tokens[index])

#----------------------------------------------
class StationHeader:
def __init__(self, headerLine, filename):
tokens = headerLine.split()
self._ok = True
self._idIndex = lookFor('BuoyID', tokens, filename, True)
self._yearIndex = lookFor('Year', tokens, filename, True)
self._hourIndex = lookFor('Hour', tokens, filename, True)
self._minuteIndex = lookFor('Min', tokens, filename, True)
self._doyIndex = lookFor('DOY', tokens, filename, True)
self._posdoyIndex = lookFor('POS_DOY', tokens, filename, True)
self._latIndex = lookFor('Lat', tokens, filename, True)
self._lonIndex = lookFor('Lon', tokens, filename, True)
self._bpIndex = lookFor('BP', tokens, filename, False)
self._tsIndex = lookFor('Ts', tokens, filename, False)
self._taIndex = lookFor('Ta', tokens, filename, False)
self._ok = self._idIndex != -1 and self._yearIndex != -1 and self._hourIndex != -1 \
and self._minuteIndex != -1 and self._doyIndex != -1 and self._posdoyIndex != -1 \
and self._latIndex != -1 and self._lonIndex != -1
if not self._ok:
print("ERROR badly formed header line")

#----------------------------------------------
class Station:
def __init__(self, line, filename, stationHeader):
self._ok = True
tokens = line.split()
self._id = pointToInt(stationHeader._idIndex, tokens, filename)
if self._id < 0:
self._ok = False
self._year = pointToInt(stationHeader._yearIndex, tokens, filename)
if self._year < 0:
self._ok = False
self._hour = pointToInt(stationHeader._hourIndex, tokens, filename)
if self._hour < 0:
self._ok = False
self._minute = pointToInt(stationHeader._minuteIndex, tokens, filename)
if self._minute < 0:
self._ok = False
self._doy = pointToFloat(stationHeader._doyIndex, tokens, filename)
if self._doy < 0:
self._ok = False
if self._doy > 365:
self._ok = False
self._posdoy = pointToFloat(stationHeader._posdoyIndex, tokens, filename)
if self._posdoy < 0:
self._ok = False
if self._posdoy > 365:
self._ok = False
self._lat = pointToFloat(stationHeader._latIndex, tokens, filename)
if self._lat == -99.99:
self._ok = False
self._lon = pointToFloat(stationHeader._lonIndex, tokens, filename)
if self._lon == -99.99:
self._ok = False
if stationHeader._bpIndex >= 0:
self._pressure = pointToFloat(stationHeader._bpIndex, tokens, filename)
else:
self._pressure = -99.99
if stationHeader._tsIndex >= 0:
self._tempsurface = pointToFloat(stationHeader._tsIndex, tokens, filename)
else:
self._tempsurface = -99.99
if stationHeader._taIndex >= 0:
self._tempair = pointToFloat(stationHeader._taIndex, tokens, filename)
else:
self._tempair = -99.99

if self._ok:
d = datetime.datetime(self._year, 1, 1) + datetime.timedelta(self._doy - 1)
self._month = d.month
self._day = d.day
else:
self._month = -1
self._day = -1
def timeInRange(self, start_date, end_date):
if self._ok:
input_date = date(self._year, self._month, self._day)
return is_date_in_range(input_date, start_date, end_date)
else:
return False

#----------------------------------------------
class StationTimeSeries:
def __init__(self, stationHeader):
self._stationHeader = stationHeader
self._data = []
def add(self, line, filename):
s = Station(line, filename, self._stationHeader)
if s._ok:
self._data.append(s)
def print(self):
print("Nothing")
def hasTimesInRange(self, start_date, end_date):
for s in self._data:
if (s.timeInRange(start_date, end_date)):
return True
return False

#----------------------------------------------
def doCmd(cmd, debug=False):
#print(cmd)
my_env = os.environ.copy()
args = shlex.split(cmd)
proc = Popen(args, stdout=PIPE, stderr=PIPE, env=my_env)
out, err = proc.communicate()
exitcode = proc.returncode
if exitcode == 0:
return str(out)
else:
if debug:
print("Command failed ", cmd)
return ""

#----------------------------------------------
def getdatafilenames(aDir):
if (os.path.exists(aDir)):
allFiles = [name for name in os.listdir(aDir) \
if not os.path.isdir(os.path.join(aDir, name))]
return [s for s in allFiles if '.dat' in s]
else:
return []

#----------------------------------------------
def run2(data_path, start, end):

if (data_path[0:2] != "./" and data_path[0] != '/'):
inpath = "./" + data_path
else:
inpath = data_path

print("data_path = ", inpath)

# could put testing here to make sure strings will convert
print("start = ", start)
print("end = ", end)

y0 = int(start[0:4])
m0 = int(start[4:6])
d0 = int(start[6:8])

y1 = int(end[0:4])
m1 = int(end[4:6])
d1 = int(end[6:8])

print("Looking for file with data in range ", y0, m0, d0, " to ", y1, m1, d1)

# read each file that ends in .dat
stationfiles = getdatafilenames(inpath)
stationfiles.sort()

print("We have ", len(stationfiles), " data files to look at")
start_date = date(y0, m0, d0)
end_date = date(y1, m1, d1)

for i in range(len(stationfiles)):

#print("Looking at ", stationfiles[i])
        with open(inpath + "/" + stationfiles[i], 'r') as file:
            data_all = file.read()
        lines = data_all.splitlines()

# first line is a header, remaining lines are a time series
sh = StationHeader(lines[0], stationfiles[i])
if sh._ok:
lines = lines[1:]
st = StationTimeSeries(sh)
for l in lines:
st.add(l, stationfiles[i])

if (st.hasTimesInRange(start_date, end_date)):
print(stationfiles[i])

#----------------------------------------------
def create_parser_options(parser):
parser.add_option("-d", "--data_path", dest="data_path",
default="./iabp_files", help=" path to the station files (.dat) (default: ./iabp_files)")
parser.add_option("-s", "--start", dest="start",
default="notset", help=" starting yyyymmdd. Must be set")
parser.add_option("-e", "--end", dest="end",
default="notset", help=" ending yyyymmdd. Must be set")
return parser.parse_args()

#----------------------------------------------
if __name__ == "__main__":
usage_str = "%prog [options]"
parser = OptionParser(usage = usage_str)
options, args = create_parser_options(parser)
if (options.start == "notset" or options.end == "notset"):
usage()
exit(0)
run2(options.data_path, options.start, options.end)
exit(0)