Feature 2654 ascii2nc polar buoy support (#2846)
* Added iabp data type, and modified file_handler to filter based on time range, which was added as a command line option

* handle time using input year, hour, min, and doy

* cleanup and switch to position day of year for time computations

* Added an ascii2nc unit test for iabp data

* Added utility scripts to pull iabp data from the web and find files in a time range

* Modified iabp_handler to always output a placeholder 'location' observation with value 1

* added description of IABP data python utility scripts

* Fixed syntax error

* Fixed another syntax error

* Slight reformat of documentation

* Per #2654, update the Makefiles in scripts/python/utility to include all the python scripts that should be installed.

* Per #2654, remove unused code from get_iabp_from_web.py that is getting flagged as a bug by SonarQube.

* Per #2654, fix typo in docs

---------

Co-authored-by: John Halley Gotway <[email protected]>
Co-authored-by: MET Tools Test Account <[email protected]>
3 people authored Apr 10, 2024
1 parent 3d543fc commit 2a26d59
Showing 14 changed files with 1,105 additions and 15 deletions.
53 changes: 46 additions & 7 deletions docs/Users_Guide/reformat_point.rst
@@ -458,6 +458,8 @@ While initial versions of the ASCII2NC tool only supported a simple 11 column AS

• `International Soil Moisture Network (ISMN) Data format <https://ismn.bafg.de/en/>`_.

• `International Arctic Buoy Programme (IABP) Data format <https://iabp.apl.uw.edu/>`_.

• `AErosol RObotic NEtwork (AERONET) versions 2 and 3 format <http://aeronet.gsfc.nasa.gov/>`_

• Python embedding of point observations, as described in :numref:`pyembed-point-obs-data`. See example below in :numref:`ascii2nc-pyembed`.
@@ -522,6 +524,8 @@ Once the ASCII point observations have been formatted as expected, the ASCII fil
netcdf_file
[-format ASCII_format]
[-config file]
[-valid_beg time]
[-valid_end time]
[-mask_grid string]
[-mask_poly file]
[-mask_sid file|list]
@@ -541,21 +545,25 @@ Required Arguments for ascii2nc
Optional Arguments for ascii2nc
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-3. The **-format ASCII_format** option may be set to "met_point", "little_r", "surfrad", "wwsis", "airnowhourlyaqobs", "airnowhourly", "airnowdaily_v2", "ndbc_standard", "ismn", "aeronet", "aeronetv2", "aeronetv3", or "python". If passing in ISIS data, use the "surfrad" format flag.
+3. The **-format ASCII_format** option may be set to "met_point", "little_r", "surfrad", "wwsis", "airnowhourlyaqobs", "airnowhourly", "airnowdaily_v2", "ndbc_standard", "ismn", "iabp", "aeronet", "aeronetv2", "aeronetv3", or "python". If passing in ISIS data, use the "surfrad" format flag.

4. The **-config file** option is the configuration file for generating time summaries.

-5. The **-mask_grid** string option is a named grid or a gridded data file to filter the point observations spatially.
+5. The **-valid_beg** time option in YYYYMMDD[_HH[MMSS]] format sets the beginning of the retention time window.

-6. The **-mask_poly** file option is a polyline masking file to filter the point observations spatially.
+6. The **-valid_end** time option in YYYYMMDD[_HH[MMSS]] format sets the end of the retention time window.

-7. The **-mask_sid** file|list option is a station ID masking file or a comma-separated list of station ID's to filter the point observations spatially. See the description of the "sid" entry in :numref:`config_options`.
+7. The **-mask_grid** string option is a named grid or a gridded data file to filter the point observations spatially.

-8. The **-log file** option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file.
+8. The **-mask_poly** file option is a polyline masking file to filter the point observations spatially.

-9. The **-v level** option indicates the desired level of verbosity. The value of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity above 1 will increase the amount of logging.
+9. The **-mask_sid** file|list option is a station ID masking file or a comma-separated list of station ID's to filter the point observations spatially. See the description of the "sid" entry in :numref:`config_options`.

-10. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
+10. The **-log file** option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file.
+
+11. The **-v level** option indicates the desired level of verbosity. The value of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity above 1 will increase the amount of logging.
+
+12. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
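
The **-valid_beg** and **-valid_end** options take timestamps in YYYYMMDD[_HH[MMSS]] form. As a minimal sketch of how such strings resolve to datetimes (the `parse_met_time` helper below is illustrative, not part of MET):

```python
from datetime import datetime

def parse_met_time(s):
    """Interpret a YYYYMMDD[_HH[MMSS]] timestamp (illustrative helper)."""
    # Try the most specific layout first so a shorter pattern does not
    # stop partway through a longer string.
    for fmt in ("%Y%m%d_%H%M%S", "%Y%m%d_%H", "%Y%m%d"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: " + s)

print(parse_met_time("20140101"))        # start of 2014-01-01
print(parse_met_time("20140101_123045"))
```

Omitted fields default to zero, so `20140101` marks the very start of that day; a window of `-valid_beg 20140101 -valid_end 20140201` therefore spans all of January plus the first instant of February.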

An example of the ascii2nc calling sequence is shown below:

@@ -1203,3 +1211,34 @@ For how to use the script, issue the command:
.. code-block:: none

    python3 MET_BASE/python/utility/print_pointnc2ascii.py -h

IABP Retrieval Python Utilities
===============================

`International Arctic Buoy Programme (IABP) Data <https://iabp.apl.uw.edu/>`_ is one of the data types supported by ascii2nc. A utility script named get_iabp_from_web.py is included to pull all of this data from the web and store it locally. The script accesses the appropriate IABP webpage and downloads the ASCII files for all buoys. It is straightforward to run, but it can be time intensive because the archive of this data is extensive and the files are downloaded one at a time.
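
The download loop in get_iabp_from_web.py follows the shape sketched below. The base URL and buoy IDs here are placeholders for illustration only; the real buoy list and file locations are discovered from the IABP site by the script itself:

```python
import urllib.request
from pathlib import Path

# Hypothetical base URL for illustration; get_iabp_from_web.py determines
# the actual download locations from the IABP webpage.
BASE_URL = "https://iabp.apl.uw.edu/WebData"

def buoy_url(buoy_id):
    """Build the download URL for one buoy's ASCII (.dat) file."""
    return BASE_URL + "/" + buoy_id + ".dat"

def download_buoys(buoy_ids, out_dir="iabp_files"):
    """Fetch each buoy file sequentially; slow for the full archive."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for buoy_id in buoy_ids:
        dest = Path(out_dir) / (buoy_id + ".dat")
        urllib.request.urlretrieve(buoy_url(buoy_id), dest)

print(buoy_url("090629"))
```

Because each file is fetched one at a time over HTTP, the total runtime scales with the size of the archive, which is why the documentation warns the process can be time intensive.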

The script can be found at:

.. code-block:: none

    MET_BASE/python/utility/get_iabp_from_web.py

For how to use the script, issue the command:

.. code-block:: none

    python3 MET_BASE/python/utility/get_iabp_from_web.py -h

Another IABP utility script, find_iabp_in_timerange.py, is included for users and is intended to be run after all files have been downloaded using get_iabp_from_web.py. It examines the downloaded files and lists those containing entries that fall within a user-specified range of days.

The script can be found at:

.. code-block:: none

    MET_BASE/python/utility/find_iabp_in_timerange.py

For how to use the script, issue the command:

.. code-block:: none

    python3 MET_BASE/python/utility/find_iabp_in_timerange.py -h
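
The heart of the time filtering is converting each report's year and (possibly fractional) day-of-year into a calendar date, then testing it against the requested window. Mirroring the logic used by find_iabp_in_timerange.py:

```python
import datetime
from datetime import date

def doy_to_date(year, doy):
    """Convert a possibly fractional day-of-year to a calendar date."""
    d = datetime.datetime(year, 1, 1) + datetime.timedelta(days=float(doy) - 1.0)
    return date(d.year, d.month, d.day)

def report_in_range(year, doy, start_date, end_date):
    """True when the report's date falls inside the inclusive window."""
    return start_date <= doy_to_date(year, doy) <= end_date

print(doy_to_date(2014, 32.5))  # noon on day 32 of 2014 -> 2014-02-01
```

A file is listed by the script as soon as any one of its reports satisfies this check for the requested start and end dates.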
15 changes: 15 additions & 0 deletions internal/test_unit/xml/unit_ascii2nc.xml
@@ -211,4 +211,19 @@
</output>
</test>

<test name="ascii2nc_iabp">
<exec>&MET_BIN;/ascii2nc</exec>
<param> \
-format iabp \
-valid_beg 20140101 -valid_end 20140201 \
&DATA_DIR_OBS;/iabp/090629.dat \
&DATA_DIR_OBS;/iabp/109320.dat \
&DATA_DIR_OBS;/iabp/109499.dat \
&OUTPUT_DIR;/ascii2nc/iabp_20140101_20140201.nc
</param>
<output>
<point_nc>&OUTPUT_DIR;/ascii2nc/iabp_20140101_20140201.nc</point_nc>
</output>
</test>

</met_test>
5 changes: 4 additions & 1 deletion scripts/python/utility/Makefile.am
@@ -26,8 +26,11 @@
pythonutilitydir = $(pkgdatadir)/python/utility

pythonutility_DATA = \
+	build_ndbc_stations_from_web.py \
+	find_iabp_in_timerange.py \
+	get_iabp_from_web.py \
	print_pointnc2ascii.py \
-	build_ndbc_stations_from_web.py
+	rgb2ctable.py

EXTRA_DIST = ${pythonutility_DATA}

5 changes: 4 additions & 1 deletion scripts/python/utility/Makefile.in
@@ -311,8 +311,11 @@ top_builddir = @top_builddir@
top_srcdir = @top_srcdir@
pythonutilitydir = $(pkgdatadir)/python/utility
pythonutility_DATA = \
+	build_ndbc_stations_from_web.py \
+	find_iabp_in_timerange.py \
+	get_iabp_from_web.py \
	print_pointnc2ascii.py \
-	build_ndbc_stations_from_web.py
+	rgb2ctable.py

EXTRA_DIST = ${pythonutility_DATA}
MAINTAINERCLEANFILES = Makefile.in
241 changes: 241 additions & 0 deletions scripts/python/utility/find_iabp_in_timerange.py
@@ -0,0 +1,241 @@
#!/usr/bin/env python3

from optparse import OptionParser
import datetime
from datetime import date
import os
import shlex
from subprocess import Popen, PIPE



#----------------------------------------------
def usage():
print("Usage: find_iabp_in_timerange.py -s yyyymmdd -e yyyymmdd [-d PATH]")

#----------------------------------------------
def is_date_in_range(input_date, start_date, end_date):
return start_date <= input_date <= end_date

#----------------------------------------------
def lookFor(name, inlist, filename, printWarning=False):
rval = -1
try:
rval = inlist.index(name)
    except ValueError:
if printWarning:
print(name, " not in header line, file=", filename)

return rval

#----------------------------------------------
def pointToInt(index, tokens, filename):
if index < 0 or index >= len(tokens):
print("ERROR index out of range ", index)
return -1
return int(tokens[index])

#----------------------------------------------
def pointToFloat(index, tokens, filename):
if index < 0 or index >= len(tokens):
print("ERROR index out of range ", index)
return -99.99
return float(tokens[index])

#----------------------------------------------
class StationHeader:
def __init__(self, headerLine, filename):
tokens = headerLine.split()
self._ok = True
self._idIndex = lookFor('BuoyID', tokens, filename, True)
self._yearIndex = lookFor('Year', tokens, filename, True)
self._hourIndex = lookFor('Hour', tokens, filename, True)
self._minuteIndex = lookFor('Min', tokens, filename, True)
self._doyIndex = lookFor('DOY', tokens, filename, True)
self._posdoyIndex = lookFor('POS_DOY', tokens, filename, True)
self._latIndex = lookFor('Lat', tokens, filename, True)
self._lonIndex = lookFor('Lon', tokens, filename, True)
self._bpIndex = lookFor('BP', tokens, filename, False)
self._tsIndex = lookFor('Ts', tokens, filename, False)
self._taIndex = lookFor('Ta', tokens, filename, False)
self._ok = self._idIndex != -1 and self._yearIndex != -1 and self._hourIndex != -1 \
and self._minuteIndex != -1 and self._doyIndex != -1 and self._posdoyIndex != -1 \
and self._latIndex != -1 and self._lonIndex != -1
if not self._ok:
print("ERROR badly formed header line")

#----------------------------------------------
class Station:
def __init__(self, line, filename, stationHeader):
self._ok = True
tokens = line.split()
self._id = pointToInt(stationHeader._idIndex, tokens, filename)
if self._id < 0:
self._ok = False
self._year = pointToInt(stationHeader._yearIndex, tokens, filename)
if self._year < 0:
self._ok = False
self._hour = pointToInt(stationHeader._hourIndex, tokens, filename)
if self._hour < 0:
self._ok = False
self._minute = pointToInt(stationHeader._minuteIndex, tokens, filename)
if self._minute < 0:
self._ok = False
self._doy = pointToFloat(stationHeader._doyIndex, tokens, filename)
if self._doy < 0:
self._ok = False
if self._doy > 365:
self._ok = False
self._posdoy = pointToFloat(stationHeader._posdoyIndex, tokens, filename)
if self._posdoy < 0:
self._ok = False
if self._posdoy > 365:
self._ok = False
self._lat = pointToFloat(stationHeader._latIndex, tokens, filename)
if self._lat == -99.99:
self._ok = False
self._lon = pointToFloat(stationHeader._lonIndex, tokens, filename)
if self._lon == -99.99:
self._ok = False
if stationHeader._bpIndex >= 0:
self._pressure = pointToFloat(stationHeader._bpIndex, tokens, filename)
else:
self._pressure = -99.99
if stationHeader._tsIndex >= 0:
self._tempsurface = pointToFloat(stationHeader._tsIndex, tokens, filename)
else:
self._tempsurface = -99.99
if stationHeader._taIndex >= 0:
self._tempair = pointToFloat(stationHeader._taIndex, tokens, filename)
else:
self._tempair = -99.99

if self._ok:
d = datetime.datetime(self._year, 1, 1) + datetime.timedelta(self._doy - 1)
self._month = d.month
self._day = d.day
else:
self._month = -1
self._day = -1
def timeInRange(self, start_date, end_date):
if self._ok:
input_date = date(self._year, self._month, self._day)
return is_date_in_range(input_date, start_date, end_date)
else:
return False

#----------------------------------------------
class StationTimeSeries:
def __init__(self, stationHeader):
self._stationHeader = stationHeader
self._data = []
def add(self, line, filename):
s = Station(line, filename, self._stationHeader)
if s._ok:
self._data.append(s)
def print(self):
print("Nothing")
def hasTimesInRange(self, start_date, end_date):
for s in self._data:
if (s.timeInRange(start_date, end_date)):
return True
return False

#----------------------------------------------
def doCmd(cmd, debug=False):
#print(cmd)
my_env = os.environ.copy()
args = shlex.split(cmd)
proc = Popen(args, stdout=PIPE, stderr=PIPE, env=my_env)
out, err = proc.communicate()
exitcode = proc.returncode
if exitcode == 0:
return str(out)
else:
if debug:
print("Command failed ", cmd)
return ""

#----------------------------------------------
def getdatafilenames(aDir):
if (os.path.exists(aDir)):
allFiles = [name for name in os.listdir(aDir) \
if not os.path.isdir(os.path.join(aDir, name))]
return [s for s in allFiles if '.dat' in s]
else:
return []

#----------------------------------------------
def run2(data_path, start, end):

if (data_path[0:2] != "./" and data_path[0] != '/'):
inpath = "./" + data_path
else:
inpath = data_path

print("data_path = ", inpath)

# could put testing here to make sure strings will convert
print("start = ", start)
print("end = ", end)

y0 = int(start[0:4])
m0 = int(start[4:6])
d0 = int(start[6:8])

y1 = int(end[0:4])
m1 = int(end[4:6])
d1 = int(end[6:8])

print("Looking for file with data in range ", y0, m0, d0, " to ", y1, m1, d1)

# read each file that ends in .dat
stationfiles = getdatafilenames(inpath)
stationfiles.sort()

print("We have ", len(stationfiles), " data files to look at")
start_date = date(y0, m0, d0)
end_date = date(y1, m1, d1)

for i in range(len(stationfiles)):

#print("Looking at ", stationfiles[i])
        with open(inpath + "/" + stationfiles[i], 'r') as file:
            data_all = file.read()
        lines = data_all.splitlines()

# first line is a header, remaining lines are a time series
sh = StationHeader(lines[0], stationfiles[i])
if sh._ok:
lines = lines[1:]
st = StationTimeSeries(sh)
for l in lines:
st.add(l, stationfiles[i])

if (st.hasTimesInRange(start_date, end_date)):
print(stationfiles[i])

#----------------------------------------------
def create_parser_options(parser):
parser.add_option("-d", "--data_path", dest="data_path",
default="./iabp_files", help=" path to the station files (.dat) (default: ./iabp_files)")
parser.add_option("-s", "--start", dest="start",
default="notset", help=" starting yyyymmdd. Must be set")
parser.add_option("-e", "--end", dest="end",
default="notset", help=" ending yyyymmdd. Must be set")
return parser.parse_args()

#----------------------------------------------
if __name__ == "__main__":
usage_str = "%prog [options]"
parser = OptionParser(usage = usage_str)
options, args = create_parser_options(parser)
if (options.start == "notset" or options.end == "notset"):
usage()
exit(0)
run2(options.data_path, options.start, options.end)
exit(0)