Data Dictionary #237

Open
megpritch opened this issue Jun 23, 2022 · 12 comments
Comments

@megpritch
Collaborator

megpritch commented Jun 23, 2022

Ongoing data dictionary for land and river hdf5 files, as well as useful rhdf5 commands.

Missing or Yet To Be Found

  • Withdrawals. In the UCI, they are defined in the EXT SOURCES block (see Code 1 below), but these names (DIVR, DIVA, EXTNL, and OUTDGT) do not appear in the contents of the h5 file as listed by h5dump -n OR1_7700_7980.h5
  • Discharges (i.e., point sources). Like withdrawals, they appear in the EXT SOURCES block.

Code 1: EXT SOURCES -> DIVERSIONS

*** DIVERSIONS
WDM3  3007 DIVR     ENGLZERO          SAME RCHRES   1     EXTNL  OUTDGT 1
WDM3  3008 DIVA     ENGLZERO          SAME RCHRES   1     EXTNL  OUTDGT 2
@megpritch
Collaborator Author

megpritch commented Jun 23, 2022

terminal commands to know:

@megpritch
Collaborator Author

megpritch commented Jun 23, 2022

HDF5 file structure:

  • “HDF” stands for “Hierarchical Data Format”
  • hdf5 is a file-based database used in scientific applications, including hsp2.
  • hdf5 files (.h5) include two main members, groups and datasets
  • Groups are the main organizational unit, similar to folders in a directory
  • Datasets are similar to arrays in structure and function
  • Groups and datasets can also carry attributes, which are small named pieces of metadata attached to them
  • Group names in the OR1_7700_7980.h5 file:
h5ls("OR1_7700_7980.h5", recursive = FALSE)
  group       name     otype dclass dim
0     /    CONTROL H5I_GROUP           
1     /    FTABLES H5I_GROUP           
2     /     RCHRES H5I_GROUP           
3     /    RESULTS H5I_GROUP           
4     /   RUN_INFO H5I_GROUP           
5     / TIMESERIES H5I_GROUP       

Listing subgroups within the main groups:
CONTROL: EXT_SOURCES, GLOBAL, OP_SEQUENCE
FTABLES: FT001
RCHRES: ADCALC/PARAMETERS, GENERAL/ACTIVITY, GENERAL/INFO, HYDR/PARAMETERS, HYDR/SAVE, HYDR/STATES
RESULTS: RCHRES_R001/HYDR
RUN_INFO: LOGFILE
TIMESERIES: LAPSE_Table, SEASONS_Table, SUMMARY, Saturated_Vapor_Pressure_Table, TS####

  • "table" is the only member of each of these groups that holds actual data, which is why it always has to be the last element of the path
  • The h5read command in rhdf5 can be used with these paths plus a trailing "/table" to view what is in each part of the file. For example:
h5read("OR1_7700_7980.h5", "/CONTROL/EXT_SOURCES/table")
   index SVOL SVOLNO SMEMN SMEMSB SSYST SGAPST   MFACTOR TRAN   TVOL  TGRPN
1      0    * TS1000  EVAP     31  ENGL   ZERO 1.000e+00 SAME RCHRES       
2     27    * TS3010  SNO3     31  ENGL   ZERO 1.000e+00  DIV RCHRES INFLOW
3     28    * TS3061  SFAS     31  ENGL   ZERO 7.027e-06  DIV RCHRES INFLOW
4     29    * TS3062  SFAC     31  ENGL   ZERO 7.027e-06  DIV RCHRES INFLOW
5     30    *  TS011  WATR     31  ENGL   ZERO 1.000e+00 SAME RCHRES INFLOW
6     31    *  TS012  HEAT     31  ENGL   ZERO 1.000e+00 SAME RCHRES INFLOW
7     32    *  TS013  DOXY     31  ENGL   ZERO 1.000e+00 SAME RCHRES INFLOW
8     33    *  TS021  SAND     31  ENGL   ZERO 1.000e+00 SAME RCHRES INFLOW
9     34    *  TS022  SILT     31  ENGL   ZERO 1.000e+00 SAME RCHRES INFLOW
10    35    *  TS023  CLAY     31  ENGL   ZERO 1.000e+00 SAME RCHRES INFLOW
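
The path pattern used with h5read above (group, subgroups, then a trailing "table") can be sketched as a small helper. This is only an illustration, written in Python because hsp2 itself is Python; `table_path` is a hypothetical name and is not part of rhdf5 or hsp2. It assumes the data always live in a dataset named "table", as observed in OR1_7700_7980.h5.

```python
def table_path(*parts):
    """Join group names into an HDF5 dataset path ending in '/table'.

    Hypothetical helper for illustration only. Assumes the data always
    live in a dataset named 'table', as seen in OR1_7700_7980.h5.
    """
    cleaned = []
    for part in parts:
        # Accept "RCHRES_R001/HYDR"-style arguments by splitting on "/"
        cleaned.extend(p for p in part.split("/") if p)
    return "/" + "/".join(cleaned + ["table"])


print(table_path("CONTROL", "EXT_SOURCES"))
print(table_path("RESULTS", "RCHRES_R001/HYDR"))
```

The resulting strings are the same paths passed to h5read or h5dump -d elsewhere in this thread.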

@glenncampagna
Collaborator

glenncampagna commented Jun 24, 2022

Group and Dataset Findings

  • The RESULTS group contains /RCHRES_R001/HYDR/table which is a dataset with multiple entries including a timestamp:
h5dump -d "/RESULTS/RCHRES_R001/HYDR/table" OR1_7700_7980.h5
HDF5 "OR1_7700_7980.h5" {
DATASET "/RESULTS/RCHRES_R001/HYDR/table" {
   DATATYPE  H5T_COMPOUND {
      H5T_STD_I64LE "index";
      H5T_IEEE_F32LE "DEP";
      H5T_IEEE_F32LE "IVOL";
      H5T_IEEE_F32LE "O1";
      H5T_IEEE_F32LE "O2";
      H5T_IEEE_F32LE "O3";
      H5T_IEEE_F32LE "OVOL1";
      H5T_IEEE_F32LE "OVOL2";
      H5T_IEEE_F32LE "OVOL3";
      H5T_IEEE_F32LE "PRSUPY";
      H5T_IEEE_F32LE "RO";
      H5T_IEEE_F32LE "ROVOL";
      H5T_IEEE_F32LE "SAREA";
      H5T_IEEE_F32LE "TAU";
      H5T_IEEE_F32LE "USTAR";
      H5T_IEEE_F32LE "VOL";
      H5T_IEEE_F32LE "VOLEV";
   }
   DATASPACE  SIMPLE { ( 315576 ) / ( H5S_UNLIMITED ) }
   DATA {
   (0): {
         441766800000000000,
         0.241507,
         8.84726,
         0,
         0,
         2.05518,
         0,
         0,
         0.12294,
         0,
         2.05518,
         0.12294,
         67.4737,
         0.0234426,
         0.109986,
         15.7943,
         0
      },
  • The TIMESERIES group contains TS1001 which appears to be a dataset with a timestamp and one other value:
h5dump -d "/TIMESERIES/TS1001/table" OR1_7700_7980.h5
HDF5 "OR1_7700_7980.h5" {
DATASET "/TIMESERIES/TS1001/table" {
   DATATYPE  H5T_COMPOUND {
      H5T_STD_I64LE "index";
      H5T_IEEE_F64LE "values";
   }
   DATASPACE  SIMPLE { ( 13515 ) / ( H5S_UNLIMITED ) }
   DATA {
   (0): {
         441763200000000000,
         8.29279
      },
   (1): {
         441849600000000000,
         15.7014
      },
   (2): {
         441936000000000000,
         26.9599
      },
   (3): {
         442022400000000000,
         28.2724
  • We generated test10.h5 from a model run and are exploring its components. It appears there are new groups in the file and subgroups within RCHRES_R001 and other places that weren't in our last h5 file:
h5ls("test10.h5", recursive = FALSE)
  group       name     otype dclass dim
0     /    CONTROL H5I_GROUP           
1     /    FTABLES H5I_GROUP           
2     /      GENER H5I_GROUP           
3     /     IMPLND H5I_GROUP           
4     /     PERLND H5I_GROUP           
5     /     RCHRES H5I_GROUP           
6     /    RESULTS H5I_GROUP           
7     /   RUN_INFO H5I_GROUP           
8     / TIMESERIES H5I_GROUP  
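library(rhdf5)
fid <- H5Fopen("test10.h5")  # file handle; assumed to be the test10.h5 file listed above
did1 <- H5Dopen(fid, "/RESULTS/RCHRES_R001/CONS/table")
cons1 <- H5Dread(did1, bit64conversion = "double")
head(cons1)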
         index CONS1_COADDR CONS1_COADEP CONS1_COADWT CONS1_CON CONS1_ICON
1 1.893060e+17            0            0            0  999.2728          0
2 1.893096e+17            0            0            0  998.5554          0
3 1.893132e+17            0            0            0  997.8396          0
4 1.893168e+17            0            0            0  997.1254          0
5 1.893204e+17            0            0            0  996.4130          0
6 1.893240e+17            0            0            0  995.7021          0
  CONS1_OCON1 CONS1_OCON2 CONS1_ROCON
1           0           0           0
2           0           0           0
3           0           0           0
4           0           0           0
5           0           0           0
6           0           0           0

did2 <- H5Dopen(fid, "/RESULTS/RCHRES_R001/HTRCH/table")
htrch1 <- H5Dread(did2, bit64conversion= "double")
head(htrch1)
         index   AIRTMP    HTEXCH    IHEAT OHEAT1 OHEAT2 QBED      QCON
1 1.893060e+17 25.41425 -62032380 54077.12      0      0    0 -72.60050
2 1.893096e+17 25.41425 -60429864 53439.25      0      0    0 -70.96387
3 1.893132e+17 24.96470 -59227876 53393.18      0      0    0 -70.31579
4 1.893168e+17 24.96470 -57743728 53347.14      0      0    0 -68.75593
5 1.893204e+17 24.71495 -56504028 53301.15      0      0    0 -67.76141
6 1.893240e+17 24.71495 -55128412 53255.20      0      0    0 -66.27591
       QEVAP    QLONGW QPREC QSOLAR    QTOTAL ROHEAT       TW
1 -108.19046 -58.56945     0      0 -239.3604      0 59.19260
2 -104.23579 -57.84451     0      0 -233.0442      0 58.43417
3 -100.47568 -57.48961     0      0 -228.2811      0 57.69154
4  -96.87755 -56.80472     0      0 -222.4382      0 56.96812
5  -93.45057 -56.33280     0      0 -217.5448      0 56.26089
6  -90.17333 -55.68602     0      0 -212.1353      0 55.57148
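
The index column in the tables above holds large integers such as 441766800000000000. One plausible reading, assumed here and not confirmed anywhere in this thread, is that these are nanoseconds since 1970-01-01 UTC. Under that assumption, a minimal Python sketch of the conversion looks like this (`index_to_datetime` is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def index_to_datetime(index_ns):
    """Convert an index value to a UTC datetime.

    Assumption: the value counts nanoseconds since 1970-01-01 UTC.
    """
    # Integer division avoids float rounding; datetime resolves to microseconds
    return EPOCH + timedelta(microseconds=int(index_ns) // 1000)


print(index_to_datetime(441766800000000000))  # 1984-01-01 01:00:00+00:00
print(index_to_datetime(441763200000000000))  # 1984-01-01 00:00:00+00:00
print(index_to_datetime(441849600000000000))  # 1984-01-02 00:00:00+00:00
```

If the assumption holds, the first HYDR row is stamped 1984-01-01 01:00 and consecutive TS1001 rows are one day apart.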

@rburghol
Contributor

rburghol commented Jun 28, 2022

@glenncampagna The HDF5 extraction error is definitely caused by the missing WDM. The first error says that it couldn't find the prad WDM file. Let's check the issue; perhaps I failed to include a link that lets you download it? The second error is evidence that the model did not run, and thus there can be no results. That's a good piece of information for us as well: it means the entire data structure may not be established when the H5 file is first created.

@rburghol
Contributor

rburghol commented Jun 29, 2022

Regarding the notes about extra data components in test10: two of them are specifically related to land simulations, so they won't show up in a river simulation. PERLND and IMPLND represent pervious and impervious land surface, respectively.

For example:

h5ls("test10.h5", recursive = FALSE)
  group       name     otype dclass dim
0     /    CONTROL H5I_GROUP           
1     /    FTABLES H5I_GROUP           
2     /      GENER H5I_GROUP           
3     /     IMPLND H5I_GROUP           
4     /     PERLND H5I_GROUP           
5     /     RCHRES H5I_GROUP           
6     /    RESULTS H5I_GROUP           
7     /   RUN_INFO H5I_GROUP           
8     / TIMESERIES H5I_GROUP          

@glenncampagna
Collaborator

glenncampagna commented Jun 29, 2022

Influence of UCI on output hdf5 file:

  • All WDM files needed for the run are listed in the FILES block of the UCI. For example, from the test case UCI:
FILES
<FILE>  <UN#>***<----FILE NAME------------------------------------------------->
WDM1       21   met_A51037.wdm
WDM2       22   prad_A51037.wdm
WDM3       23   ps_sep_div_ams_p532sova_2021_OR1_7700_7980.wdm
WDM4       24   OR1_7700_7980.wdm
MESSU      25   OR1_7700_7980.ech
           26   OR1_7700_7980.out
           31   OR1_7700_7980.tau
END FILES

This is why we originally got an error relating to the prad_A51037.wdm file: the UCI referenced it, but we had not downloaded it to the run directory.
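
A missing-file problem like this can be caught before a run by reading the FILES block and checking which WDM files are present. The Python sketch below is a simplified illustration, not how hsp2 reads the UCI: it splits on whitespace instead of honoring the fixed-column layout, so it assumes file names contain no spaces. The function names are hypothetical.

```python
import os


def parse_files_block(uci_text):
    """Return (file_type, unit_number, file_name) tuples from a UCI FILES block.

    Simplified sketch: whitespace splitting, not fixed-column parsing.
    """
    entries = []
    inside = False
    for line in uci_text.splitlines():
        stripped = line.strip()
        if stripped == "FILES":
            inside = True
            continue
        if stripped == "END FILES":
            break
        if not inside or not stripped or "***" in line:
            continue  # skip blank lines and comment/header lines
        tokens = stripped.split()
        if tokens[0].isdigit():
            # No file type given (e.g. the .out and .tau lines)
            entries.append(("", int(tokens[0]), tokens[1]))
        else:
            entries.append((tokens[0], int(tokens[1]), tokens[2]))
    return entries


def missing_wdm_files(uci_text, directory="."):
    """List WDM files named in the FILES block that are absent from directory."""
    return [name for ftype, _, name in parse_files_block(uci_text)
            if ftype.startswith("WDM")
            and not os.path.exists(os.path.join(directory, name))]


sample = """FILES
<FILE>  <UN#>***<----FILE NAME--->
WDM1       21   met_A51037.wdm
WDM2       22   prad_A51037.wdm
MESSU      25   OR1_7700_7980.ech
           26   OR1_7700_7980.out
END FILES
"""
for entry in parse_files_block(sample):
    print(entry)
```

Only the WDM entries are checked for existence, since the .ech, .out, and .tau files are written by the run itself.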

  • The additional groups found in test10.h5 (comment above) that are not found in A51800.h5 (FTABLES, GENER, IMPLND, RCHRES) are there because those sections appear in the test10 UCI but do not appear in the A51800 UCI. Conclusion: for a group to appear in the output hdf5 file, the corresponding section must be included in the input UCI.
h5ls("A51800.h5", recursive =FALSE)
  group       name     otype dclass dim
0     /    CONTROL H5I_GROUP           
1     /     PERLND H5I_GROUP           
2     /    RESULTS H5I_GROUP           
3     /   RUN_INFO H5I_GROUP           
4     / TIMESERIES H5I_GROUP   
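
The comparison between the two files amounts to a set difference on their top-level group names. A small Python sketch, with the group names copied by hand from the two h5ls listings in this thread (nothing is read from the files here):

```python
# Top-level groups, copied from the h5ls output for each file
test10_groups = {"CONTROL", "FTABLES", "GENER", "IMPLND", "PERLND",
                 "RCHRES", "RESULTS", "RUN_INFO", "TIMESERIES"}
a51800_groups = {"CONTROL", "PERLND", "RESULTS", "RUN_INFO", "TIMESERIES"}

only_in_test10 = sorted(test10_groups - a51800_groups)
only_in_a51800 = sorted(a51800_groups - test10_groups)

print(only_in_test10)  # ['FTABLES', 'GENER', 'IMPLND', 'RCHRES']
print(only_in_a51800)  # []
```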

UCI files:
https://raw.githubusercontent.com/HARPgroup/HSPsquared/master/tests/test_cbp_land/forA51800.uci
https://raw.githubusercontent.com/respec/HSPsquared/master/tests/test10/HSP2results/test10.uci

@juliabruneau
Contributor

juliabruneau commented Jun 29, 2022

DEFINITIONS: River uci

From: https://github.com/respec/HSPsquared/tree/master/docs = the manual for HSPF

FTABLES = a collection of function tables

  • The geometric and hydraulic properties of a RCHRES are summarized in a function table (FTABLE)

RCHRES = free-flowing reach or mixed reservoir

  • All water entering the RCHRES must be assigned a category

RESULTS/RCHRES_R001/HYDR/below/table

HYDR = hydraulic behavior

PLTGEN = output time series to a list file

IVOL = total inflow to the RCHRES; the inflow to each “category” (set by RCHRES) is input as a time series, and IVOL is computed as their sum

DEP = depth at specified location [ft]

OVOL = “exit-specific output time-series” - the actual withdrawals are available in the appropriate member of the existing OVOL [ac.ft/ivld]

PRSUPY = volume of water contributed by precipitation on surface [ac.ft/ivld]

RO = total rate of outflow from RCHRES [ft3/s or m3/s]

ROVOL = total volume of outflow from RCHRES [ac.ft/ivld]

SAREA = surface area of the water in RCHRES [ac or ha]

TAU = bed shear stress [lb/ft2]

USTAR = shear velocity [ft/s]

VOLEV = volume of water lost by evaporation [ac.ft]

O = rates of outflow through individual exits [ft3/s]

DELT = simulation time interval (min)

PLS = pervious land segment, often referenced in variable definitions

@nicoledarling
Contributor

nicoledarling commented Jun 29, 2022

DEFINITIONS: A51800.h5

This includes what the data points look like from the HDFView tables
Definitions from HSPF manual: https://github.com/respec/HSPsquared/blob/master/docs/HSPF_v12.2_manual%2Bnav.pdf

RESULTS/PERLND_P001/PWATER/table

AGWET = evapotranspiration from groundwater [in/interval]
Data: often zero, but occasional periods of values

AGWI = active groundwater inflow [in/ivld]
Data: often zero, but occasional periods of values

AGWLI = active groundwater lateral inflow [in/ivld]
Data: zeros

AGWO = active groundwater outflow [in/interval]
Data: often values, but occasional periods of zeros

AGWS = active groundwater storage at the start of the interval [in]
Data: usually values but some zeros

BASET = E-T taken for active groundwater outflow (baseflow) [in/ivld]
Data: zeros

CEPE = evaporation from interception storage [in/ivld]
Data: often zero, but occasional periods of values

CEPS = interception storage [in]
Data: approx equal amount of values and zeros

GWVS = index to groundwater slope [in]
Data: negative values and zeros in the beginning, but positive mostly

IFWI = interflow inflow (excluding lateral) [in/ivld]
Data: approx equal amount of values and zeros

IFWLI = interflow lateral inflow [in/ivld]
Data: zeros

IFWO = interflow outflow [in/interval]
Data: approx equal amount of values and zeros

IFWS = interflow storage at the start of the interval [in]
Data: approx equal amount of values and zeros

IGWI = inflow to inactive (deep) GW [in/ivld]
Data: zeros

INFFAC = factor to account for frozen ground effects, if applicable [none]
Data: almost always 1.0, but some periods when 0.9…

INFIL = infiltration to the soil [in/ivld]
Data: zeros

LZET = E-T from lower zone [in/ivld] (does E-T mean evapotranspiration here?)
Data: mostly zeros but occasional positive values lower than 10

LZI = lower zone inflow [in/ivld]
Data: zeros

LZLI = lower zone lateral inflow [in/ivld]
Data: zeros

LZS = initial lower zone storage [in]
Data: All values are approximately = 5 , but slightly less (4.97499..), and values change very slightly

PERC = percolation from upper to lower zone [in/ivld]
Data: zeros

PERO = total outflow from PLS [in/ivld] (what is PLS? - It is a “Pervious Land Segment”)
Data: values begin at approximately 0.003 but slowly decrease

PERS = total water stored in the PLS [in]
Data: values start around 6.5 and seem to steadily decrease

PET = potential E-T, adjusted for snow cover and air temperature [in/ivld]
Data: mostly zeros, occasional values less than 10

PETADJ = adjustment factor for potential ET [no units (fraction)]
Data: Values are all either 0, 0.5, or 1

SUPY = water supply to soil surface [in/ivld]
Data: zeros

SURI = surface inflow [in/ivld]
Data: zeros

SURLI = surface lateral inflow [in/ivld]
Data: zeros

SURO = surface outflow [in/ivld]
Data: zeros

SURS = surface detention storage [in]
Data: zeros

TAET = total simulated E-T [in/ivld]
Data: mix of zeros and values less than 10

TGWS = total groundwater storage [in]
Data: values start at ~ 1 and decrease steadily

UZET = E-T from upper zone [in/ivld]
Data: zeros and small decimals

UZI = upper zone inflow [in/ivld]
Data: zeros

UZLI = upper zone lateral inflow [in/ivld]
Data: zeros

UZS = upper zone storage [in]
Data: Values start at 0.6 and decrease very slowly

RESULTS/PERLND_P001/ATEMP/table

AIRTMP = corrected air temperature [degrees F]
Data: all positive values approximately around 20-80, no zeros

GATMP = air temperature at gage [degrees F]
Data: all positive values approximately around 20-50, no zeros

RESULTS/PERLND_P001/SNOW/table

ALBEDO = reflectivity of snowpack (only available if SNOPFG = 0) [none]
Data: mostly all zeros or very close to zero

COVINX = snow cover index [in]
Data: all data points are 0.106

DEWTMP = dew point [degrees F]
Data: No zeros, data ranges from 20-90

DULL = dullness index of snowpack (only available if SNOPFG = 0) [none]
Data: all zero or close to zero

MELT = quantity of melt from PACKF [in/ivld]
Data: mostly all zero, range from 0-3

NEGHTS = negative heat storage [in]
Data: all zero or very close to zero

PACK = total contents of pack (water equivalent) [in]
Data: ranges from 0-0.6

PACKF = frozen contents of the pack (snow and ice) [in]
Data: mostly all zero

PACKI = ice in pack [in]
Data: mostly all zeros

PACKW = liquid water in pack [in]
Data: mostly all zeros

PAKTMP = mean temperature of the snowpack [degrees F]
Data: all data points are 32

PDEPTH = pack depth [in]
Data: mostly all zeros

PRAIN = rainfall directly onto the snowpack [in/ivld]
Data: all zero

RAINF = rainfall [in/ivld]
Data: mostly all zero or close to zero

RDENPF = relative density of frozen contents of pack (PACKF/PDEPTH) [none]
Data: all “NaN”

SKYCLR = fraction of sky assumed clear [none]
Data: range from 0.1-1.0

SNOCOV = fraction of land segment covered by pack [none]
Data: range from 0-1

SNOTMP = max air temperature for which snowfall occurs [degrees F]
Data: either 32.0 or 33.0

SNOWE = evaporation from PACKF (sublimation) [in/ivld]
Data: almost all zero

SNOWF = snowfall [in/ivld]
Data: almost all zero or close to zero

WYIELD = water yielded by the pack (released to the land-surface) [in/ivld]
Data: almost all zero or close to zero

XLNMLT = maximum increment to ice in pack [in]
Data: some zeros, some range from 0-0.3


DEFINITIONS: OR1_7700_7980.h5

RESULTS/RCHRES_R001/HYDR

DEP = depth of water [ft]
Data: ranges from 0-4, no zeros

IVOL = sum of inflows to the RCHRES [ac.ft/ivld]
Data: ranges from 1-20

O1 = Rate of outflow through exit 1 [cfs]
Data: all zero

O2 = Rate of outflow through exit 2 [cfs]
Data: all zero

O3 = Rate of outflow through exit 3 [cfs]
Data: ranges from 1-1600

OVOL1 = volume of outflow through exit 1 [ac.ft/ivld]
Data: all zero

OVOL2 = volume of outflow through exit 2 [ac.ft/ivld]
Data: all zero

OVOL3 = volume of outflow through exit 3 [ac.ft/ivld]
Data: positive values usually between 5 and 10

PRSUPY = volume of water contributed by precipitation on surface [ac.ft/ivld]
Data: mix of zeros and small values less than 1

RO = Total rate of outflow from RCHRES [cfs]
Data: ranges from 2 to ~110, no zeros

ROVOL = total volume of outflow from RCHRES [ac.ft/ivld]
Data: Ranges a little above zero to ~ 385, no zeros

SAREA = surface area (of reach) [ac]
Data: Ranges from ~60 to almost 200, no zeros

TAU = bed shear stress [lb/ft^2]
Data: all decimals, no zeros

USTAR = shear velocity [ft/s]
Data: All values ~ 0.3, no zeros

VOL = volume at the end of the interval
Data: Ranges from ~ 15 to < 900, no zeros

VOLEV = volume of water lost by evaporation [ac-ft/ivld]
Data: Mix of zeros and decimals with a couple values ~ 5

@juliabruneau
Contributor

Note on the Units above:

IVLD = 'interval of the data', i.e. the time step of the dataset

  • for a dataset with a time interval of 1 hour, [in/ivld] becomes [in/hr]
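
The per-interval units can be converted to rates once the interval length is known. The Python sketch below assumes the interval is given in minutes (as DELT is defined earlier in this thread) and uses the standard factor of 43,560 cubic feet per acre-foot. The function names are hypothetical.

```python
ACRE_FT_TO_CUBIC_FT = 43560.0  # 1 acre-foot = 43,560 cubic feet


def per_interval_to_per_hour(value, delt_minutes):
    """Convert a [unit/ivld] quantity to [unit/hr] for an interval in minutes."""
    return value * 60.0 / delt_minutes


def acft_per_ivld_to_cfs(volume_acft, delt_minutes):
    """Convert a volume in [ac-ft/ivld] to an average flow in [ft3/s]."""
    seconds = delt_minutes * 60.0
    return volume_acft * ACRE_FT_TO_CUBIC_FT / seconds


print(per_interval_to_per_hour(0.25, 15))  # 1.0
print(acft_per_ivld_to_cfs(1.0, 60))       # 12.1
```

Note that a flow computed this way is an average over the interval, so it need not equal an instantaneous rate such as RO reported for the same time step.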

@glenncampagna
Collaborator

DEQ Index for hdf5 file storage

URL: http://deq1.bse.vt.edu:81/p6/out/
To navigate to the directory in a terminal: cd /media/model/p6/out/
We recently created /media/model/p6/out/land/hsp2_2022/eos/ to store our land segment model run outputs

@glenncampagna
Collaborator

VAhydro/CBP river segment naming convention

3 parts: [river basin/order] [this segment id] [next downstream seg id]
So, for OR1_7700_7980 : 7700 is the current seg id, and 7980 is the next downstream seg
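
The three-part convention can be sketched as a small parser. This Python example is illustrative only; `RiverSegment` and `parse_segment_name` are hypothetical names, and the sketch assumes the parts are always separated by underscores, as in OR1_7700_7980.

```python
from typing import NamedTuple


class RiverSegment(NamedTuple):
    basin: str          # river basin/order, e.g. "OR1"
    segment_id: str     # this segment's id
    downstream_id: str  # id of the next downstream segment


def parse_segment_name(name):
    """Split a river segment name into its three parts."""
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 3 underscore-separated parts, got {name!r}")
    return RiverSegment(*parts)


seg = parse_segment_name("OR1_7700_7980")
print(seg.basin, seg.segment_id, seg.downstream_id)  # OR1 7700 7980
```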

@glenncampagna
Collaborator

Important DSNs

  • Land Segments:
DSN   Description                        Variable Name   Units
111   Runoff                             SURO            in/hr
  • River Segments:
DSN   Description                        Variable Name   Units
11    Inflow to the                      IVOL            ac-ft/hr
111   Total outflow from the river seg   ROVOL           ac-ft/hr
3000  Point source inflow                IVOL            ac-ft/hr
3007  Withdrawal                         DIVR
3008  Withdrawal (agricultural?)         DIVA
