REST / RomDataSource, metrics and analysis #1354

Open
11 of 24 tasks
rburghol opened this issue Sep 12, 2024 · 7 comments
Comments


rburghol commented Sep 12, 2024

Analysis Steps

  • Build script to create and save summary analytic values for a given part of our operation
    • Analysts must install hydrotools package https://github.com/HARPgroup/hydro-tools
    • Get rest account credentials set up in auth.private (harp specific rest account- @nathanielf22 can give other analysts details)
    • RWB Gives analysts vahydro accounts for their future use.
    • Base analysis workflow script on https://github.com/HARPgroup/meta_model/blob/main/scripts/met/met_store_info.R
    • Review how to retrieve or create the containers for storing data in the database via REST with the following (more details below under REST Find feature, Model and Scenario)
      • Uses fn RomFeature to retrieve the coverage of interest
      • Uses fn om_model_object() to retrieve or create a model property on the feature
      • Uses fn om_get_model_scenario() to retrieve or create a scenario container property on the model
      • Uses fn vahydro_post_metric_to_scenprop() to save the metrics to the scenario container prop
      • Review Using om_vahydro_metric_grid() to retrieve large amounts of data.
    • Analyst begins developing metrics, visualizations, etc. in advance of the meeting.
      • At the meeting we decide what coverage type, workflow module, branch, and step to insert the workflow into, or we modify the analysis to fit a need.
      • What workflow step does your script belong in?
      • Create issue for this workflow step
    • Arguments needed:
      • scenario = met_scenario (e.g., stormVol_nldas2)
      • location/coverage identifier (like hydrocode in most cases)
      • Data source file, ex: rating CSV path as input (do not guess the file name! discuss in group meetings)
      • Others? These will be script specific, detail in specific script issue for workflow step.
    • What metrics will the script create? Some sample metrics are included below
      • Note: For summarizing regression models, we can save individual monthly $R^2$ values as properties with propname jan, feb, mar, ... (perhaps organized below a container property named monthly_rsq)
      • Further metrics, create and illustrate in a specific script issue for the workflow step

Overview

Data Model & REST

Data Model Outline

REST Find feature, Model and Scenario

  • Feature
    • Function RomFeature to retrieve object
      • Arguments:
        • hydrocode: unique identifier, ex: usgs_ws_01613900, N51113
        • bundle: general feature class, ex: watershed, landunit
        • ftype: specific feature type (source of data), ex: usgs_full_drainage, cbp6_landseg
    • Use: feature <- RomFeature$new(ds, list( hydrocode=coverage_hydrocode, ftype=coverage_ftype, bundle=coverage_bundle), TRUE)
    • Returns: a feature object of class RomFeature
    • See also: Detail View of REST Feature Query below
  • Model
    • Function om_model_object()
      • Arguments:
        • feature: an object of class RomFeature
        • model_version: model overall version, ex: pf-1.0, cbp-6.1
      • Use: model <- om_model_object(ds, feature, model_version)
  • Scenario:
    • Function: om_get_model_scenario()
    • Arguments:
      • model
      • scenario_name
    • Use: scenario <- om_get_model_scenario(ds, model, scenario_name)
  • Metric:
    • Function: vahydro_post_metric_to_scenprop()
    • Use: vahydro_post_metric_to_scenprop(pid, varkey, propcode, propname, propvalue, ds)
    • Arguments:
      • pid: the scenario's unique integer identifier (pid)
      • varkey: use om_class_Constant for most decimal attributes
      • propcode: optional (NULL in the example below)
      • propname: required, descriptive variable name
      • propvalue: required
      • ds: RomDataSource
    • Ex: vahydro_post_metric_to_scenprop(scenario$pid, 'om_class_Constant', NULL, 'num_records', numrecs, ds)
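Putting the four functions above together, a metric-posting script might look like the sketch below. This is a sketch assuming the hydrotools API behaves as documented in this issue; `ds` is an authenticated RomDataSource (set up via auth.private, construction not shown here), `numrecs` stands in for a metric you computed, and the argument values are examples only.

```r
# Sketch: end-to-end flow from coverage to posted metric, per the
# function descriptions above. Assumes hydrotools is loaded and `ds`
# is an authenticated RomDataSource.
coverage_hydrocode <- 'usgs_ws_01613900'
coverage_bundle <- 'watershed'
coverage_ftype <- 'usgs_full_drainage'
model_version <- 'cbp-6.1'
scenario_name <- 'stormVol_prism'

# retrieve the coverage of interest
feature <- RomFeature$new(
  ds,
  list(hydrocode = coverage_hydrocode,
       ftype = coverage_ftype,
       bundle = coverage_bundle),
  TRUE
)
# retrieve or create the model property on the feature
model <- om_model_object(ds, feature, model_version)
# retrieve or create the scenario container property on the model
scenario <- om_get_model_scenario(ds, model, scenario_name)
# save a metric (numrecs, computed elsewhere) to the scenario container
vahydro_post_metric_to_scenprop(scenario$pid, 'om_class_Constant', NULL,
                                'num_records', numrecs, ds)
```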
Detail View of REST Feature Query
feature <- RomFeature$new(
  ds,
  list(
    hydrocode=coverage_hydrocode,
    ftype=coverage_ftype,
    bundle=coverage_bundle
  ),
  TRUE
)

Using om_vahydro_metric_grid to retrieve data


# GET VAHydro 1.0 RIVERSEG l90_Qout DATA
mdf <- data.frame(
  'model_version' = c('cbp-6.1'),
  'runid' = c('stormVol_prism'),
  'metric' = c('precip_annual_max_in'),
  'runlabel' = c('precip_annual_max_in')
)
met_data <- om_vahydro_metric_grid(
  metric = metric, runids = mdf, bundle="watershed", ftype="usgs_full_drainage",
  base_url = paste(site,'entity-model-prop-level-export',sep="/"),
  ds = ds
)

@nathanielf22

Here is the current function, which I updated a bit tonight. This works for any PRISM, daymet, or NLDAS2 data, such as:
read.csv("http://deq1.bse.vt.edu:81/met/stormVol_prism/precip/usgs_ws_01613900-PRISM-all.csv")

When calculating a metric, you can run something like this to return the lowest 90-day flow:
summary_analytics(prism)$l90_precip_in

library(tidyr)
library(sqldf)
library(zoo)

# For ANY precip data (hourly or daily)
summary_analytics <- function(df) {
  # split obs_date ("YYYY-MM-DD HH:MM:SS") into date and time,
  # then split the date into year, month, day
  df <- separate(data = df, col = obs_date,
                 into = c("obs_date", "obs_time"), sep = " ")
  df <- separate(data = df, col = obs_date,
                 into = c("obs_year", "obs_month", "obs_day"), sep = "-")

  # yearly summary with each year and its total precip
  yearly.summary <- sqldf(
    "SELECT obs_year, SUM(precip_in) AS total_precip
     FROM df
     GROUP BY obs_year"
  )

  # summary analytics
  precip_annual_max_in <- max(yearly.summary$total_precip)
  precip_annual_max_year <-
    yearly.summary$obs_year[which.max(yearly.summary$total_precip)]
  precip_annual_mean_in <- mean(yearly.summary$total_precip)

  # For min values and years, exclude the first and last rows since the
  # first and current years have incomplete data. Note the year lookup
  # subsets obs_year the same way so the which.min() index lines up.
  keep <- c(-nrow(yearly.summary), -1)
  precip_annual_min_in <- min(yearly.summary$total_precip[keep])
  precip_annual_min_year <-
    yearly.summary$obs_year[keep][which.min(yearly.summary$total_precip[keep])]

  # daily summary, used for all data; this makes hourly data daily sums
  daily.summary <- sqldf(
    "SELECT obs_year, obs_month, obs_day, SUM(precip_in) AS total_precip
     FROM df
     GROUP BY obs_year, obs_month, obs_day"
  )
  precip_daily_max_in <- max(daily.summary$total_precip)

  # If the data has 24 unique hours it is hourly, so take the hourly max;
  # otherwise (daily data) hourly max is NA. Alternatively, for daily data
  # we could look at the rainfall distribution table for a type II storm
  # and multiply by the max P(t)/P(24) value; this function uses NA.
  if (length(unique(df$hr)) == 24) {
    precip_hourly_max_in <- max(df$precip_in)
  } else {
    precip_hourly_max_in <- NA
  }

  # l90: minimum of the 90-day rolling mean, via the zoo package
  # (IHA did not work)
  l90_precip_in <- min(
    rollapply(daily.summary$total_precip, width = 90, FUN = mean,
              fill = NA, align = "right"),
    na.rm = TRUE
  )

  # data frame with all 8 metrics, returned as the function value
  metrics <- data.frame(
    precip_annual_max_in = precip_annual_max_in,
    precip_annual_max_year = precip_annual_max_year,
    precip_annual_mean_in = precip_annual_mean_in,
    precip_annual_min_in = precip_annual_min_in,
    precip_annual_min_year = precip_annual_min_year,
    precip_daily_max_in = precip_daily_max_in,
    precip_hourly_max_in = precip_hourly_max_in,
    l90_precip_in = l90_precip_in
  )
  metrics
}
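As a side note on the l90 line above: it takes the minimum of a right-aligned 90-day rolling mean of the daily totals. A toy base-R equivalent (width 3 instead of 90, and no zoo dependency, so the numbers are easy to check by hand) shows what each rolling window contributes:

```r
# Base-R stand-in for zoo::rollapply(x, width, mean, fill = NA, align = "right"):
# each position i gets the mean of the `width` values ending at i, NA before
# a full window exists.
roll_mean <- function(x, width) {
  sapply(seq_along(x), function(i) {
    if (i < width) NA else mean(x[(i - width + 1):i])
  })
}

x <- c(4, 2, 6, 1, 3)
roll_mean(x, 3)
# NA NA 4.000000 3.000000 3.333333
min(roll_mean(x, 3), na.rm = TRUE)
# 3  -- the "lowest 3-value mean", analogous to l90 with width = 90
```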

I can run through this in detail next week since I know this is a lot to look at. I'll be out of town tomorrow afternoon through Sunday and likely out of service, but I will do my best to respond before I leave or after I get back this weekend. In the meantime, I wanted to make sure this is stored here.

@rburghol
Contributor Author

@nathanielf22 hey Nate, thanks a bunch for pushing this up before you headed out. Very much look forward to working with this, and congratulations on the progress.

@nathanielf22
Collaborator

nathanielf22 commented Oct 25, 2024

10/25 meeting:

  • Review REST services to input metrics
  • Github push troubleshooting
  • Review VAHydro_metric_script.R
  • new: yearly_metrics.R

@rburghol
Contributor Author

rburghol commented Oct 25, 2024

@nathanielf22 if you can post up the input variables that created that strange output, I will check on my machine. But, from my recollection, I think that we used the wrong coverage_ftype, and in my testing with another coverage (details below), I got correct results. The ftype should be usgs_full_drainage for those USGS gage watershed features.


scenario_name <- 'daymet2'
coverage_hydrocode <- 'usgs_ws_01673550'
coverage_bundle <- 'watershed'
coverage_ftype <- 'usgs_full_drainage'
model_version <- 'cbp-6.1'
met_file <- 'http://deq1.bse.vt.edu:81/met/daymet/precip/usgs_ws_01673550-daymet-all.csv'

@rburghol
Contributor Author

@nathanielf22 -- I have one observation about the script as is. I tested it in full, and it works! The only thing you should update is the variable name summary: there is an existing R function called summary(), and assigning the output of summary_analytics() to summary shadows that function, which is fairly important and used a good deal. precip_summary might be a good option :).
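To illustrate the shadowing concern, here is a toy example (not part of the workflow). Note that function calls like summary(x) still resolve to base::summary even while shadowed, which makes the mistake easy to miss; it is lookups of the name as a value that break:

```r
# Shadow base::summary with a data frame in the global environment
summary <- data.frame(num_records = 42)

get("summary")                        # the data frame, not the function
exists("summary", mode = "function")  # TRUE: base::summary is still reachable for calls

rm(summary)                           # removes the shadow
is.function(summary)                  # TRUE: the name resolves to base::summary again
```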

@nathanielf22
Collaborator

@rburghol @COBrogan -- The code we put together last Friday to integrate into the workflow has an issue with the daymet and NLDAS2 data and should be updated. When including the n >= 365 days limitation, daymet excludes certain years where it has only 364 days, as shown below:
[two screenshots of the yearly summary output showing the excluded 364-day years]

In addition, NLDAS2 has an issue because the yearly summary was created before the daily summaries; I switched that around.
This data is for gage usgs_ws_01613900.

@COBrogan
Collaborator

@nathanielf22 The daymet dataset will always have 365 days in each year. It always excludes December 31st on leap years (but includes February 29th, which is an unusual set-up). But remember that daymet is based on the GMT time zone, which means that each day in EST runs from 7 AM to 7 AM the next day. So, when grouping data by year, it matters whether you are grouping by tstime or tsendtime.

In this case, we can trace through our routines and see that we are grouping data by tsendtime (a confusing by-product of how PRISM operates and an assumption we made about daymet based on a daymet-PRISM regression). That means that leap years will always have 366 data points and the year after will have 364. To better demonstrate this, we are essentially beginning our year at 7 AM January 2nd and ending it on 7AM Dec 31. So, since a daymet leap year does not have Dec 31st, it must have 364 days.

My suggestion for now is to include years with >= 364 days, but let's discuss at our next meeting!

For reference, note that daymet has 365 days each year when grouping by tstime:

select count(ts.rast),extract(year from to_timestamp(ts.tstime))
from dh_timeseries_weather as ts
left join dh_variabledefinition var on ts.varid = var.hydroid
where var.varkey = 'daymet_mod_daily'
group by extract(year from to_timestamp(ts.tstime));

 count | extract
-------+---------
   365 |    1980
   365 |    1981
   365 |    1982
   365 |    1983
   365 |    1984
   365 |    1985
   365 |    1986
   365 |    1987
   365 |    1988
   365 |    1989
   365 |    1990
   365 |    1991
