GEO_MET_MODEL option 3: Storm Hydrograph Analysis and Precip/Hydrograph Volume Analysis #60
Current workflow:

FYI @ilonah22 I've merged this branch (…)

@rburghol How do I specify what …?

@COBrogan looks like you set it up correctly to me -- you may want to verify the … Feel free to edit the geo.config file to debug, and maybe also grep the files to see if I hard-coded that param by accident somewhere, like in …

Oh!! @COBrogan I just saw you set the config in …

@rburghol The …

This command is a product of the last line of the …
I've been running into a few issues while trying to run the storm workflow on a different gage, but this is the only one I have gotten really stuck on so far while running stormEventsLM_cmd.R:
I have been making a few minor changes, but nothing seems to be working, so I was wondering if you had a similar issue or if this error means I messed something up in one of the earlier steps.
Hi @ilonah22. If you pull the …

I also have gotten everything up and running in the meta model. We should now be able to use the model on the server to run these, which will make it much easier to run multiple scenarios (no need to enter file paths, copy+paste files, etc.). Do you have some availability this afternoon to go over that? Maybe around 1:30? We could get you set up in the model so you can run gages easily.
@rburghol after some testing, it looks like @ilonah22 also doesn't have write permissions to …

I tried …
@ilonah22 I really like what I'm seeing. These are effective ways to compare multiple time series, and also to highlight the areas that you think we should pay more attention to. I would be interested in a similar thing using different rainfall data sets. With that in mind, what rainfall data sets were you using for these?
FYI: working on a multi-gage model run, but hitting issues with the sftp call in sbatch:

```bash
MODEL_ROOT=/backup/meteorology/
MODEL_BIN=$MODEL_ROOT
SCRIPT_DIR=/opt/model/model_meteorology/sh
export MODEL_ROOT MODEL_BIN SCRIPT_DIR

# Export the list of USGS full-drainage gage coverages from the database
gage_coverage_file="usgsgageList.csv"
gage_coverage_SQLfile=usgsgageList.sql
gageSQL="
\\set fname '/tmp/${gage_coverage_file}' \n
copy ( select hydrocode
  FROM dh_feature
  WHERE bundle = 'watershed' AND ftype = 'usgs_full_drainage'
) to :'fname';"
# turn off the expansion of the asterisk
set -f
echo -e "$gageSQL" > $gage_coverage_SQLfile
cat $gage_coverage_SQLfile | psql -h dbase2 -d drupal.dh03

# Retrieve the exported list from the db server (this is the sftp call
# that is misbehaving under sbatch)
sftp dbase2:"/tmp/${gage_coverage_file}" "/home/cobrogan/${gage_coverage_file}"

# Sanity check: print each gage hydrocode in the list
filename="/home/cobrogan/${gage_coverage_file}"
for i in `cat $filename`; do
  echo $i
done

# Submit one meta model job per gage
for i in `cat $filename`; do
  echo "Running: sbatch /opt/model/meta_model/run_model raster_met stormVol_prism \"$i\" auto geo"
  sbatch /opt/model/meta_model/run_model raster_met stormVol_prism "$i" auto geo
done
```
@COBrogan One thing to check is that you have done … If that doesn't fix it for you, paste your error message up here and I'll take a look.
@rburghol All of this is using PRISM -- how would I use a different source? Just change stormVol_prism to stormVol_daymet?
@ilonah22 Yep. Theoretically you can call stormVol_daymet and stormVol_nldas2 as the scenario. I haven't tested these yet, so it would be interesting to see if they work.
That makes sense, but I don't have the permissions to create a home directory for myself on dbase2 (nor could I sudo to do it). Would you mind setting that up, Rob? Brendan said he could set one up on dbase1, but I don't know if that's ideal since our configs all point to dbase2.
Thanks @ilonah22. I'm working on debugging the remaining gages. Seems like minor stuff here and there. At least one gage was failing because there was an entire month of 0 cfs flow and …
@COBrogan, that sounds good. I quickly found the gages with the highest and lowest mean r-squared to plot.

Here is the highest, with no issues I can see just from looking at the plot:

Here is the lowest (non-zero), where you can see that there are only 4 monthly r-squared values above 0 and multiple NA values:

If this is happening to one gage without breaking, it is probably happening to others.
Morning. Great work. Wondering what the R2 metric vs. drainage area relationship looks like.
Thanks @ilonah22. What were the gage IDs for these two examples? And were you able to look at any of the residuals for the gage that had those high R^2 values? It might give us some good insight into how the regression is performing with large + small events across the months. It is interesting that there aren't many seasonal impacts at this particular gage. As for the missing R^2 values, I believe this is intentional. If there is insufficient data to conduct the regression, the algorithm will just return NA for that month. This could happen if the Storm Separation routine found 1 - 3 storms for that month (most likely due to a small gage record). We can look into it further, but the negative R^2 values also indicate that sample size is likely low. We can probably confirm this by looking at the residual plots for February (best performing month) and July (worst performing month).
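To make the small-sample mechanism concrete, here is a minimal sketch of a per-month regression. This is not the actual stormEventsLM_cmd.R code; stormStats, startDate, precipVolume, and stormVolume are hypothetical names:

```r
# Minimal sketch, assuming a hypothetical stormStats data.frame with one
# row per storm event (startDate, precipVolume, stormVolume)
library(dplyr)

monthlyFit <- stormStats %>%
  group_by(mo = as.integer(format(startDate, "%m"))) %>%
  group_modify(~ {
    # With only 1-3 storms in a month there is little or no residual
    # degree of freedom, so adjusted R^2 comes back NA/NaN or strongly
    # negative -- matching the NA months in the ratings files
    if (nrow(.x) < 4) return(tibble(adj_r2 = NA_real_, n = nrow(.x)))
    fit <- lm(stormVolume ~ precipVolume, data = .x)
    tibble(adj_r2 = summary(fit)$adj.r.squared, n = nrow(.x))
  })
```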
@COBrogan this looks really cool to me. I think that it is probably time for us to put together a good data model for storing this data on VAHydro, so that we can easily do some analysis of summary statistics over the whole data set.
I think at this point all gages have been run, with a few exceptions:
@COBrogan - great summary. FYI I am currently doing a variation on calc_raster_ts.sh to handle resampling, and then a further variation to handle resampling plus temporal disaggregation. I want to have only a single routine, but I feel like the complexity of the differences between the 3 queries warrants it. Currently tracking this in #72, though the complexity of the op probably warrants its own issue.
@COBrogan, the highest R-squared gage was 01652500 and the lowest non-zero was 01671025.
@ilonah22 I've kicked off a daymet batch run. We'll see tomorrow how many run successfully, but I definitely see the ratings files being generated, so it's working for at least most of the gages!
Update: Just started running NLDAS2. 184 gages successfully ran for daymet. One thing that will be worth identifying is any gages that ran for one method, but not for another. They could reveal errors in our meta model set-up or be interesting case studies. |
NLDAS2 has finished running; it ran for 156 gages. I'm guessing some of the smaller gages did not run successfully, but this would be another interesting thing to look into as we track which gages ran for which methods. I think we should try to quantify the following:
@rburghol, you may be interested to see the performance difference in December. Could affect our recharge. It makes me very excited to see these model runs. Plot R code:

```r
library(tidyverse)
#Assumes all ratings files are in a folder ratings/daymet/, ratings/prism/,
#or ratings/nldas/
for(j in c("daymet","prism","nldas")){
  print(j)
  #Set path to read in ratings based on data source of outer loop
  pathToread <- paste0("ratings/",j,"/")
  for( i in list.files(pathToread) ){
    #For each file in the directory, read in the file i as a csv
    filei <- read.csv(paste0(pathToread,i))
    #Store the gage number
    filei$gage <- gsub(".*_(\\d+)-.*","\\1",i)
    #Combine the file into a single data frame
    if(!exists("combineFile")){
      combineFile <- filei
    }else{
      combineFile <- rbind(combineFile,filei)
    }
  }
  #Assign to a specific variable and delete the generic combineFile
  assign(paste0('combineFile',j),combineFile)
  rm(combineFile)
}
#Join the daymet and prism data together
joinData <- combineFileprism %>%
  #Remove the row numbers (not important) and rename the r_squared for clarity
  select(-X,prismRating = r_squared) %>%
  #Join in the daymet data, but first rename the r_squared column for clarity.
  #Join on gage and month
  left_join(combineFiledaymet %>%
              select(daymetRating = r_squared,gage,mo),
            by = c("gage","mo")) %>%
  #Join in the nldas data, but first rename the r_squared column for clarity.
  #Join on gage and month
  left_join(combineFilenldas %>%
              select(nldasRating = r_squared,gage,mo),
            by = c("gage","mo")) %>%
  #Pivot it longer to have a column with the data source and one for the
  #ratings, for plotting ease
  pivot_longer(c(prismRating,daymetRating,nldasRating),
               names_to = 'dataSource',
               values_to = 'rating')
#Plot the data
ggplot(data = joinData) +
  #Box plot, with month on the x-axis (note that it must be a factor) and
  #rating on the y-axis. Create separate boxes for each data source via color
  geom_boxplot(aes(as.factor(mo),rating,color = dataSource)) +
  xlab(element_blank()) + ylab("Adjusted R^2") +
  #Limit the axis between 0 - 1, removing negative adjusted R squared values
  coord_cartesian(ylim = c(0,1)) +
  #Set a classic theme and give some gridlines for ease of reference
  theme_classic() +
  theme(
    panel.grid.major.y = element_line(linetype = 3, color = "grey50")
  ) +
  #Color the boxes
  scale_color_manual(values = c("dodgerblue3", "violetred4","black"))
```
These look interesting @COBrogan @ilonah22 -- I see the December drop-off, but these are still highly explanatory values; over 40% for the median is quite good and I would not expect much better anyhow. I am very interested in the comparisons of data sources for a single watershed, since that is where model improvement can occur.
@COBrogan - I amend my above statement: I just realized that the median December NLDAS2 R^2 was only 25% while the …
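For reference, a quick way to check those December medians from the joinData frame built above; a minimal sketch assuming the same column names:

```r
# Median adjusted R^2 for December, by data source, from joinData above
library(dplyr)
joinData %>%
  filter(mo == 12) %>%
  group_by(dataSource) %>%
  summarize(median_r2 = median(rating, na.rm = TRUE))
```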
I just finished determining which gages ran for which methods. 184 gages ran for daymet and nldas2, 188 gages ran for prism, and 179 ran for all 3. I am planning on picking one or two gages that ran for all 3 to compare across the data sources.
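A sketch of one way to do that comparison, reusing the combineFile* data frames from the ratings-reading loop above:

```r
# Compare gage coverage across methods using the frames built earlier
prismGages  <- unique(combineFileprism$gage)
daymetGages <- unique(combineFiledaymet$gage)
nldasGages  <- unique(combineFilenldas$gage)

# Gages that produced ratings under all three data sources
allThree <- Reduce(intersect, list(prismGages, daymetGages, nldasGages))
length(allThree)

# Gages that ran for prism but not nldas -- candidate case studies
setdiff(prismGages, nldasGages)
```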
I finished making the combined dataset, including drainage area. Most of the code is the same as what @COBrogan posted above, but I added the part at the end for drainage area. Combined data with drainage area:

```r
library(tidyverse)
library(sqldf)
for(j in c("daymet","prism","nldas")){
  print(j)
  #Set path to read in ratings based on data source of outer loop
  pathToread <- paste0("*filepath*/Ratings/",j,"/")
  for( i in list.files(pathToread) ){
    filei <- read.csv(paste0(pathToread,i))
    #Store the gage number
    filei$gage <- gsub(".*_(\\d+)-.*","\\1",i)
    #Combine the file into a single data frame
    if(!exists("combineFile")){
      combineFile <- filei
    }else{
      combineFile <- rbind(combineFile,filei)
    }
  }
  #Assign to a specific variable and delete the generic combineFile
  assign(paste0('combineFile',j),combineFile)
  rm(combineFile)
}
#Join the daymet and prism data together
joinData <- combineFileprism %>%
  #Remove the row numbers (not important) and rename the r_squared for clarity
  select(-X,prismRating = r_squared) %>%
  #Join in the daymet data, but first rename the r_squared column for clarity.
  #Join on gage and month
  left_join(combineFiledaymet %>%
              select(daymetRating = r_squared,gage,mo),
            by = c("gage","mo")) %>%
  #Join in the nldas data, but first rename the r_squared column for clarity.
  #Join on gage and month
  left_join(combineFilenldas %>%
              select(nldasRating = r_squared,gage,mo),
            by = c("gage","mo")) %>%
  #Pivot it longer to have a column with the data source and one for the
  #ratings, for plotting ease
  pivot_longer(c(prismRating,daymetRating,nldasRating),
               names_to = 'dataSource',
               values_to = 'rating')
# Import and add drainage area from USGS
drainage_area <- dataRetrieval::readNWISsite(c(unique(joinData$gage)))
joinData <- sqldf(
  "select a.mo as mo, a.gage as gageid, a.dataSource as dataSource, a.rating as rating,
    b.drain_area_va as drainageArea_sqmi
  from joinData as a
  left outer join drainage_area as b
  on (
    a.gage = b.site_no
  )"
)
# Remove "Rating" from the data source column
joinData$dataSource <- gsub("Rating","",joinData$dataSource)
```
After I combined the data sets and added drainage area, I made a quick plot of r-squared vs. drainage area:

It was a little hard to tell from that plot what the trend line looked like, so I also created a version with an adjusted y-scale:

I thought the difference in accuracy between small and large drainage areas would be bigger, but this is very preliminary and I will look into this a bit more on Wednesday.
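For anyone reproducing that figure, a rough sketch from the combined joinData frame above (column names per the sqldf query; the log x-axis is my addition, to spread out the small watersheds):

```r
# Sketch: adjusted R^2 vs. drainage area, colored by data source
library(ggplot2)
ggplot(joinData, aes(drainageArea_sqmi, rating, color = dataSource)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE) +  # trend line per source
  scale_x_log10() +                            # areas span orders of magnitude
  coord_cartesian(ylim = c(0, 1)) +            # hide negative adjusted R^2
  labs(x = "Drainage area (sq mi)", y = "Adjusted R^2") +
  theme_classic()
```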
Here is the outline I showed on Monday for separating stormEventsLM_cmd.R:

On Monday we also decided it would be better to overwrite files rather than make new ones after each step, which will hopefully reduce the number of extra files produced by separating this script.
If we look at precipitation and stormflow data on an individual scale, it appears that the total volume of precipitation may be related to the volume of the storm hydrograph. This takes a slightly different approach than correlating weekly precip and flow, because now we are honing in on individual storm events and looking more specifically at how precipitation increases flow:

- Develop a methodology to separate out storm hydrographs to isolate individual storm events. See stormSep_USGS
- Calculate storm hydrograph volume for each event. This is accomplished in stormSep_USGS.R (see the rough volume sketch after this list)
- Analyze the relationship between hydrograph volume and precipitation volume
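As a rough illustration of what the volume step computes (the real implementation lives in stormSep_USGS.R; dailyFlow, stormStart, stormEnd, and the baseline column are hypothetical names):

```r
# Storm hydrograph volume: flow above baseline, integrated over the event.
# Daily mean flow in cfs; 1 cfs-day = 86400 ft^3 and 1 acre-ft = 43560 ft^3
event <- subset(dailyFlow, date >= stormStart & date <= stormEnd)
excess_cfs <- pmax(event$flow - event$baseline, 0)
volume_acre_ft <- sum(excess_cfs) * 86400 / 43560
```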

See below for Culpeper gage 01665500. Looking at monthly precip and monthly storm hydrographs (lumping storms into months based solely on their start date; storms that extend beyond the month are still included in their start month), we get pretty strong correlations in some of our winter months. October in particular has a very good correlation, close to 0.75. Running a power regression slightly worsens the R^2 values, but improves the residual error analysis:
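Here a power regression means fitting the volume relationship in log-log space; a minimal sketch, where monthStorms, stormVolume, and precipVolume are hypothetical names:

```r
# Power model V = a * P^b, fit in log space: log(V) = log(a) + b*log(P)
ok <- subset(monthStorms, stormVolume > 0 & precipVolume > 0)
powerFit <- lm(log(stormVolume) ~ log(precipVolume), data = ok)
summary(powerFit)$adj.r.squared
# Residuals in log space are often closer to constant variance than the
# linear fit's, consistent with the improved residual analysis noted above
plot(fitted(powerFit), resid(powerFit))
```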
Current thoughts are that we may see good relationships with this, but we may run into some travel time issues as we move downstream. This should only cause issues for systems that have more than 1 day of travel time, which are likely to be relatively large systems (greater than ~200 square miles, based on super rough math of a circular watershed with a radius of 16 miles, which assumes a travel time of 1 day at 1 ft/s). Visualizations will help us get a feel for how affected our results are by travel time as we expand this routine to more gages.
- Determine a means to analyze which dataset performs best. Following workflow step geo -> process -> 04_weekly_data (Create R script to produce weekly mean value file #61), we can select the "best" dataset as the one that performs best for that month. So, we can group our events by month across the record period, run the regression for each month, and determine performance (see the sketch after this list)
- Find a means to determine missing precipitation events (e.g. low precip volume) for corresponding hydrographs
- Plot each storm hydrograph with precipitation and separate into its own steps
- Analysis steps cannot depend on a list of storm events. Because these steps have been divided out, they must instead rely on a data.frame that has storm IDs built into it, and the stormStats must have the corresponding ID
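A sketch of that monthly "best dataset" selection, assuming a combined ratings table shaped like the joinData frame in the comments above (columns gage, mo, dataSource, rating):

```r
# Pick the data source with the highest adjusted R^2 per gage and month
library(dplyr)
bestSource <- joinData %>%
  group_by(gage, mo) %>%
  slice_max(rating, n = 1, with_ties = FALSE) %>%
  ungroup()
```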
Important R Scripts: stormSeparate_USGS.R, plotStorm.R, plotStormSep_cmd.R, stormAnalysis_cmd.R, stormEventsLM_cmd.R, stormSep_cmd.R
In the scenario con file, e.g. `p6/cbp_wsm/config/control/met/stormVol_prism.con`, `GEO_MET_MODEL` must be `storm__volume`.

Several important variables are defined in `geo.config`. Several more are defined in the scenario config, e.g. `stormVol_prism.con`.

Example Model Run No Plots
Example Model Run With Plots

Steps in the `geo` workflow:

- `05_stormSeparate` (#65) = stormSep_cmd.R; outputs `$STORM_EVENT_FLOW_FILE`
  - Inputs: `$COVERAGE_FLOW_FILE`, `$SEARCH_ALL_STORM`, `$BASELINE_FLOW_METHOD`, `$STORM_EVENT_FLOW_FILE`
- `06_stormStatistics` = stormAnalysis_cmd.R; outputs `$STORM_EVENT_STATS_FILE`
  - Inputs: `$STORM_EVENT_FLOW_FILE`, `$STORM_EVENT_STATS_FILE`
- `02_model_storm` (was `02_stormVolumeRegression`) calls stormEventsLM_cmd.R; outputs model stats (`$MODEL_STATS`) and json output (`$MODEL_JSON`) from the mon_lm function
  - Inputs: `$DAILY_PRECIP_FILE`, `$STORM_EVENT_STATS_FILE`, `$STORM_EVENT_FLOW_FILE`, `$STORM_INCLUDE_DURATION`, `$STORMSEP_REGRESSION_METHOD`
- `05_plot_storm` runs if `$STORMSEP_PLOT` in config is set to TRUE
  - Inputs: `$STORM_EVENT_FLOW_FILE`, `$STORM_EVENT_STATS_FILE`, `$STORM_EVENT_PLOT_DIR`, `$USGS_GAGE`
- `06_plot_stormVolume` runs if `$STORMSEP_PLOT` in config is set to TRUE
  - Inputs: `$DAILY_PRECIP_FILE`, `$STORM_EVENT_FLOW_FILE`, `$STORM_EVENT_STATS_FILE`, `$STORM_EVENT_PRECIP_PLOT_DIR`, `$USGS_GAGE`