GEO_MET_MODEL option 3: Storm Hydrograph Analysis and Precip/Hydrograph Volume Analysis #60

Open · 5 of 7 tasks
COBrogan opened this issue Jun 27, 2024 · 35 comments

@COBrogan

COBrogan commented Jun 27, 2024

If we look at precipitation and stormflow data at the individual storm scale, it appears that the total volume of precipitation may be related to the volume in the storm hydrograph. This takes a slightly different approach than correlating weekly precip and flow because now we are honing in on individual storm events and looking more specifically at how precipitation increases flow:
[image: stormPlot14]

  • Develop a methodology to separate out storm hydrographs to isolate individual storm events. See stormSep_USGS

  • Calculate storm hydrograph volume for each event. This is accomplished in stormSep_USGS.R

  • Analyze relationship between hydrograph volume and precipitation volume
    See below for Culpeper gage 01665500. Looking at monthly precip and monthly storm hydrographs (lumping storms into months based solely on their start date; storms that extend beyond the month are still included in their start month), we get pretty strong correlations in some of our winter months. October in particular has a very good correlation, close to 0.75. Running a power regression slightly worsens the R^2 values but improves the residual error analysis (see the regression sketch after this task list):
    [image]
    Current thoughts are that we may see good relationships with this, but we may run into some travel time issues as we move downstream. This should only cause issues for systems that have more than one day of travel time, which are likely to be relatively large systems (greater than ~200 square miles, based on super rough math for a circular watershed with a 16-mile flow path, assuming a travel time of one day at 1 ft/s). Visualizations will help us get a feel for how much our results are affected by travel time as we expand this routine to more gages.

  • Determine a means to analyze which dataset performs the best - Following Workflow step geo -> process -> 04_weekly_data: Create R script to produce weekly mean value file #61, we can select the "best" dataset as the one that performs the best for that month. So, we can group our events by month across the record period, run the regression for each month, and determine performance.

  • Find a means to identify missing precipitation events (e.g. low precip volume) for corresponding hydrographs

  • Plot each storm hydrograph with precipitation, and separate the plotting into its own step

  • Analysis steps cannot depend on a list of storm events. Because these steps have been divided out, they must instead rely on a data.frame that has storm IDs built in, and stormStats must have the corresponding IDs.

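As referenced in the monthly regression item above, here is a rough sketch of grouping storms by start month and comparing a linear fit against a power (log-log) fit. The data frame and column names are toy placeholders, not the actual stormStats schema:

set.seed(1)
#Toy storm table (placeholder values): one row per storm event
stormStats <- data.frame(
  startDate    = as.Date("2020-01-05") + seq(0, 1400, by = 14),
  stormVolume  = runif(101, 1, 100),  #storm hydrograph volume above baseflow
  precipVolume = runif(101, 1, 50)    #precip volume over the storm duration
)
#Lump storms into months based solely on their start date
stormStats$mo <- format(stormStats$startDate, "%m")

#For each month, fit a linear and a power (log-log) regression and keep adjusted R^2
monthly_fits <- do.call(rbind, lapply(split(stormStats, stormStats$mo), function(df) {
  lin <- lm(stormVolume ~ precipVolume, data = df)            #linear regression
  pow <- lm(log(stormVolume) ~ log(precipVolume), data = df)  #power regression in log-log space
  data.frame(mo = unique(df$mo), n = nrow(df),
             r2_linear = summary(lin)$adj.r.squared,
             r2_power  = summary(pow)$adj.r.squared)
}))
monthly_fits
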
Important R Scripts: stormSeparate_USGS.R, plotStorm.R, plotStormSep_cmd.R, stormAnalysis_cmd.R, stormEventsLM_cmd.R, stormSep_cmd.R

In the scenario con file, e.g. p6/cbp_wsm/config/control/met/stormVol_prism.con, GEO_MET_MODEL must be storm_volume

Several important variables defined in geo.config:

# Storm separation outputs:
STORM_EVENT_FLOW_FILE="$MET_EXPORT_DIR/$scenario/flow/${coverage}-stormevent-flow.csv"
STORM_EVENT_STATS_FILE="$MET_EXPORT_DIR/$scenario/flow/${coverage}-stormevent-stats.csv"
STORM_EVENT_PRECIP_PLOT_DIR="$MET_EXPORT_DIR/$scenario/plots/stormPlots/"

Several important variables defined in scenario config e.g. stormVol_prism.con:

#Set additional variables for Storm Separation Routine:
SEARCH_ALL_STORM="TRUE"
BASELINE_FLOW_METHOD="Month"
STORM_INCLUDE_DURATION=1
STORMSEP_REGRESSION_METHOD="LINEAR"
STORMSEP_PLOT="TRUE"

Example Model Run No Plots

MODEL_ROOT=/backup/meteorology/
MODEL_BIN=$MODEL_ROOT
SCRIPT_DIR=/opt/model/model_meteorology/sh
export MODEL_ROOT MODEL_BIN SCRIPT_DIR

cd /backup/meteorology

/opt/model/meta_model/run_model raster_met stormVol_prism usgs_ws_02021500 auto geo download
/opt/model/meta_model/run_model raster_met stormVol_prism usgs_ws_02021500 auto geo process
/opt/model/meta_model/run_model raster_met stormVol_prism usgs_ws_02021500 auto geo analyze

Example Model Run With Plots

MODEL_ROOT=/backup/meteorology/
MODEL_BIN=$MODEL_ROOT
SCRIPT_DIR=/opt/model/model_meteorology/sh
export MODEL_ROOT MODEL_BIN SCRIPT_DIR

cd /backup/meteorology

/opt/model/meta_model/run_model raster_met stormVolPlot_prism usgs_ws_02075500 auto geo download
/opt/model/meta_model/run_model raster_met stormVolPlot_prism usgs_ws_02075500 auto geo process
/opt/model/meta_model/run_model raster_met stormVolPlot_prism usgs_ws_02075500 auto geo analyze
  • geo
    • download
      • NA
    • process see Workflow geo #65
      • 05_stormSeparate = stormSep_cmd.R
        • Write out the full flow data with the stormIDs to create a file from which the storms may easily be extracted
        • Outputs a csv ($STORM_EVENT_FLOW_FILE) that contains the base flow, the total flow, and the baseline flow used to parse out "too small" storms
          Inputs
          1. The path to the USGS gage data downloaded in a previous step of geo e.g. $COVERAGE_FLOW_FILE
          2. allMinimaStorms = Considers ALL local minima as potential storm endpoints. If set to TRUE, will distinguish between peaks in the case of a multi-day storm hydrograph. Otherwise, a storm is only defined between two subsequent local minima that fall below baseline flow. Defined in the config file as $SEARCH_ALL_STORM
          3. baselineFlowOption = One of c("Water Year","Month","Calendar Year","Climate Year","All"). Defaults to "All". Determines how the program will develop baseline flow. Uses a horizontal line regression of baseflow and breaks it down by all time, water year, calendar year, or month based on user selection. Defined in config as $BASELINE_FLOW_METHOD
          4. pathToWrite = the path to write out csv output files to $STORM_EVENT_FLOW_FILE
      • 06_stormStatistics = stormAnalysis_cmd.R
        • Outputs a csv that contains several statistics about each storm. This includes event_start, event_end, USGS storm volume, precip volume, rising/falling limbs, durations, etc. Writes to $STORM_EVENT_STATS_FILE
          Inputs
          1. The path to the storm separated data from step 02 for this gage e.g. $STORM_EVENT_FLOW_FILE
          2. pathToWrite = the path to write out csv output files to $STORM_EVENT_STATS_FILE
    • analyze
      • 02_model_storm (was 02_stormVolumeRegression) calls stormEventsLM_cmd.R
        • Takes in storm statistics for each event found in 02-stormSeparate as generated in 03-stormStatistics and compares the stream volume above baseflow to the precipitation volume that occurs during the storm hydrograph event separated in 02-stormSeparate. Can optionally take an input to evaluate volume X days before the event in addition to the event itself
        • Computes either a monthly linear or monthly log-linear regression to compare the volumes and outputs a csv of statistics ($MODEL_STATS) and json output ($MODEL_JSON) from mon_lm function
          Inputs
          1. The file path to the combined precip and flow data at that USGS gage $DAILY_PRECIP_FILE
          2. The file path to the statistics output by the stormSep_USGS script for each storm found in the hydrographs $STORM_EVENT_STATS_FILE
          3. The file path to the storms output by stormSep_USGS script $STORM_EVENT_FLOW_FILE
          4. The rolling duration of precip to include prior to the storm. So, at rollingDur = 1 all precip will be summed from the storm duration only. At rollingDur = 2, the precip will be summed for the storm duration AND will include one day prior to the storm (see the sketch after this outline). Defined in config file as $STORM_INCLUDE_DURATION
          5. STORMSEP_REGRESSION_METHOD = whether the regressions performed should be power regressions or linear regressions. Defined in config file as $STORMSEP_REGRESSION_METHOD
      • 05_plot_storm
        • A script that will plot each individual storm event identified in 02-stormSeparate and will include the stream flow, baseflow, baseline flow, and regressions for the rising and falling limb. Only operates if $STORMSEP_PLOT in config is set to TRUE
          Inputs
          1. The path to the storm separated data from step 02 for this gage $STORM_EVENT_FLOW_FILE
          2. The path to the statistics generated for the storms $STORM_EVENT_STATS_FILE
          3. The directory to store the plots in, defined in config as $STORM_EVENT_PLOT_DIR
          4. The USGS gage name for plot ID defined in config as $USGS_GAGE
      • 06_plot_stormVolume
        • A script that will plot each individual storm event identified in 02-stormSeparate and will include the stream flow, baseflow, baseline flow, and precip from each data source. The plot will include the event and 1-week prior to and after the start/end of the event. Only operates if $STORMSEP_PLOT in config is set to TRUE
          Inputs
          1. Combined daily stream and precip data for a data source from $DAILY_PRECIP_FILE
          2. The path to the storm separated data from step 02 for this gage $STORM_EVENT_FLOW_FILE
          3. The path to the statistics generated for the storms $STORM_EVENT_STATS_FILE
          4. The directory to store the plots in, defined in config as $STORM_EVENT_PRECIP_PLOT_DIR
          5. The USGS gage name for plot ID defined in config as $USGS_GAGE
  • amalgamate
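
As referenced under 02_model_storm above, here is a hedged sketch of how the rolling precip duration works; the variable names are illustrative, not the script's actual code:

#daily_precip: daily precip series; the storm spans startIndex:endIndex (illustrative values)
daily_precip <- c(0, 0.2, 1.1, 0.4, 0, 0, 0.8, 0.3, 0, 0)
startIndex <- 7
endIndex   <- 9
rollingDur <- 2   #$STORM_INCLUDE_DURATION: 1 = storm duration only, 2 = storm + 1 day prior

#Sum precip over the storm duration plus (rollingDur - 1) days before the storm starts
eventPrecip <- sum(daily_precip[(startIndex - (rollingDur - 1)):endIndex])
eventPrecip   #0 + 0.8 + 0.3 + 0 = 1.1
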
@rburghol rburghol changed the title Storm Hydrograph Analysis and Precip/Hydrograph Volume Analysis Workflow 3: Storm Hydrograph Analysis and Precip/Hydrograph Volume Analysis Jun 28, 2024
@COBrogan

COBrogan commented Jul 16, 2024

Current workflow:

1. Get USGS data:
   Rscript functions/usgsdata.R '02037500' "$HOME/usgsGageCMD.csv"
2. Run stormSep_cmd.R:
   Rscript ../../meta_model/scripts/met/stormSep_cmd.R "${HOME}/usgsGageCMD.csv" 'TRUE' 'Month' "${HOME}/Gage02037500_StormflowData.csv"
3. Run the analysis on the storms and output stats:
   Rscript ../../meta_model/scripts/met/stormAnalysis_cmd.R "${HOME}/Gage02037500_StormflowData.csv" "${HOME}/Gage02037500_StormStats.csv"
4. Get linear regressions:
   Rscript ../../meta_model/scripts/met/stormEventsLM_cmd.R "${HOME}/usgs_ws_02037500-PRISM-all.csv" "${HOME}/Gage02037500_StormStats.csv" "${HOME}/Gage02037500_StormflowData.csv" 1 "${HOME}/modelJSON.json" "${HOME}/stormsepLMResults.csv"
5. Run plots:
   Rscript ../../meta_model/scripts/met/plotStormEvent_cmd.R "${HOME}/usgs_ws_02037500-PRISM-all.csv" "${HOME}/Gage02037500_StormflowData.csv" "${HOME}/Gage02037500_StormStats.csv" "${HOME}/stormPlotsNew" "02037500" "PRISM"

@COBrogan

FYI @ilonah22 I've merged this branch (geo-stormSep) into main. I will be working on getting the meta model running tomorrow hopefully.

@COBrogan

@rburghol How do I specify what GEO_MET_MODEL I'm running? I changed the following to set GEO_MET_MODEL to storm_volume
nano /opt/model/p6/cbp_wsm/config/control/met/PRISM.con
However, when I run the following process command, an echo check confirms that GEO_MET_MODEL is still set to simple_lm. Am I missing a config file somewhere?

/opt/model/meta_model/run_model raster_met PRISM usgs_ws_02021500 auto geo process 05

@rburghol

@COBrogan looks like you set it up correctly to me -- you may want to verify the cbp get_config... command that is run in geo.config at the command line -- also make sure you run from the p6/vadeq directory.

Feel free to edit the geo.config file to debug and maybe also grep the files to see if I hard-coded that param by accident somewhere, like in raster.config

@rburghol

rburghol commented Jul 30, 2024

Oh!! @COBrogan I just saw you set the config in cbp_wsm/config. That setting should be applied in vadeq/config/... -- it should all work then. If I indicated cbp_wsm in some of the instructions, that was an error on my part -- apologies.

@COBrogan

COBrogan commented Jul 31, 2024

@rburghol The storm_volume workflow seems to be working! I ran it with gage 02021500 and successfully generated the monthly rating file for PRISM. I haven't tested the plotting yet, but I want to work on that a little more locally first. However, I did encounter a few permissions errors that I wanted to note here:

  1. I had to manually change the model steps (e.g. analyze/02_model_storm) to ensure they had execute privilege for all users
  2. The install commands at the end of analyze/02_model_lm_simple and analyze/02_model_storm don't work for me unless I sudo as you. For instance:
install -D /tmp/usgs_ws_02021500_1722430958_12797/usgs_ws_02021500-PRISM-storm_volume-model.json /media/model/met/PRISM/stats/
install: cannot create regular file '/media/model/met/PRISM/stats/usgs_ws_02021500-PRISM-storm_volume-model.json': Permission denied

This command is a product of the last lines of the analyze/02 steps, which are:

install -D $json_file "$MODEL_JSON" 
install -D $rate_file "$RATING_FILE"

@ilonah22

I've been running into a few issues while trying to run the storm workflow on a different gage, but this is the only one I have gotten really stuck on so far while running stormEventsLM_cmd.R:

Error in (dateIndex - rollingDur - stormDuration + 2):dateIndex :
  argument of length 0
Calls: mapply -> <Anonymous>

I have been making a few minor changes, but nothing seems to be working, so I was wondering if you had a similar issue or if this error means I messed something up in one of the earlier steps.

@COBrogan

Hi @ilonah22. If you pull the main branch of the meta model repository, I suspect your error will be resolved. Just like we had trouble with the dates yesterday, I had some trouble today. This time I had to remove storms that occur after the precip record has ended. So, I had to add the following:

#Remove any storms that occur after precip record
stormStats <- stormStats[stormStats$endDate <= max(comp_data$obs_date),]
stormSepDF <- stormSepDF[stormSepDF$timestamp <= max(comp_data$obs_date),]
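
For context, the "argument of length 0" error is what R raises when a sequence endpoint is an empty vector, which is what happens when a storm date has no match in the precip record. A hypothetical illustration (not the script's actual code):

#dateIndex comes back empty when a storm date is not found in the precip record,
#e.g. a storm that starts after the precip record ends
obs_date  <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
dateIndex <- which(obs_date == as.Date("2021-02-01"))   #integer(0): no match
rollingDur    <- 1
stormDuration <- 3
#Building the sequence from a zero-length endpoint reproduces the error:
#  (dateIndex - rollingDur - stormDuration + 2):dateIndex
#  Error in (dateIndex - rollingDur - stormDuration + 2):dateIndex :
#    argument of length 0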

I also have gotten everything up and running in the meta model. We should now be able to use the model on the server to run these, which will make it much easier to run multiple scenarios (no need to enter file paths, copy+paste files, etc.). Do you have some availability this afternoon to go over that? Maybe around 1:30? We could get you set up in the model so you can run gages easily.

@COBrogan

@rburghol after some testing, it looks like @ilonah22 also doesn't have write permissions to /media/model/met/. She too was getting "Permission Denied" errors when trying to run the following (which now works fine for me). If you have some availability, I'd appreciate the help in getting her set up. And I'm happy to hop on a call to learn more about the group permissions:

/opt/model/meta_model/run_model raster_met stormVol_prism usgs_ws_01634500 auto geo download

I tried sudo chmod -R 775 /media/model/met/stormVol_prism but was told "Operation not permitted". I think I'm missing something. Ilonah is already a part of the harpmodelers group, so she would have access to this folder as soon as the right (write and execute) permissions are set:

ls -l /media/model/met/
total 4
lrwxrwxrwx 1 rob      harpmodelers   25 Jul 26 13:35 PRISM -> /backup/meteorology/PRISM
lrwxrwxrwx 1 rob      harpmodelers   26 Jul 26 13:36 daymet -> /backup/meteorology/daymet
lrwxrwxrwx 1 rob      harpmodelers   26 Jul 26 13:35 nldas2 -> /backup/meteorology/nldas2
drwxr-sr-x 6 cobrogan harpmodelers 4096 Jul 31 16:57 stormVol_prism

@rburghol rburghol changed the title Workflow 3: Storm Hydrograph Analysis and Precip/Hydrograph Volume Analysis GEO_MET_MODEL option 3: Storm Hydrograph Analysis and Precip/Hydrograph Volume Analysis Aug 5, 2024
@ilonah22

ilonah22 commented Aug 8, 2024

Here are some of my visual analyses of the ratings from 10 gages produced by the stormVol_prism workflow this week:

[image: MonthlyRSquared]
^ Each of the different colored lines represents one of the 10 gages I used; the red boxes show places where I thought I might see a trend.

[image: MonthlyBoxplot]
[image: Seasonal Rsquared]

After seeing the results of these analyses, my next planned steps are:

  • Run some more gages to ascertain the consistency of these trends on a larger scale and see if new trends arise with a larger sample size.
  • Further quantify some of the trends which this initial analysis has uncovered, such as the low r-squared in July and the high r-squared in February and October.

@rburghol

rburghol commented Aug 8, 2024

@ilonah22 I really like what I'm seeing; these are effective ways to compare multiple time series, and also to highlight the areas that you think we should pay more attention to. I would be interested in a similar thing using different rainfall data sets. With that in mind, what rainfall data sets were you using for these?

@COBrogan

COBrogan commented Aug 8, 2024

FYI: working on a multi-gage model run, but having issues with the sftp call in sbatch:

MODEL_ROOT=/backup/meteorology/
MODEL_BIN=$MODEL_ROOT
SCRIPT_DIR=/opt/model/model_meteorology/sh
export MODEL_ROOT MODEL_BIN SCRIPT_DIR

gage_coverage_file="usgsgageList.csv"
gage_coverage_SQLfile=usgsgageList.sql
gageSQL="
\\set fname '/tmp/${gage_coverage_file}' \n
 

copy ( select hydrocode
FROM dh_feature
WHERE bundle = 'watershed' AND ftype = 'usgs_full_drainage'
) to :'fname';"
# turn off the expansion of the asterisk
set -f
echo -e $gageSQL > $gage_coverage_SQLfile 
cat $gage_coverage_SQLfile | psql -h dbase2 -d drupal.dh03
sftp dbase2:"/tmp/${gage_coverage_file}" "/home/cobrogan/${gage_coverage_file}"

filename="/home/cobrogan/${gage_coverage_file}"
for i in `cat $filename`; do
	echo $i
done

filename="/home/cobrogan/${gage_coverage_file}"
for i in `cat $filename`; do
  echo "Running: sbatch /opt/model/meta_model/run_model raster_met stormVol_prism \"$i\" auto geo"
  sbatch /opt/model/meta_model/run_model raster_met stormVol_prism "$i" auto geo 
done

@rburghol

rburghol commented Aug 8, 2024

@COBrogan One thing to check is that you have done ssh-copy-id to dbase2 (like we did yesterday for deq1), otherwise I don't know if it will allow you to do that file retrieval automatically.

If that doesn't fix it for you, paste your error message up here and I'll take a look.

@ilonah22

ilonah22 commented Aug 9, 2024

@rburghol All of this is using PRISM; how would I use a different source? Just change stormVol_prism to stormVol_daymet?

@COBrogan

COBrogan commented Aug 9, 2024

@ilonah22 Yep. Theoretically you can call stormVol_daymet and stormVol_nldas2 as the scenario. I haven't tested these yet so it would be interesting to see if they work.

@COBrogan

COBrogan commented Aug 9, 2024

> @COBrogan One thing to check is that you have done ssh-copy-id to dbase2 (like we did yesterday for deq1), otherwise I don't know if it will allow you to do that file retrieval automatically.
>
> If that doesn't fix it for you, paste your error message up here and I'll take a look.

That makes sense, but I don't have the permissions to create a home directory for myself on dbase2 (nor could I sudo to do it). Would you mind setting that up, Rob? Brendan said he could set one up on dbase1, but I don't know if that's ideal since our configs all point to dbase2.

@COBrogan

I've kicked off a run of all our gages for this method. Got 156 gages so far. Some months have negative adjusted R^2, which is interesting. I think the adjustment has to do with sample size, so it could make sense for a really poor fit with small n (e.g. perhaps a gage that only has a few years of data during our precip data source time period, 1980 - present). Anyways, here's a sneak peek:
[image]

Same plot, but with the R^2 axis limited to between 0 and 1:
[image]

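For reference on the negative values: adjusted R^2 applies a penalty of (n - 1)/(n - p - 1), so with only a handful of storms in a month and a weak fit it can dip below zero. A quick toy illustration (made-up data, not our gage records):

set.seed(1)
#Toy month with only 4 storms and essentially no precip/flow relationship
precip_volume <- runif(4)
storm_volume  <- runif(4)
fit <- lm(storm_volume ~ precip_volume)
summary(fit)$r.squared       #small but non-negative
summary(fit)$adj.r.squared   #negative: 1 - (1 - R^2) * (n - 1) / (n - p - 1) with n = 4, p = 1
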
@ilonah22

ilonah22 commented Aug 14, 2024

After Tuesday 8/13, it looks like 188 gages ran successfully for the prism stormsep workflow. Here are plots of the density of the mean r-squared values for each gage and of the density of all of the r-squared values:

[image: Rplot]

[image: R-squaredDensity_total]

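For anyone reproducing these, a minimal sketch of how such density plots could be built. It assumes a combined ratings data frame (one row per gage-month with gage and r_squared columns), like the ones built elsewhere in this thread, and is not necessarily the code used for the plots above:

library(dplyr)
library(ggplot2)

#Toy ratings table (placeholder values): one row per gage-month
set.seed(42)
ratings <- expand.grid(gage = sprintf("0%07d", 1:20), mo = 1:12)
ratings$r_squared <- runif(nrow(ratings))

#Mean r-squared per gage
gage_means <- ratings %>%
  group_by(gage) %>%
  summarise(mean_r2 = mean(r_squared, na.rm = TRUE))

#Density of the per-gage mean r-squared values
ggplot(gage_means, aes(x = mean_r2)) + geom_density() + theme_classic()
#Density of all monthly r-squared values
ggplot(ratings, aes(x = r_squared)) + geom_density() + theme_classic()
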
@COBrogan

Thanks @ilonah22. I'm working on debugging the remaining gages. Seems like minor stuff here and there. At least one gage was failing because there was an entire month of 0 cfs flow and hreg was failing on it. Fixed that, but still more bugs to track down. Thanks for the plots!

@ilonah22

ilonah22 commented Aug 14, 2024

@COBrogan, that sounds good. I quickly found the gages with the highest and lowest mean r-squared to plot.

Here is the highest, with no issues I can see just from looking at the plot:

[image: MonthlyRHighest]

Here is the lowest (non-zero), where you can see that there are only 4 r-squared monthly values above 0 and multiple NA values:

[image: MonthlyRLowest]
[image: LowestDataTable]

If this is happening to one gage without breaking, it is probably happening to others.

@durelles

Morning, great work. Wondering what the R^2 metric vs. drainage area relationship looks like.

@COBrogan

Thanks @ilonah22. What were the gage IDs for these two examples? And were you able to look at any of the residuals for the gage that had those high R^2 values? It might give us some good insight into how the regression is performing with large + small events across the months. It is interesting that there aren't many seasonal impacts at this particular gage.

As for the missing R^2 values, I believe this is intentional. If there is insufficient data to conduct the regression, the algorithm will just return NA for that month. This could happen if the Storm Separation routine found only 1 - 3 storms for that month (most likely due to a small gage record). We can look into it further, but the negative R^2 values also indicate that sample size is likely low. We can probably confirm this by looking at the residual plots for February (best performing month) and July (worst performing month).

@rburghol

@COBrogan this looks really cool to me. I think that it is probably time for us to put together a good data model for storing this data on VAHydro, so that we can easily do some analysis of summary statistics over the whole data set.

@COBrogan

I think at this point all gages have been run with a few exceptions:

  1. usgs_ws_03208800. Recorded no data during the precip record of 1980 - Present and thus could not be run
  2. usgs_ws_02079500. Recorded no data during the precip record of 1980 - Present and thus could not be run
  3. usgs_ws_01636210. Recorded no data during the precip record of 1980 - Present and thus could not be run
  4. usgs_ws_01645784. Too small to run with PRISM under current methodology due to its 0.79 square mile DA
  5. usgs_ws_02050000 did not download, likely due to the high amounts of provisional data at this tidal gage
  6. usgs_ws_01658500 and 02037000 (Richmond Canal): something went wrong in stormAnalysis that resulted in 18 warnings, but the process finished

@rburghol

@COBrogan - great summary. FYI I am currently doing a variation on calc_raster_ts.sh to handle resampling, and then a further variation to handle resampling plus temporal disaggregation. I want to have only a single routine, but I feel like the complexity of the differences between the 3 queries warrants keeping them separate. Currently tracking this in #72, though the complexity of the op probably warrants its own issue.

@ilonah22

@COBrogan, the highest R-squared gage was 01652500 and the lowest non-zero was 01671025.

@COBrogan

@ilonah22 I've kicked off a daymet batch run. We'll see tomorrow how many run successfully, but I definitely see the ratings files being generated so it's working for at least most of the gages!

@COBrogan

Update: Just started running NLDAS2. 184 gages successfully ran for daymet. One thing that will be worth identifying is any gages that ran for one method but not for another. They could reveal errors in our meta model setup or be interesting case studies.

@COBrogan

COBrogan commented Aug 28, 2024

NLDAS has finished running and ran for 156 gages. I'm guessing some of the smaller gages did not run successfully, but this would be another interesting thing to look into as we track which gages ran for which methods. I think we should try to quantify the following:

  • How many gages ran for each method? Can we get a table of the gages that ran for one method and not another, and include their drainage area (which is accessible from the dataRetrieval package)?
  • Looking at the image below, the performance of NLDAS clearly drops off during December and October. This is interesting. Are there gages that performed well with PRISM and daymet, but poorly with NLDAS in these months? If so, can we find a few hydrographs that show clear differences in our precip data?
  • Can we check residuals for the best performing gages to ensure a good fit with our data?
  • Are there any geographic trends, e.g. during our best performing month, April, are there geographic trends in the ratings? (We need to provide WKT for the gage DAs for this; I'll work on that today)

@rburghol, you may be interested to see the performance difference in December. Could affect our recharge. It makes me very excited to see these model runs.

Plot R Code

library(tidyverse)
#Assumes all ratings files are in a folder ratings/daymet/, ratings/prism/, or ratings/nldas/
for(j in c("daymet","prism","nldas")){
  print(j)
  #Set path to read in ratings  based on data source of outer loop
  pathToread <- paste0("ratings/",j,"/")
  for( i in list.files(pathToread) ){
    #For each file in the directory, read in the file i as a csv
    filei <- read.csv(paste0(pathToread,i))
    #Store the gage number
    filei$gage <- gsub(".*_(\\d+)-.*","\\1",i)
    #Combine the file into a single data frame
    if(!exists("combineFile")){
      combineFile <- filei
    }else{
      combineFile <- rbind(combineFile,filei)
    }
  }
  #Assign to a specific variable and delete the generic combineFile
  assign(paste0('combineFile',j),combineFile)
  rm(combineFile)
}

#Join the daymet and prism data together
joinData <- combineFileprism %>% 
  #Remove the row numbers (not important) and rename the r_squared for clarity
  select(-X,prismRating = r_squared) %>% 
  #Join in the daymet data, but first rename the r_squared column for clarity.
  #Join on gage and month
  left_join(combineFiledaymet %>% 
              select(daymetRating = r_squared,gage,mo),
            by = c("gage","mo")) %>% 
  #Join in the nldas data, but first rename the r_squared column for clarity.
  #Join on gage and month
  left_join(combineFilenldas %>% 
              select(nldasRating = r_squared,gage,mo),
            by = c("gage","mo")) %>% 
  #Pivot it longer to have a column with the data source and one for the
  #ratings, for plotting ease
  pivot_longer(c(prismRating,daymetRating,nldasRating),
               names_to = 'dataSource',
               values_to = 'rating')

#Plot the data
ggplot(data = joinData) +
  #Box plot, with month on the x-axis (note that it must be a factor) and rating
  #on the y-axis. Create separate boxes for each datasource via color
  geom_boxplot(aes(as.factor(mo),rating,color = dataSource)) +
  xlab(element_blank()) + ylab("Adjusted R^2") +
  #Limit the axis between 0 - 1, removing negative adjusted R squared values
  coord_cartesian(ylim = c(0,1)) + 
  #Set a classic theme and give some gridlines for ease of reference
  theme_classic() +
  theme(
    panel.grid.major.y = element_line(linetype = 3, color = "grey50")
  ) + 
  #Color the boxes
  scale_color_manual(values = c("dodgerblue3", "violetred4","black"))

[image]

@rburghol

These look interesting @COBrogan @ilonah22 -- I see the December drop off, but these still have high explanatory power; over 40% for the median is quite good and I would not expect much better anyhow.

I am very interested in the comparisons of data sources for a single watershed since that is where model improvement can occur.

@rburghol

rburghol commented Aug 30, 2024

@COBrogan - I amend my above statement: I just realized that the median December NLDAS2 R^2 was only 25%, while the daymet and PRISM medians appear to be over 40%, which could certainly drive some recharge improvement! Of course we still need to check these side by side in single watersheds to be sure that it's not just illusory, but very hopeful! In that interest, I'd like to see if we can get @ilonah22 trained up on running these batches (or is that already done?) and replicate this for the simple method to see what we see in that respect.

@ilonah22

ilonah22 commented Sep 3, 2024

I just finished determining which gages ran for which methods. 184 gages ran for daymet and nldas2, 188 gages ran for prism and 179 ran for all 3. I am planning on picking one or two gages that ran for all 3 to compare across the data sources.
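
A hedged sketch of one way to tabulate which gages ran for which method, assuming the ratings/<source>/ folder layout and filename pattern used in the plotting code above (not necessarily how these counts were produced):

#Build the list of gage IDs with a ratings file for each data source
sources <- c("daymet", "prism", "nldas")
gageLists <- lapply(sources, function(j) {
  unique(gsub(".*_(\\d+)-.*", "\\1", list.files(paste0("ratings/", j, "/"))))
})
names(gageLists) <- sources

sapply(gageLists, length)              #gage count per data source
length(Reduce(intersect, gageLists))   #gages that ran for all three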

@ilonah22

ilonah22 commented Sep 9, 2024

I finished making the combined dataset, including drainage area. Most of the code is the same as what @COBrogan mentioned above, but I added the part at the end for drainage area.

Combined Data with Drainage Area

library(tidyverse)
library(sqldf)  #for the SQL join used below

for(j in c("daymet","prism","nldas")){
  print(j)
  #Set path to read in ratings based on data source of outer loop
  pathToread <- paste0("*filepath*/Ratings/",j,"/")
  for( i in list.files(pathToread) ){
    filei <- read.csv(paste0(pathToread,i))
    #Store the gage number
    filei$gage <- gsub(".*_(\\d+)-.*","\\1",i)
    #Combine the file into a single data frame
    if(!exists("combineFile")){
      combineFile <- filei
    }else{
      combineFile <- rbind(combineFile,filei)
    }
  }
  #Assign to a specific variable and delete the generic combineFile
  assign(paste0('combineFile',j),combineFile)
  rm(combineFile)
}

#Join the daymet and prism data together
joinData <- combineFileprism %>% 
  #Remove the row numbers (not important) and rename the r_squared for clarity
  select(-X,prismRating = r_squared) %>% 
  #Join in the daymet data, but first rename the r_squared column for clarity.
  #Join on gage and month
  left_join(combineFiledaymet %>% 
              select(daymetRating = r_squared,gage,mo),
            by = c("gage","mo")) %>% 
  #Join in the nldas data, but first rename the r_squared column for clarity.
  #Join on gage and month
  left_join(combineFilenldas %>% 
              select(nldasRating = r_squared,gage,mo),
            by = c("gage","mo")) %>% 
  #Pivot it longer to have a column with the data source and one for the
  #ratings, for plotting ease
  pivot_longer(c(prismRating,daymetRating,nldasRating),
               names_to = 'dataSource',
               values_to = 'rating')


# Import and add drainage area from USGS
drainage_area <- dataRetrieval::readNWISsite(c(unique(joinData$gage)))
joinData <- sqldf(
  "select a.mo as mo, a.gage as gageid, a.dataSource as dataSource, a.rating as rating,
  b.drain_area_va as drainageArea_sqmi
  from joinData as a
  left outer join drainage_area as b
  on (
   a.gage = b.site_no
  )"
)

# Remove "Ratings" from data source column
joinData$dataSource <- gsub("Rating","",joinData$dataSource)

@ilonah22

ilonah22 commented Sep 9, 2024

After I combined the data sets and added drainage area, I made a quick plot of r-squared vs. drainage area:

It was a little hard to tell from that plot what the trend line looked like, so I also created a version with an adjusted y scale:

I thought the difference in accuracy between small and large drainage areas would be bigger, but this is very preliminary and I will look into this a bit more on Wednesday.
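
A minimal sketch of the r-squared vs. drainage area plot described here, using the joinData frame from the previous comment; the aesthetics and log x-axis are assumptions, not necessarily what was plotted:

library(ggplot2)

ggplot(joinData, aes(x = drainageArea_sqmi, y = rating, color = dataSource)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +   #rough trend line per data source
  scale_x_log10() +                          #drainage areas span orders of magnitude
  coord_cartesian(ylim = c(0, 1)) +          #the "adjusted y scale" version
  labs(x = "Drainage area (sq mi)", y = "Adjusted R^2", color = "Data source") +
  theme_classic()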

@ilonah22

ilonah22 commented Dec 4, 2024

Here is the outline I showed on Monday for separating stormEventsLM_cmd.R:

Outline:

  1. QC Storms
    • Inputs:
      • StormStatsPath
      • StormPath
      • comp_dataFilePath
    • Outputs:
      • StormStats
      • StormEvents
  2. Finding Rolling Precip
    • Inputs:
      • StormStats
      • comp_data
    • Outputs:
      • StormStats
      • comp_data
  3. Conduct Regression
    • Inputs:
      • StormStats
    • Outputs:
      • monthEventOut
      • Ratings
  4. Predict Flow
    • Inputs:
      • StormStats
      • monthEventOut
    • Outputs:
      • StormStats (with predicted flow and rating)

On Monday we also decided it would be better to overwrite files rather than make new ones after each step, which will hopefully reduce the number of extra files produced by separating this script.
