Week 09/16 - 09/23 #1355
Here is the path to the file I was working on today, which finds max and min r-squared ranges, on the data-analysis branch: HARP-2024-2025/DrainageAreaAnalysis.R
Thanks @ilonah22! @nathanielf22 here is a useful GIS in R reference I've used in the past: Leafing through it, these tutorials seem helpful. The most relevant R code will be those that use the There are several chapters in there to go over using sf to do basic GIS stuff in R. Otherwise, here's some extra code to help get some spatial plots going. It's pretty long and it's assuming you have the ratings files downloaded (easiest way is
R Code for SF Plotting
library(tidyverse)
#Assumes all ratings files are in folders named ratings/<source>_<workflow>/,
#e.g. ratings/daymet_stormvol/ or ratings/prism_simplelm/
for(j in c("daymet_stormvol","prism_stormvol","nldas_stormvol","daymet_simplelm","prism_simplelm","nldas_simplelm")){
  print(j)
  #Set path to read in ratings based on data source of outer loop
  pathToread <- paste0("ratings/",j,"/")
  for( i in list.files(pathToread) ){
    #For each file in the directory, read in the file i as a csv
    filei <- read.csv(paste0(pathToread,i))
    #Store the gage number
    filei$gage <- gsub(".*_(\\d+)-.*","\\1",i)
    #Store the analysis type
    filei$workflow <- gsub(".*_","",j)
    #Keep only the necessary columns, depending on the workflow:
    if(filei$workflow[1] == "simplelm"){
      filei <- filei[,c("mo","rating","gage","workflow")]
    }else{
      filei <- filei[,c("mo","r_squared","gage","workflow")]
    }
    names(filei) <- c("mo","rating","gage","workflow")
    #Combine the file into a single data frame
    if(!exists("combineFile")){
      combineFile <- filei
    }else{
      combineFile <- rbind(combineFile,filei)
    }
  }
  #Assign to a specific variable and delete the generic combineFile
  assign(paste0('combineFile',j),combineFile)
  rm(combineFile)
}
#Join the daymet, prism, and nldas data together
joinData <- combineFileprism_stormvol %>%
  #Rename the rating for clarity
  select(prismRatingstorm = rating,gage,mo,workflow) %>%
  #Join in the daymet data, but first rename the rating column for clarity.
  #Join on gage, month, and workflow
  full_join(combineFiledaymet_stormvol %>%
              select(daymetRatingstorm = rating,gage,mo,workflow),
            by = c("gage","mo","workflow")) %>%
  full_join(combineFiledaymet_simplelm %>%
              select(daymetRatinglm = rating,gage,mo,workflow),
            by = c("gage","mo","workflow")) %>%
  #Add remaining prism data:
  full_join(combineFileprism_simplelm %>%
              select(prismRatinglm = rating,gage,mo,workflow),
            by = c("gage","mo","workflow")) %>%
  #Join in the nldas data, but first rename the rating column for clarity.
  #Join on gage, month, and workflow
  full_join(combineFilenldas_stormvol %>%
              select(nldasRatingstorm = rating,gage,mo,workflow),
            by = c("gage","mo","workflow")) %>%
  full_join(combineFilenldas_simplelm %>%
              select(nldasRatinglm = rating,gage,mo,workflow),
            by = c("gage","mo","workflow")) %>%
  #For easier viewing, combine lm and storm columns such that there is only one
  #column for prism, daymet, nldas classified by the workflow column
  mutate(prismRating = coalesce(prismRatingstorm,prismRatinglm),
         daymetRating = coalesce(daymetRatingstorm,daymetRatinglm),
         nldasRating = coalesce(nldasRatingstorm,nldasRatinglm)
  ) %>%
  #Remove now superfluous columns:
  select(-prismRatingstorm,-prismRatinglm,
         -daymetRatingstorm,-daymetRatinglm,
         -nldasRatingstorm,-nldasRatinglm) %>%
  #Pivot it longer to have a column with the data source and one for the
  #ratings, for plotting ease
  pivot_longer(c(prismRating,daymetRating,nldasRating),
               names_to = 'dataSource',
               values_to = 'rating') %>%
  arrange(gage,mo,workflow)
#At each gage, does the best performing data source change between workflows?
#The below code is for ESTIMATES ONLY. The left_join assumes that the ratings
#are unique between data sources for each gage, workflow, month. This is brittle
#and could result in incorrect answers!
gageCompare <- joinData %>% dplyr::ungroup() %>%
  #Group by gage, workflow, and month and find the max rating. Drop the
  #grouping afterwards so later steps operate on an ungrouped data frame:
  dplyr::group_by(gage,workflow,mo) %>%
  dplyr::summarise(maxRating = max(rating,na.rm = TRUE), .groups = "drop") %>%
  #Join the joinData df back in matching by gage, workflow, mo, and rating. This
  #could be problematic with duplicate ratings as in a case where all ratings
  #across the data sources are the same value
  left_join(joinData,by = c('gage','workflow','mo','maxRating' = 'rating')) %>%
  #Now pivot wider such that we can see ratings and best data sources side-by-side
  pivot_wider(names_from = workflow,values_from = c(maxRating,dataSource)) %>%
  #Now filter to only find instances where the best data sources are different:
  filter(dataSource_simplelm != dataSource_stormvol) %>%
  #Create a column showing the difference in rating and arrange by the difference
  mutate(differenceInRating = maxRating_simplelm - maxRating_stormvol) %>%
  arrange(differenceInRating)
#Isolate one month
gageCompareMonth <- gageCompare[gageCompare$mo == 10,]
library(sf)
#Get the watershed coverage from the server
watershedGeo <- read.csv("http://deq1.bse.vt.edu:81/met/usgsGageWatershedGeofield.csv")
#Get the gage numbers as their own field and store a copy of the data
gageWatershedSF <- watershedGeo
gageWatershedSF$gage <- gsub(".*_(\\d+)","\\1",gageWatershedSF$hydrocode)
#Let's begin by plotting October daymet ratings for stormVol
forPlot <- joinData[joinData$dataSource == 'daymetRating' &
                      joinData$workflow == 'stormvol' &
                      joinData$mo == 10,]
#Join the geometry data onto our plot data
joinDataSF <- forPlot %>%
  left_join(gageWatershedSF %>% select(-hydrocode),
            by = 'gage') %>%
  filter(!is.na(wkt))
#Create an SF object. Specify the coordinate system and the field name in the
#data frame that contains the well-known text. In this case, wkt is the name of
#the field with the polygon geometries
joinDataSF <- st_as_sf(joinDataSF,wkt = 'wkt',crs = 4326)
#Repair broken geometries
joinDataSF <- st_make_valid(joinDataSF)
#Add shape area. For a geographic CRS such as 4326, sf returns a geodesic area
#in square meters; here it is only used to order the polygons
joinDataSF$area <- st_area(joinDataSF)
#Order the data by largest to smallest area to make plotting more effective
joinDataSF <- joinDataSF[order(joinDataSF$area,decreasing = TRUE),]
#Plot SF polygons
ggplot(joinDataSF) +
  geom_sf(aes(fill = rating)) +
  scale_fill_viridis_c(option = 'magma') +
  theme_classic()
The above code should generate the following image, which shows October adjusted R-squared values for daymet from the Storm Volume method:
Thank you @COBrogan! I have been attempting to use the code, but I am having some issues with the creation of gageCompare, where it has all the dplyr steps. The ratings appear to be repeating, which leads to the creation of list-cols that make the data frame unusable. Now I'm trying to sftp the ratings again to be sure I have the right data, but I'm struggling to find the correct folder on the server to sftp from. Could you help point me in the right direction to get the data that works for this R script?
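The list-cols described above are what pivot_wider produces when the join on rating returns more than one row per gage, workflow, and month (for example, two data sources with identical ratings). A minimal sketch of one way around that, assuming dplyr >= 1.0 and tidyr, is to pick the best row directly with slice_max instead of summarising and joining back. The toy data frame, its values, and the names best and gageCompareToy are made up for illustration; this is not the script's actual fix.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for joinData; the gage number and ratings are invented
toy <- data.frame(
  gage = "01613900",
  mo = 10,
  workflow = rep(c("stormvol", "simplelm"), each = 3),
  dataSource = rep(c("prismRating", "daymetRating", "nldasRating"), times = 2),
  rating = c(0.50, 0.50, 0.20,   # stormvol: prism and daymet tie
             0.30, 0.10, 0.40)   # simplelm: nldas is best
)

# Keep exactly one best row per gage, workflow, and month.
# with_ties = FALSE keeps a single row when ratings tie, so the choice
# among tied data sources is arbitrary and should be treated as such.
best <- toy %>%
  filter(!is.na(rating)) %>%
  group_by(gage, workflow, mo) %>%
  slice_max(rating, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  rename(maxRating = rating)

# With one row per group, pivot_wider yields ordinary columns, not list-cols
gageCompareToy <- best %>%
  pivot_wider(names_from = workflow,
              values_from = c(maxRating, dataSource)) %>%
  filter(dataSource_simplelm != dataSource_stormvol) %>%
  mutate(differenceInRating = maxRating_simplelm - maxRating_stormvol)
```

Dropping with_ties = FALSE would keep every tied source, which brings the duplicate rows (and the list-cols) back, so ties probably need an explicit rule if they matter for the comparison.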
@nathanielf22 -- no need for sftp, as all data should be available via web link with pattern: http://deq1.bse.vt.edu:81/met/[scenario]/ For example, you can find the precip data and flow analyses for things that were run for scenario
@rburghol, that makes sense. What about the simple lm data? The code includes those on the third line, but there isn't a folder labelled simplelm.
The simplelm data is in the folders labeled NLDAS, PRISM, and daymet. Those directories should be renamed eventually, but we haven't done so yet. So, the simplelm results should be in 'PRISM/stats/'. @rburghol I think 'sftp' is easier, unless there's a way to read all those csv urls at once into R? So far, I've been telling everyone to just use sftp if they are analyzing ALL 1,080 csv files…
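On the question of reading many csv urls at once: base R's read.csv accepts a URL as well as a local path, so a vector of links can be read in a loop and stacked. A minimal sketch, assuming every file shares the same columns; read_many_csv is a hypothetical helper, and the scenario and file names in the commented example are placeholders rather than real files on the server.

```r
# Hypothetical helper: read many CSVs (local paths or URLs) into one data frame.
# Assumes every file has the same columns, otherwise rbind() will fail.
read_many_csv <- function(paths) {
  tables <- lapply(paths, function(p) {
    df <- read.csv(p)
    # Record which file each row came from
    df$source_file <- basename(p)
    df
  })
  do.call(rbind, tables)
}

# Example call using the URL pattern from this thread. The scenario folder
# and file names below are placeholders, not confirmed paths:
# base <- "http://deq1.bse.vt.edu:81/met/"
# urls <- paste0(base, "some_scenario/", c("file_a.csv", "file_b.csv"))
# ratings <- read_many_csv(urls)
```

This still needs the list of file names from somewhere, since a plain web directory cannot be listed with list.files().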
@COBrogan I think that if we are at the point where we need to analyze thousands of files at the same time, we need to prioritize putting the analysis inside the workflow and setting summary metrics in the database. By building local desktop workflows to analyze thousands of files downloaded via sftp from the server, we risk wasting time and creating trouble through redundant analysis and downloads, and then data sets going out of date. I think that we really should be focusing on small case study analysis right now rather than broad-brush global things. When we have analytical lenses that we think are worthy of repeating over 1000 files, then we move forward. As for the scenario names, for sure scenario Interestingly on the server we have the .con files as well: http://deq1.bse.vt.edu:81/p6/vadeq/config/control/met/
Hmm. I agree that these metrics should be shifted into the db soon. I think we discussed making a data model sketch for that next week, but we talked about Nate beginning to create those workflows to analyze/QC precip data based on annual means, potential timestamp issues, and spatial visualization of precip annual data. This work will naturally become a workflow as it's based on the precip files we're generating, so no issues there.
@COBrogan I 100% agree that we will eventually want to see global trends in all sorts of variables, but the QA workflow step is at the single coverage level, not over the global data sets. I think I may have been too vague in my comments on this previously. I will elaborate below. @nathanielf22 In our workflows, everything operates at a coverage level, so in developing the QA steps we should:
Otherwise, we spend time doing file management (sftp), and we write a code workflow that iterates through files rather than a standalone workflow that analyzes a single file for a single coverage. Then we have to disentangle our batch code to operate standalone. But we already have a robust set of code that lets us retrieve all the metrics from all the scenarios at once in a very efficient manner.
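As a rough illustration of the standalone, single-coverage shape described above, the sketch below summarises one ratings table for one coverage. The function name summarise_ratings, the summary fields, and the monthly values are all hypothetical; it assumes the table has mo and rating columns like the ratings files used earlier in this thread.

```r
# Hypothetical standalone step: summarise one ratings table for one coverage.
# Assumes columns `mo` (month) and a numeric rating column.
summarise_ratings <- function(ratings, rating_col = "rating") {
  values <- ratings[[rating_col]]
  data.frame(
    n_months   = sum(!is.na(values)),
    min_rating = min(values, na.rm = TRUE),
    max_rating = max(values, na.rm = TRUE),
    # which.max() skips NA values and returns the first maximum
    best_month = ratings$mo[which.max(values)]
  )
}

# Invented monthly ratings for a single coverage
one_coverage <- data.frame(
  mo = 1:12,
  rating = c(0.21, 0.35, 0.40, 0.18, 0.10, 0.05,
             0.12, 0.30, 0.44, 0.52, 0.47, 0.33)
)
coverage_summary <- summarise_ratings(one_coverage)
```

Because the function only knows about one table, a batch run over many coverages becomes a thin loop around it rather than logic that has to be disentangled later.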
Working on rerunning workflows
ALL other methods are out of date!
Update for Thanksgiving break: On Tuesday of this week, I met with Connor to integrate some of the scripts I have been working on recently into the workflow, which went relatively smoothly. A few issues came up with some steps of the workflow, and we tried to fix them as we went. My next step is to split the StormEventsLM_cmd.R script into multiple parts, because right now it performs multiple functions in the same script, which we don't want for our final workflow.
wdm workflow. model_meteorology#86
dh_timeseries_weather
calc_raster_ts
wdm workflow. model_meteorology#86 (comment))
dh_timeseries_weather with DEQ support
amalgamate model_meteorology#66
simple_lm #GEO_MET_MODEL option 1: Simple lm() analysis: model_meteorology#57
simple_lm that develops monthly regressions and applies them on a weekly basis. This will involve regressions based on four weekly data points applied on a weekly basis. Issue created, but needs reworking #GEO_MET_MODEL option 2: Weekly min(e) lm() analysis: model_meteorology#59
amalgamate
stormVol #GEO_MET_MODEL option 3: Storm Hydrograph Analysis and Precip/Hydrograph Volume Analysis model_meteorology#60
amalgamate
stormVol that divides events into 8 year periods and runs regressions based on months. The end result would be a timeseries that has constant months during each 8 year period
stormVol configs to run the stormVol approach comparing volume above baseflow to precipitation during the storm event to have a more deterministic approach
[coverage]-[met_scenario]-lm_simple-rating-ts.csv file to include tstime and tsendtime - covering periods, rather than an entry for each day.
stormVol ?
om_vahydro_metric_grid . Can get started by pulling L30 from models and begin to test spatial visualization while we process our results and eventually store them via REST. Can pull scripts from last year's project
dh_timeseries_weather or dh_property via REST REST / RomDataSource, metrics and analysis #1354
om_vahydro_metric_grid to recall data from dh_property
lm(PRISM ~ NLDAS2) . We can leverage the baseline variables set-up in the *.con files to easily integrate these into our existing workflows
lm(PRISM$precip[PRISM$time + 1] ~ NLDAS2)
.con files include variable BASELINE_MET_SCENARIO to provide this info
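The two regressions named in the task list, lm(PRISM ~ NLDAS2) and lm(PRISM$precip[PRISM$time + 1] ~ NLDAS2), compare one precipitation data set against another, first on the same day and then with a one-step offset. A minimal sketch of that comparison on synthetic data follows. The series are simulated (the fake "PRISM" series is built to lag the fake "NLDAS2" series by one day), so the numbers say nothing about the real data sets, and all variable names are illustrative.

```r
set.seed(42)
n <- 200

# Synthetic daily precipitation; purely illustrative, not real NLDAS2 data
nldas2 <- rgamma(n, shape = 0.5, scale = 10)

# Fake "PRISM" series constructed to lag the NLDAS2 series by one day
prism <- 0.9 * c(0, nldas2[-n]) + rnorm(n, sd = 1)

# Same-day regression, analogous to lm(PRISM ~ NLDAS2)
fit_same <- lm(prism ~ nldas2)

# Offset regression: PRISM on day t + 1 against NLDAS2 on day t.
# Dropping the first PRISM value and the last NLDAS2 value aligns the pairs.
fit_lag <- lm(prism[-1] ~ nldas2[-n])

r2_same <- summary(fit_same)$r.squared
r2_lag  <- summary(fit_lag)$r.squared
```

Comparing r2_same with r2_lag is one simple way to check whether a timestamp offset between two sources explains a poor same-day fit.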