Coding Workflow Components: Weeks of 7/1/2024 and 7/8/2024 #1291

Open · 8 of 9 tasks
mwdunlap2004 opened this issue Jul 1, 2024 · 6 comments
mwdunlap2004 commented Jul 1, 2024

mwdunlap2004 commented:
I pushed my function that takes in a dataset and a gageid variable and outputs a weekly CSV of that data. It is on the weeklyprecip branch and is called "attemptatweekdata".

mwdunlap2004 commented:
Right now the function uses generic variable names like precip_in. Does anyone know of a way to rename a variable based on the dataset, so it would look like PRISM_p_cfs, for example?

COBrogan commented Jul 1, 2024

> Right now the function uses generic variable names like precip_in. Does anyone know of a way to rename a variable based on the dataset, so it would look like PRISM_p_cfs, for example?

Sure, there are several ways to create a dynamically named field in a data frame or as an entry in a list. The simplest is likely the following, which creates a column of NAs named MyCol2 in a data.frame:

inVar <- 2
# Build the column name dynamically; creates myDF$MyCol2 filled with NA
myDF[, paste0("My", "Col", inVar)] <- NA

You could alternatively add a column and then rename it based on the index of said column:

myDF$dummyColumn <- NA
names(myDF)[names(myDF) == "dummyColumn"] <- paste0("MyCol", inVar)
# OR
myDF$dummyColumn <- NA
names(myDF)[grepl("dummyColumn", names(myDF))] <- paste0("MyCol", inVar)
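For what it's worth, if the tidyverse is in play, dplyr can do the same dynamic rename in one step using `:=` with `!!` (a sketch; the data frame and its columns here are illustrative):

```r
library(dplyr)

inVar <- 2
myDF <- data.frame(flow_cfs = c(1.2, 3.4))

# rename() accepts a dynamically built name on the left of :=
myDF <- myDF %>%
  mutate(dummyColumn = NA) %>%
  rename(!!paste0("MyCol", inVar) := dummyColumn)

names(myDF)  # "flow_cfs" "MyCol2"
```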

However, I'm not sure we'd want more specifically named columns unless these are all being joined together. If the function only handles one dataset at a time, it may be better to keep the structure of the output file generic, so that we always get a data frame with the same column names. That makes the data frame easier to handle in future processing steps, regardless of the data source. In other words, a field labeled precip_in is fine as long as it represents only one data source: then we can use this function and always simply reference precip_in to get the precip data from whichever data source we specified earlier in the workflow.
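To illustrate the point about a generic schema: if every source-specific reader returns a frame with the same column names, downstream steps never need to know the source. A hypothetical sketch (the function name, stub data, and columns are illustrative, not from the actual repo):

```r
# Hypothetical reader: whatever the source, return obs_date + precip_in
get_precip <- function(gageid, data_source) {
  # ... read the source-specific file here; stub data for illustration ...
  data.frame(obs_date  = as.Date("2024-07-01") + 0:2,
             precip_in = c(0.1, 0.0, 0.4))
}

# Downstream code can always rely on precip_in, regardless of source
pr <- get_precip("01646500", "prism")
total_in <- sum(pr$precip_in)
total_in  # 0.5
```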

ilonah22 commented Jul 2, 2024

I created a new version of access-file.R, called lmsingledata, which requires only one dataset, so all analysis can be run on one data source at a time. The major change was how the data is pulled in.

library(lubridate)  # year(), month(), day(), week()
library(sqldf)

hydrocode <- paste0('usgs_ws_', gageid)

data_source <- "prism"

hydro_data <- read.csv(paste0("http://deq1.bse.vt.edu:81/files/met/",
                              hydrocode, "-", data_source, "-all.csv"))

obs <- as.Date(hydro_data$obs_date)
hydro_data[, c('yr', 'mo', 'da', 'wk')] <- cbind(year(obs), month(obs),
                                                 day(obs), week(obs))

# nldas2 is sub-daily, so aggregate its records to daily totals
if (data_source == "nldas2") {
  hydro_data <- sqldf(
    "select featureid, min(obs_date) as obs_date, yr, mo, da,
       sum(precip_mm) as precip_mm, sum(precip_in) as precip_in
     from hydro_data
     group by yr, mo, da
     order by yr, mo, da"
  )
}
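A weekly rollup could follow the same sqldf pattern, grouping on yr and wk instead of yr/mo/da (a sketch, assuming the frame already has the wk column added as above; the toy data here stands in for hydro_data):

```r
library(sqldf)

# Toy stand-in for hydro_data with yr/wk columns already attached
hydro_data <- data.frame(featureid = 1,
                         obs_date  = c("2024-07-01", "2024-07-02", "2024-07-08"),
                         yr        = 2024,
                         wk        = c(27, 27, 28),
                         precip_in = c(0.1, 0.2, 0.3))

weekly_data <- sqldf(
  "select featureid, min(obs_date) as obs_date, yr, wk,
     sum(precip_in) as precip_in
   from hydro_data
   group by yr, wk
   order by yr, wk"
)
weekly_data$precip_in  # one total per week: 0.3 0.3
```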

@rburghol changed the title from "Week of 7/1/2024" to "Weeks of 7/1/2024 and 7/8/2024" on Jul 3, 2024
rburghol commented Jul 9, 2024

@ilonah22 the code that you pasted above looks excellent -- it creates a daily summary dataframe from the raw data file. The next step is to create a second script that does almost the same thing, but takes the daily CSV as input and generates a weekly CSV (which we use for some of our methods).

The only modification I would suggest is that, rather than guessing the hydrocode and the input and output file names, the script should take these as inputs. The details of this script are in the issue I tagged you in over here: HARPgroup/model_meteorology#61 -- if you can start to develop and track your progress on this over there, that would be awesome. Keep me posted - thanks!
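Taking the hydrocode and file names as script inputs could look something like this (a sketch using base R's commandArgs; the script name and argument order are assumptions, not the spec from model_meteorology#61):

```r
# Parse the three required positional arguments (hypothetical order)
parse_args <- function(args) {
  if (length(args) < 3) {
    stop("Usage: Rscript daily_to_weekly.R <hydrocode> <input_csv> <output_csv>")
  }
  list(hydrocode  = args[1],
       input_csv  = args[2],
       output_csv = args[3])
}

# In the script itself:
# opts  <- parse_args(commandArgs(trailingOnly = TRUE))
# daily <- read.csv(opts$input_csv)
# ... aggregate daily to weekly here ...
# write.csv(weekly, opts$output_csv, row.names = FALSE)
```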

COBrogan commented:
@ilonah22 I think Rob's comments are spot-on. Taking this framework you have and creating a weekly version is a great next step and will help to reinforce our workflow development. I'd be happy to help out with this as needed. I have some availability in the afternoon and can help parse through Rob's suggestion or go over some next steps. I found this workflow process a bit tricky at first and am happy to discuss! Just let me know and I can set up a Teams meeting.

@rburghol changed the title from "Weeks of 7/1/2024 and 7/8/2024" to "Coding Workflow Components: Weeks of 7/1/2024 and 7/8/2024" on Jul 15, 2024
@rburghol mentioned this issue on Jul 15, 2024