Workflow step `geo -> process -> 04_weekly_data`: Create R script to produce weekly mean value file #61

rburghol · 2024-07-03T13:14:15Z

ilonah22 · 2024-07-09T18:00:45Z

I am having some trouble running Rscript from the command line; I keep getting an error that the Rscript command is not found.

```
$ Rscript hydroimport_daily.R
bash: Rscript: command not found
```

I tried this both inside and outside of the HARParchive directory. The R script I am trying to run is stored in the HARP 2024-2025 folder on my computer.

I also can't find any files or folders called Rscript on my computer. Does that mean I don't have it and will have to download something else?

COBrogan · 2024-07-09T18:20:57Z

Hi @ilonah22. This error means your computer can't find the `bin` folder of your R installation, so we need to add that folder to your Windows `PATH` environment variable. First, search for "Edit environment variables for your account" in the Windows search bar.

Then select "Path" and click "Edit".

Finally, click "New" and add the path to the `bin` directory under your R install folder. My R is installed under AppData, but yours may be installed under your "Documents/" folder.

Let me know if you have any trouble and we can hop on a call to debug this; it's an annoying issue. Basically, we are telling Windows where R is so it knows where to find programs like Rscript.
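If it is unclear which folder to add, R itself can report it. A minimal sketch, meant to be run from the R console or RStudio (which can already find R): `R.home("bin")` returns the directory holding R's command-line programs for the running installation, and `Sys.which("Rscript")` returns an empty string when `Rscript` is not on the `PATH`, which is the same condition behind `bash: Rscript: command not found`. On Windows the reported folder may be an architecture subfolder such as `bin/x64`; treat the exact layout as an assumption and check that the folder you add contains `Rscript.exe`.

```r
# Directory holding R's command-line programs (R, Rscript) for the running R.
# On Windows this may be an architecture subfolder such as .../bin/x64.
r_bin <- R.home("bin")
print(r_bin)

# Check that an Rscript program lives there (Rscript.exe on Windows).
rscript_name <- if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript"
print(file.exists(file.path(r_bin, rscript_name)))

# Sys.which() returns "" when Rscript cannot be found on the PATH.
on_path <- nzchar(Sys.which("Rscript"))
print(on_path)
```

After editing `PATH`, open a new terminal: terminals that were already open keep the old value.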

ilonah22 · 2024-07-09T19:25:24Z

@COBrogan Thank you! I had some trouble finding the right path, but it seems to be working now.

ilonah22 · 2024-07-10T18:27:14Z

I made some changes to the daily value script I have been working on so that it uses the `commandArgs()` function we talked about. I didn't want to get too far on a weekly version until I got this one working from the command line.

```r
# Inputs (args):
# 1 = File path of csv from VA Hydro
# 2 = End path of new csv
args <- commandArgs(trailingOnly = TRUE)

# Pull csv from input file path
hydro_daily <- read.csv(args[1])

# Add in more date information
hydro_daily[,c('yr', 'mo', 'da', 'wk')] <- cbind(year(as.Date(hydro_daily$obs_date)),
                                                 month(as.Date(hydro_daily$obs_date)),
                                                 day(as.Date(hydro_daily$obs_date)),
                                                 week(as.Date(hydro_daily$obs_date)))

# If data comes from nldas2 (hourly), it must be converted into daily data
if (data_source=="nldas2"){
  hydro_daily <- sqldf(
    "select featureid, min(obs_date) as obs_date, yr, mo, da, 
     sum(precip_mm) as precip_mm, sum(precip_in) as precip_in
   from hydro_daily 
   group by yr, mo, da
   order by yr, mo, da
  "
  )}

# Write csv in new file path
write.csv(hydro_daily,args[2])
```

But when I tried to run this from the command line, I got a segmentation fault.

```
$ Rscript hydroimport_daily.R "C:/Users/ilona/OneDrive - Virginia Tech/HARP/R Tests/usgs_ws_03176500-daymet-all.csv" "C:/Users/ilona/OneDrive - Virginia Tech/HARP/R Tests/Glen-RscriptTest.csv"
Segmentation fault
```

I'm not sure if it's a problem with the script or with how I entered the inputs on the command line.

* I don't know if this matters, but I downloaded the csv from http://deq1.bse.vt.edu:81/met/daymet/out/ onto my computer because I thought that would have the best chance of working.

COBrogan · 2024-07-11T13:36:38Z

@ilonah22 Sorry for the delay in responding; I somehow missed this yesterday afternoon. I was able to get this script to work, but I had to make a few changes. First, a few things were missing from the script. The lubridate library needs to be loaded (`library(lubridate)`) because the script calls `year()`, `month()`, etc. from lubridate. The if statement is also looking for a `data_source` variable that is never defined. We should probably pass `data_source` in as an argument to the script, i.e., change `if (data_source=="nldas2"){ ... }` to `if (args[3] == "nldas2"){ ... }`.

However, these errors did NOT reproduce the segmentation fault you're seeing. Are you passing in the correct path to hydroimport_daily.R? Maybe try using an absolute path, as I did in my call below:
```
Rscript c:/Users/gcw73279.COV/Desktop/testCommand.R "C:/Users/gcw73279.COV/Downloads/usgs_ws_01656000-daymet-all.csv" "C:/Users/gcw73279.COV/Desktop/testOut.csv" "prism"
```

It may help to add a `print("Script started!")` call at the beginning of hydroimport_daily.R and another right before the `write.csv()` call. These print messages to your console, which tells you whether the script is being called at all and whether it reaches the `write.csv()` step. Segmentation faults are typically memory or access errors.
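Putting the fixes above together, a sketch of the daily script might look like the following. It reads the data source from `args[3]`, prints a message at startup and before writing, and stops with a usage message if too few arguments are given. One deliberate swap, labeled as an assumption: base R (`format()` for date parts, `aggregate()` for the hourly-to-daily sum) stands in for lubridate and sqldf so the sketch runs without extra packages; the thread's actual fix is simply loading those libraries. The function name `process_daily` is hypothetical, and `row.names = FALSE` is added so the output CSV has no extra index column.

```r
# Sketch of hydroimport_daily.R with the fixes from this thread applied.
# Inputs (args):
# 1 = file path of csv from VA Hydro
# 2 = output file path for the new csv
# 3 = data source, e.g. "nldas2" or "prism"
# Assumption: base R replaces lubridate/sqldf so no packages are needed.

process_daily <- function(in_path, out_path, data_source) {
  hydro_daily <- read.csv(in_path)
  obs <- as.Date(hydro_daily$obs_date)

  # Add date parts; wk follows lubridate::week(): days 1-7 of the year are week 1
  hydro_daily$yr <- as.integer(format(obs, "%Y"))
  hydro_daily$mo <- as.integer(format(obs, "%m"))
  hydro_daily$da <- as.integer(format(obs, "%d"))
  hydro_daily$wk <- (as.integer(format(obs, "%j")) - 1) %/% 7 + 1

  # NLDAS-2 data are hourly, so sum precipitation into one row per day.
  # obs_date becomes the calendar date (the time of day is dropped).
  if (data_source == "nldas2") {
    hydro_daily$obs_date <- format(obs)
    hydro_daily <- aggregate(
      cbind(precip_mm, precip_in) ~ featureid + obs_date + yr + mo + da + wk,
      data = hydro_daily, FUN = sum
    )
    hydro_daily <- hydro_daily[order(hydro_daily$yr, hydro_daily$mo, hydro_daily$da), ]
  }

  print("Writing output file...")
  write.csv(hydro_daily, out_path, row.names = FALSE)
  invisible(hydro_daily)
}

# Only run when called from the command line with arguments
args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
  if (length(args) < 3) {
    stop("Usage: Rscript hydroimport_daily.R <input_csv> <output_csv> <data_source>")
  }
  print("Script started!")
  process_daily(args[1], args[2], args[3])
}
```

Usage follows the same pattern as the call above, e.g. `Rscript hydroimport_daily.R input.csv output.csv nldas2`. If "Script started!" never appears, the problem is in how the script is being launched rather than in the R code.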

ilonah22 · 2024-07-11T20:35:51Z

We finished a first draft of make_weekly_summary_ts.R, which should be in the main branch now, but there were a few bits we had trouble with.

- `[weekly_column(default=weekly_mean_value)]`: We were a little confused about this input, so we did not include it yet.
- `start_date` and `end_date`: We assumed the input file would be the comp_data file, which does not have columns with those names, so the warning messages about these columns are commented out for now.

Other than those two issues, I was able to run it with a comp_data file that @mwdunlap2004 had already made, and it looks like it worked.

COBrogan · 2024-07-12T13:04:22Z

I think `[weekly_column(default=weekly_mean_value)]` means that this input sets the name of the weekly-mean column in the output dataset. So, in Rob's example, `Rscript.exe make_weekly_summary_ts.R source_filename output_daily_filename data_column weekly_average_value`, the output dataset should contain a column called "weekly_average_value" that holds the weekly means. If this argument is not provided, we should fall back to the default column name, "weekly_mean_value". For example, in R:

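```r
# Set the output column name from the weekly_column argument (the fourth argument
# in Rob's example), if provided. Otherwise, default to "weekly_mean_value".
# Indexing past the end of args returns NA, not NULL, so check length(args) instead.
colName <- "weekly_mean_value"
if (length(args) >= 4) {
  colName <- args[4]
}
```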

mwdunlap2004 · 2024-07-15T12:38:38Z

I'm still confused about what the requirements mean and how to implement the column name and data_column aspects. For example, the comp_data csv we've created has 7 data columns, including usgs_cfs, dataset_p_in, and dataset_cfs. Which one do we want the weekly average for, or is it all of the columns? Could we meet at some point today to discuss how to go about this?

rburghol · 2024-07-19T14:12:04Z

Hey @mwdunlap2004 I am thinking that you probably know the answer to your above question, but yes, all data columns will be averaged.
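A minimal sketch of that averaging step, assuming the input is a comp_data-style data frame with an `obs_date` column in YYYY-MM-DD form plus numeric data columns (the names `usgs_cfs` and `dataset_cfs` in the test come from the comment above; the values are made up). It averages every numeric column by year and week, using the same week convention as `lubridate::week()`. The function name `weekly_means`, the optional `data_columns` filter, and the choice to fill `start_date` and `end_date` with the first and last dates observed in each week are all assumptions, not the agreed spec.

```r
# Sketch: weekly means of every numeric data column in a daily data frame.
# Assumptions: obs_date is YYYY-MM-DD; weeks follow lubridate::week();
# start_date/end_date are the first/last dates observed in each week.
weekly_means <- function(daily, data_columns = NULL) {
  obs <- as.Date(daily$obs_date)
  obs_chr <- format(obs)  # ISO strings sort correctly, so min/max work on them
  daily$yr <- as.integer(format(obs, "%Y"))
  daily$wk <- (as.integer(format(obs, "%j")) - 1) %/% 7 + 1

  # Default: average all numeric columns except the date-part columns
  if (is.null(data_columns)) {
    is_num <- vapply(daily, is.numeric, logical(1))
    data_columns <- setdiff(names(daily)[is_num], c("yr", "mo", "da", "wk"))
  }

  groups <- daily[c("yr", "wk")]
  means <- aggregate(daily[data_columns], by = groups, FUN = mean, na.rm = TRUE)
  starts <- aggregate(data.frame(start_date = obs_chr, stringsAsFactors = FALSE),
                      by = groups, FUN = min)
  ends <- aggregate(data.frame(end_date = obs_chr, stringsAsFactors = FALSE),
                    by = groups, FUN = max)

  out <- merge(merge(starts, ends), means)
  out <- out[order(out$yr, out$wk), ]
  rownames(out) <- NULL
  out
}

# Command-line use: Rscript <script> <input_csv> <output_csv>
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 2) {
  write.csv(weekly_means(read.csv(args[1])), args[2], row.names = FALSE)
}
```

With this week convention, days 365 and 366 fall into a short week 53, so the last "week" of each year may average only one or two days.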

rburghol mentioned this issue Jul 3, 2024 in Coding Workflow Components: Weeks of 7/1/2024 and 7/8/2024 HARPgroup/HARParchive#1291 (Open, 9 tasks)

rburghol transferred this issue from HARPgroup/HARParchive Jul 3, 2024

rburghol assigned rburghol, COBrogan, mwdunlap2004 and ilonah22 Jul 3, 2024

rburghol changed the title ~~Create R script to produce weekly mean value file~~ Workflow step geo -> process -> 03_weekly_data: Create R script to produce weekly mean value file Jul 12, 2024

mwdunlap2004 mentioned this issue Jul 15, 2024 in Week of 7/15/2024 HARPgroup/HARParchive#1309 (Open, 15 tasks)

rburghol mentioned this issue Jul 15, 2024 in Meteorology Mashup Model HARPgroup/meta_model#59 (Open)

COBrogan mentioned this issue Jul 15, 2024 in GEO_MET_MODEL option 3: Storm Hydrograph Analysis and Precip/Hydrograph Volume Analysis #60 (Open, 7 tasks)

rburghol mentioned this issue Jul 17, 2024 in Workflow geo #65 (Open)

rburghol changed the title ~~Workflow step geo -> process -> 03_weekly_data: Create R script to produce weekly mean value file~~ Workflow step geo -> process -> 04_weekly_data: Create R script to produce weekly mean value file Jul 17, 2024

rburghol changed the title ~~Workflow step geo -> process -> 04_weekly_data: Create R script to produce weekly mean value file~~ Workflow step geo -> process -> 03_weekly_data: Create R script to produce weekly mean value file Jul 17, 2024

rburghol changed the title ~~Workflow step geo -> process -> 03_weekly_data: Create R script to produce weekly mean value file~~ Workflow step geo -> process -> 04_weekly_data: Create R script to produce weekly mean value file Jul 19, 2024
