GEO_MET_MODEL option 1: Simple lm() analysis: #57

Open
rburghol opened this issue Jun 18, 2024 · 15 comments

rburghol commented Jun 18, 2024

Overview

  • Metric: $R^2$ from lm() of $Q_{week} = f(P_{week,month})$
    • For each dataset, 12 monthly lm() fits are created to compare weekly mean P with weekly mean Q (see the R sketch below).
  • Application: datasets are ranked by monthly $R^2$; the amalgamated dataset contains, for each month, all data from the best-performing base dataset for that month.
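
A minimal R sketch of the metric, assuming a data frame wkly with one row per week and columns mo (month number 1-12), P_week (weekly mean precip), and Q_week (weekly mean flow); these names are illustrative, not the ones used in simple_monthly_lm.R:

# fit one lm() per month on the weekly means and collect r-squared and p-value
# (wkly and its column names are assumed for illustration)
monthly_fit <- do.call(rbind, lapply(1:12, function(m) {
  sub <- wkly[wkly$mo == m, ]
  fit <- summary(lm(Q_week ~ P_week, data = sub))
  data.frame(mo = m,
             rsq = fit$r.squared,
             p = coef(fit)["P_week", "Pr(>|t|)"])
}))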

Workflow steps

See #65

Prep Variables

MODEL_ROOT=/backup/meteorology/
MODEL_BIN=$MODEL_ROOT
SCRIPT_DIR=/opt/model/model_meteorology/sh
export MODEL_ROOT MODEL_BIN SCRIPT_DIR

Quick Run

Run steps

  • geo
    • process: See Meteorology Mashup Model meta_model#59
    • analyze
      • 02_model_lm_simple - creates an lm() for the weekly data, grouped by month, and stores the results in a local JSON file or in the VAHydro database as properties.
        • Use: run_model raster_met PRISM "usgs_ws_02031000" auto geo analyze 01
    • 08_create_rating - creates a time series with start_date, end_date, and rating, where rating is the monthly r-squared value for this method (a sketch of this step follows this list).
    • note: a SQL file named "/tmp/[outfile].sql" will be created to do the actual work; if you want to see or debug it, add "n" as the last parameter when calling.
  • Retrieve: sftp dbase2:/tmp/usgs_ws_02031000-prism-all.csv /var/www/html/files/met/usgs_ws_02031000-prism-all.csv
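
A minimal sketch of the 08_create_rating idea, assuming the monthly_fit table from the sketch above and a single calendar year; the real step writes SQL against VAHydro rather than a local data frame:

# expand each monthly r-squared into a start_date / end_date / rating row
yr <- 1984  # illustrative year
rating_ts <- do.call(rbind, lapply(1:12, function(m) {
  start <- as.Date(sprintf("%d-%02d-01", yr, m))
  end <- seq(start, by = "1 month", length.out = 2)[2] - 1  # last day of the month
  data.frame(start_date = start, end_date = end,
             rating = monthly_fit$rsq[monthly_fit$mo == m])
}))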

Generate Weekly Ratings

  • Done during analysis step of cov workflow
  • See R script to perform analysis and create CSV of weekly ratings: https://github.com/HARPgroup/HARParchive/blob/master/HARP-2024-2025/methods/simple_monthly_lm.R
  • Store ratings data:
    • This method analyzes datasources and assigns a goodness of fit for each month, using weekly summarized data
    • Store 1 value per month, for each method.
    • propname: [method name]
      • propname: [month 3-letter]_rsq
        • propval: r-squared for this month
        • propcode: n/a
      • propname: [month 3-letter]_p
        • propval: p value for this month
        • propcode: n/a
    • propname: image_monthly_rsq.png
      • propval: bar chart of values for each month
      • propcode: n/a
    • propname: image_ts_e.png
      • propval: time series plot of the weekly error from the lm() predictions versus USGS observed flow
      • propcode: n/a
    • propname: image_monthly_boxplot_e.png
      • propval: box-and-whisker plot of the error for each month, based on the time series of lm() predictions versus USGS observed flow
      • propcode: n/a
  • Intermediate data products:
    • Daily timeseries of precip summarized over coverage (output from calc_raster_ts)
    • should we call this script once for each datasource?
    • Then what is our data model for storing the multiple columns?
      • One file for each coverage, with USGS gage as column?
        • We could simply tell it to rewrite our file after adding the data source values as a column?
        • the script will need to resample data to a daily level to match USGS and create a common timestep (see the merge sketch after this list)
      • One file for each datasource + coverage
        • Our analytics file takes the multiple datasources as inputs

Use the monthly best-fit model as the rating; a sketch of the per-month selection follows.
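
A sketch of that selection, assuming a ratings data frame with one row per (datasource, month) produced by running the monthly lm() fits for each datasource; column names are illustrative:

# keep, for each month, the datasource with the highest r-squared
best_by_month <- do.call(rbind, lapply(split(ratings, ratings$mo), function(d) {
  d[which.max(d$rsq), c("mo", "datasource", "rsq")]
}))
# the amalgamated dataset then draws each month's data from best_by_month$datasource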

rburghol commented:

Try a new method without the raster calc, which may be unnecessary:

-- stage the best-rated raster for each week of the target year, clipped to the watershed
create temp table raster_best(tid serial, featureid integer, tstime bigint, tsendtime bigint, varid integer, rast raster);
insert into raster_best(featureid, tstime, tsendtime, varid, rast)
select f.hydroid, w.tstime, w.tsendtime, w.varid, 
  -- clip each raster to the watershed geometry, then resample to the template grid
  ST_Resample(st_clip(w.rast, fgeo.dh_geofield_geom), tpr.rast) as rast
from dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
  fgeo.entity_id = f.hydroid
  and fgeo.entity_type = 'dh_feature'
)
-- weekly ratings for the year of interest; best_varkey identifies the best-rated datasource
left outer join week_ratings as r
on (
  r.yr = 1984
)
left outer join dh_variabledefinition as v
on (
  v.varkey = r.best_varkey
)
-- match weather rasters to the rated year/month/week and the best-rated variable
left outer join dh_timeseries_weather as w
on (
   extract( year from to_timestamp(w.tsendtime) ) = r.yr
   and extract( month from to_timestamp(w.tsendtime) ) = r.mo
   and extract( week from to_timestamp(w.tsendtime) ) = r.wk 
  and v.hydroid = w.varid
)
left outer join raster_templates as tpr
on (1=1)
where f.hydrocode = 'usgs_ws_01665500'
and w.tid is not null;

  • Now export
PGHOST=dbase2 PGPORT=5432 PGUSER=rob gdal_translate -of GTiff "PG:dbname=drupal.dh03 table=raster_best_out where='tid = 1 ' " outfile.tiff


rburghol commented Jun 25, 2024

@ilonah22 @mwdunlap2004

  • usgs_data: Rscript to take in a gage ID (and time period?), download flow data, and save it as a CSV (see the sketch after this list)
    • don't overwrite the output unless requested to
    • user calls the script with the desired filename (don't guess)
  • bf_separation: Rscript to take a flow CSV as input, create the baseflow separation plot, and save it as a file
    • generate a baseflow plot for some time period: maybe an arbitrary date? maybe the last year of the dataset?
    • user calls the script with the input file and the name of the output baseflow graph(s)
  • Research Rscript.exe: passing arguments and reading them inside the command prompt. See example: https://github.com/HARPgroup/meta_model/blob/main/scripts/river/hsp_hydr_analysis.R
  • (*bonus) lm_lastweek: Rscript to create lm stats
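
A minimal sketch of the usgs_data script, assuming the dataRetrieval package is available; the argument order (gage, start date, end date, output file, optional overwrite flag) is an assumption, not a spec:

#!/usr/bin/env Rscript
# download daily flow for one gage and write it to a CSV chosen by the caller
args <- commandArgs(trailingOnly = TRUE)
gage_id   <- args[1]
start     <- args[2]
end       <- args[3]
outfile   <- args[4]
overwrite <- length(args) >= 5 && args[5] == "overwrite"

# refuse to overwrite an existing file unless explicitly requested
if (file.exists(outfile) && !overwrite) {
  stop("refusing to overwrite ", outfile, " (pass 'overwrite' as a 5th argument)")
}

suppressPackageStartupMessages(library(dataRetrieval))
flow <- readNWISdv(siteNumbers = gage_id, parameterCd = "00060",  # 00060 = discharge, cfs
                   startDate = start, endDate = end)
write.csv(flow, outfile, row.names = FALSE)

Called, for example, as: Rscript usgs_data.R 02031000 2020-01-01 2023-12-31 flow.csv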


ilonah22 commented Jul 8, 2024

I ran into an error message this morning while trying to run access-file.R and was hoping somebody could tell me what's going wrong.

I get this error message when I try to run the line that pulls the data in from VA Hydro:

  cannot open the connection to 'http://deq1.bse.vt.edu:81/files/met/usgs_ws_03176500-prism-all.csv'
In addition: Warning message:
In file(file, "rt") :
  cannot open URL 'http://deq1.bse.vt.edu:81/files/met/usgs_ws_03176500-prism-all.csv': HTTP status was '404 Not Found'

I also tried daymet but got the same issue.


rburghol commented Jul 8, 2024

Hey @ilonah22, sorry for the delayed response. Last week we shifted the output files to a different directory, so the URL references need to change to the following (a quick read example is sketched after the list):

  • Use: http://deq1.bse.vt.edu:81/met/[data source]/out/
  • Ex 1: http://deq1.bse.vt.edu:81/met/daymet/out/
  • Ex 2: http://deq1.bse.vt.edu:81/met/PRISM/out/
  • Ex 3: http://deq1.bse.vt.edu:81/met/nldas2/out/

See here for more details: HARPgroup/HARParchive#1291
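
For example, assuming the file keeps the usgs_ws_[gage]-[source]-all.csv naming under the new path:

# quick check that the relocated PRISM file reads from the new /met/<source>/out/ path
url <- "http://deq1.bse.vt.edu:81/met/PRISM/out/usgs_ws_03176500-prism-all.csv"
met <- read.csv(url)
str(met)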

rburghol changed the title from "Workflow 1: Simple lm() analysis:" to "GEO_MET_MODEL option 1: Simple lm() analysis:" on Aug 5, 2024

COBrogan commented Sep 5, 2024

Batch script for running all gages:

gage_coverage_file="usgsgageList.csv"
gage_coverage_SQLfile=usgsgageList.sql
gageSQL="
\\set fname '/tmp/${gage_coverage_file}' \n
 

copy ( select hydrocode
FROM dh_feature
WHERE bundle = 'watershed' AND ftype = 'usgs_full_drainage'
) to :'fname';"
# turn off globbing so the shell does not expand any wildcards in the unquoted SQL string
set -f
echo -e $gageSQL > $gage_coverage_SQLfile 
cat $gage_coverage_SQLfile | psql -h dbase2 -d drupal.dh03
# use $HOME rather than "~" so the quoted local path actually expands
sftp dbase2:"/tmp/${gage_coverage_file}" "$HOME/${gage_coverage_file}"

# sanity check: list the gages that will be submitted
filename="$HOME/${gage_coverage_file}"
for i in `cat $filename`; do
	echo $i
done

# submit one geo job per gage
filename="$HOME/${gage_coverage_file}"
for i in `cat $filename`; do
  echo "Running: sbatch /opt/model/meta_model/run_model raster_met PRISM \"$i\" auto geo"
  sbatch /opt/model/meta_model/run_model raster_met PRISM "$i" auto geo 
done

As of 1:15PM September 05, 2024:
181 gages run for PRISM
43 gages run for daymet
