Deep Dive segments #83
Hey @COBrogan, @ilonah22, @nathanielf22 and @mwdunlap2004 - the batch run that Michael did last night cruised along and did the simple LM for many, many segments, but it has been stalled for almost 12 hours on the Dan River at Wentworth NC (usgs_ws_02071000). I am 99% sure that this is the same segment that stalled when Connor ran it a couple weeks ago. Now, it is a fairly large area, so it could be that it is just too big. However, since this same one has come up twice, and not some other large segment like the James River at Richmond, I am wondering if there might be some hinkiness in the geometry that is causing troubles? Other than that thought, I have no real ideas on how to debug it -- we can check the running query itself (sketched below). Anyhow, I did not yet kill Michael's job on that segment, but I think we can check in on it later today and give it the adios if it hasn't completed -- queries that run perpetually can have a very bad impact on overall system performance.
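For the check-and-adios step, a minimal sketch in plain Postgres terms (nothing here is specific to our schema, and the pid is a placeholder):

```sql
-- List non-idle queries by runtime; a stalled calc_raster_ts query on
-- usgs_ws_02071000 should surface at the top.
SELECT pid, now() - query_start AS runtime, state, left(query, 80) AS query_head
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;

-- Give it the adios if it hasn't completed (12345 is a placeholder pid).
SELECT pg_terminate_backend(12345);
```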
Another item of note: we have 156 nldas2 gages, 182 PRISM gages, and 184 daymet gages. I'm not sure why the numbers are different, but I will say that for at least one of the datasets I somehow managed to download storm_vol results for at least one gage.
@mwdunlap2004 At least some of the missing NLDAS gages are a result of the clipping issues we've been having with nldas2. The storm_vol results you downloaded from PRISM were generated months ago and must have been an early test. I have deleted them from /media/model/met/PRISM/out/ to prevent that issue in the future.

Here are some early images to help visualize the results @nathanielf22 @ilonah22. First, all data (this one is a little hard to read):

There are 901 instances in which the two methods predict different "best" data sources for a given gage and month. The ratings differences between these "best" data sets range from -100% to 50% (with a negative value indicating a much better performance by the storm volume method).

Code below, but you'll need to tweak the file paths in lines 3 and 6: my code assumes you have folders for each data source with the names in line 3, all of which sit in the ratings/ directory in line 6.

**Code for plots and gage comparisons**

``` r
library(tidyverse)
#Assumes all ratings files are in a folder ratings/daymet/ or ratings/prism/
for(j in c("daymet_stormvol","prism_stormvol","nldas_stormvol","daymet_simplelm","prism_simplelm","nldas_simplelm")){
  print(j)
  #Set path to read in ratings based on data source of outer loop
  pathToread <- paste0("ratings/",j,"/")
  for( i in list.files(pathToread) ){
    #For each file in the directory, read in the file i as a csv
    filei <- read.csv(paste0(pathToread,i))
    #Store the gage number
    filei$gage <- gsub(".*_(\\d+)-.*","\\1",i)
    #Store the analysis type
    filei$workflow <- gsub(".*_","",j)
    #Keep only the necessary columns, depending on the workflow:
    if(filei$workflow[1] == "simplelm"){
      filei <- filei[,c("mo","rating","gage","workflow")]
    }else{
      filei <- filei[,c("mo","r_squared","gage","workflow")]
    }
    names(filei) <- c("mo","rating","gage","workflow")
    #Combine the file into a single data frame
    if(!exists("combineFile")){
      combineFile <- filei
    }else{
      combineFile <- rbind(combineFile,filei)
    }
  }
  #Assign to a specific variable and delete the generic combineFile
  assign(paste0('combineFile',j),combineFile)
  rm(combineFile)
}
#Join the daymet and prism data together
#Plot the data
#Plot the data (simplelm)
#Plot the all data (both workflows)
#At each gage, does the best performing data source change between workflows?
```
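The trailing comments are where the join and plots happen; as a hedged sketch of that step (it assumes the combineFile* frames built above, and the boxplot is just one way to slice it):

``` r
library(tidyverse)
#Stack the six combineFile* frames, recovering the data source from the name
sources <- c("daymet_stormvol","prism_stormvol","nldas_stormvol",
             "daymet_simplelm","prism_simplelm","nldas_simplelm")
allRatings <- purrr::map_dfr(sources, function(j){
  get(paste0("combineFile", j)) %>%
    mutate(dataSource = gsub("_.*", "", j))
})
#Monthly rating distributions by data source, one panel per workflow
ggplot(allRatings, aes(x = factor(mo), y = rating, fill = dataSource)) +
  geom_boxplot() +
  facet_wrap(~workflow) +
  labs(x = "Month", y = "Rating", fill = "Data source")
```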
Thanks for the review on these results @COBrogan!! I would add that "best" in my mind does not refer to the analysis method, but rather to the data source. To that end, I think it is quite interesting that there is a handful of months where the simple lm shows large differences between NLDAS2 and prism/daymet (April, May, November, December). This is intriguing since the goal is not to get the best R^2 but, in a sense, to find where the data sources differentiate the most. Of course, it may very well be that the difference in those datasets shown by the simple lm… To better enable us to answer these questions, we need to get these into the REST database, so that they are accessible.
@rburghol @COBrogan I created plots for 8 of the 10 gages Ilona had previously selected (the 8 that ran for nldas2 last night), and this matches my results: nldas2 seems to be consistently worse than daymet and PRISM. Daymet and PRISM also seem very similar to each other; I didn't notice many differences in their results, and in many cases their lines were eerily similar. I'm going to run some summary statistics on these gages (a sketch below) after I talk to Dr. Scott today about how to handle nldas2 missing 2 gages (in regards to my presentation). In the long term, I think improving the clipping methods could help this.
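For those summary statistics, one hedged possibility, reusing the allRatings frame from the earlier sketch (the column names come from that sketch, not the actual workflow):

``` r
#Mean and spread of monthly ratings per data source and workflow
allRatings %>%
  group_by(dataSource, workflow) %>%
  summarise(
    meanRating = mean(rating, na.rm = TRUE),
    sdRating   = sd(rating, na.rm = TRUE),
    n          = n(),
    .groups = "drop"
  ) %>%
  arrange(workflow, desc(meanRating))
```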
@mwdunlap2004 -- the clipping methods will certainly fix this. I am 99.9% certain these are just the result of the overlap algorithm that we are using, and the fix is either to resample or to use the polygon intersection method (sketched below) -- both of which work, but performance is an issue. So, it is totally cool to have cases where your data is not working out for reasons of resolution - a very good point to make to the audience imo (@COBrogan may have other thoughts).
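For reference, a rough sketch of those two fixes in PostGIS raster terms; the table and column names (met_raster, watershed, hydrocode) are hypothetical stand-ins, not the actual calc_raster_ts internals:

```sql
-- Polygon intersection method: clip cells to the watershed boundary
-- (crop = true), so partial cells contribute only their overlapping part.
SELECT (ST_SummaryStats(ST_Clip(r.rast, w.geom, true))).mean AS mean_value
FROM met_raster r
JOIN watershed w ON ST_Intersects(r.rast, w.geom)
WHERE w.hydrocode = 'usgs_ws_02071000';

-- Resampling method: shrink the cell size (0.25 is an arbitrary factor)
-- so that whole-cell overlap approximates the true intersection.
SELECT ST_Resample(r.rast,
                   ST_ScaleX(r.rast) * 0.25,
                   ST_ScaleY(r.rast) * 0.25) AS finer_rast
FROM met_raster r;
```

Either way the per-query cost goes up, which is the performance issue mentioned above.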
Segments that need in-depth QA due to large errors, missing data, or peculiar interest. @mwdunlap2004 @COBrogan @ilonah22
wdm
#72 (comment) usgs_ws_02071000 Dan River at Wentworth NC
calc_raster_ts with `Calling: /opt/model/model_meteorology/sh/calc_raster_ts usgs_ws_02071000 nldas2_obs_hourly /tmp/usgs_ws_02071000_1725581950_277/usgs_ws_02071000-nldas2-all.csv.sql /tmp/usgs_ws_02071000-nldas2-all.csv dbase2 drupal.dh03`
```sql
select st_isvaliddetail(dh_geofield_geom) from field_data_dh_geofield where entity_id = 437550;
-- st_isvaliddetail --> (t,,)

select st_area2d(dh_geofield_geom) from field_data_dh_geofield where entity_id = 437550;
-- 0.2726207204175703

select st_area2d(dh_geofield_geom) from field_data_dh_geofield where entity_id = 290049;
-- 0.7199793220968411

select st_numgeometries(dh_geofield_geom) from field_data_dh_geofield where entity_id = 437550;
-- st_numgeometries --> 1
```
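So the geometry is valid, single-part, and not especially large by area (0.27 square degrees versus 0.72 for entity 290049). A hedged next check is vertex count, since a dense boundary can make intersection queries crawl even when the area is modest; st_subdivide is one standard mitigation (the 512-vertex cap is an arbitrary choice):

```sql
-- How dense is the boundary? Huge vertex counts slow raster overlays.
select st_npoints(dh_geofield_geom) from field_data_dh_geofield where entity_id = 437550;

-- If it is dense, split it into cheap-to-intersect pieces (one row per piece).
select st_npoints(st_subdivide(dh_geofield_geom, 512)) as piece_vertices
from field_data_dh_geofield where entity_id = 437550;
```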
L51079
nldas2
```
./nldas_land_cells Land_segment
#_of_pairs Cell1_X Cell1_Y Cell2_X Cell2_Y ...
./nldas_land_cells L51079
6 372 106 373 106 374 106 372 107 373 107 374 107
```
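Read literally, that says L51079 spans 6 nldas2 cells, a 3×2 block (x 372-374, y 106-107). A small hedged R snippet for parsing that output shape, in case it helps with QA scripting (the line is copied from above):

``` r
#Parse a nldas_land_cells line: first the pair count, then x/y pairs
line <- "6 372 106 373 106 374 106 372 107 373 107 374 107"
vals <- as.integer(strsplit(trimws(line), "\\s+")[[1]])
nCells <- vals[1]
cells <- matrix(vals[-1], ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("cell_x", "cell_y")))
stopifnot(nrow(cells) == nCells)  #sanity check against the reported count
cells
```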