Workflow amalgamate #66
First attempt at amalgamating a raster. Pulls in clipped, resampled rasters for daymet and PRISM. The "best" dataset below is picked randomly; this will need to be updated as rating values are added via REST. Unions NLDAS rasters prior to resampling and clipping, which is the longest part of the query, adding roughly 10 seconds. For one day, this query takes about 15 seconds. Scaling this to the full time period would take about 60 hours (15 sec/day * 365 day/yr * 39 yr / 3600 sec/hr). To-do:
--Given we have a file that indicates which dataset performed best, we should have
--three different datasets defined via WITH, grabbing and clipping coverages based on the file
--Arbitrarily pick a tsendtime to practice with, here February 18, 2020.
--Note that with NLDAS this will pick an arbitrary hour and we need the full 24-hour set
\set tsendin '1582027200'
\set resample_varkey 'daymet_mod_daily'
-- sets all integer feature and varid with query
select hydroid as covid from dh_feature where hydrocode = 'cbp6_met_coverage' \gset
--Grab the USGS full drainage geometries/coverages and assign ratings to indicate best
--performing precip dataset
WITH usgsCoverage as (
SELECT f.*,
--Join in ratings. Until ratings are put in db via REST, let's
--use a random integer between 0 (NLDAS) and 2 (daymet)
floor(random() * (2-0+1) + 0)::int as dataID,
--Add area of watershed/coverage for reference and to order from downstream to upstream
ST_AREA(fgeo.dh_geofield_geom) as covArea,
fgeo.dh_geofield_geom as dh_geofield_geom
FROM dh_feature as f
LEFT JOIN field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
WHERE f.bundle = 'watershed' AND f.ftype = 'usgs_full_drainage'
ORDER BY covArea DESC
),
--Get the geometry and feature fields for the full coverage based on the covid variable gset above.
--This will be used to create a resampled NLDAS for the day
fullCoverage as (
SELECT f.*,fgeo.dh_geofield_geom
FROM dh_feature as f
LEFT JOIN field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
WHERE f.hydroid = :'covid'
),
--Where PRISM is the best performing dataset, grab the appropriate
--daily raster from dh_timeseries_weather and resample to target resolution
--and then clip to watershed boundaries
prism as (
SELECT cov.*,
met.featureid,met.tsendtime,
st_clip(st_resample(met.rast,rt.rast), cov.dh_geofield_geom) as rast
FROM usgsCoverage as cov
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey='prism_mod_daily'
and met.featureid = :covid
and met.tsendtime = :'tsendin'
) AS met
ON ST_Intersects(ST_ConvexHull(met.rast),cov.dh_geofield_geom)
LEFT JOIN (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
WHERE cov.dataID = 1
),
--Where daymet is the best performing dataset, grab the appropriate
--daily raster from dh_timeseries_weather and resample to target resolution
--and then clip to watershed boundaries
daymet as (
SELECT cov.*,
met.featureid,met.tsendtime,
st_clip(st_resample(met.rast,rt.rast), cov.dh_geofield_geom) as rast
FROM usgsCoverage as cov
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey='daymet_mod_daily'
and met.featureid = :covid
and met.tsendtime = :'tsendin'
) AS met
ON ST_Intersects(ST_ConvexHull(met.rast),cov.dh_geofield_geom)
LEFT JOIN (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
WHERE cov.dataID = 2
),
--Union all NLDAS rasters for the day to get the sum of NLDAS for the day
nldasFullDay AS (
SELECT st_union(met.rast,'sum') as rast
FROM (
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey='nldas2_precip_hourly_tiled_16x16'
and met.featureid = :covid
and extract(year from to_timestamp(met.tsendtime)) = extract(year from to_timestamp(:'tsendin'))
and extract(month from to_timestamp(met.tsendtime)) = extract(month from to_timestamp(:'tsendin'))
and extract(day from to_timestamp(met.tsendtime)) = extract(day from to_timestamp(:'tsendin'))
) AS met
),
nldasFullDayResamp AS (
select st_resample(met.rast,rt.rast) as rast
FROM fullCoverage as f
JOIN nldasFullDay as met
ON ST_ConvexHull(met.rast) && f.dh_geofield_geom
LEFT JOIN (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
),
--Union all NLDAS rasters for the day, intersecting by the usgsCoverage geometries
--to leverage the tiled NLDAS rasters. The end result is a raster for each coverage
--where NLDAS is the most highly rated that is of the full day's dataset,
--but clipped to only intersecting tiles
nldasDay as (
SELECT cov.hydroid, cov.hydrocode,
cov.ftype, cov.bundle, cov.name,
:'tsendin' as tsendtime,
st_union(met.rast,'sum') as rast
FROM usgsCoverage as cov
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey='nldas2_precip_hourly_tiled_16x16'
and met.featureid = :covid
and extract(year from to_timestamp(met.tsendtime)) = extract(year from to_timestamp(:'tsendin'))
and extract(month from to_timestamp(met.tsendtime)) = extract(month from to_timestamp(:'tsendin'))
and extract(day from to_timestamp(met.tsendtime)) = extract(day from to_timestamp(:'tsendin'))
) AS met
ON ST_Intersects(ST_ConvexHull(met.rast),cov.dh_geofield_geom)
WHERE cov.dataID = 0
GROUP BY cov.hydroid, cov.hydrocode, cov.ftype,
cov.bundle, cov.name
),
--Now, using the union of NLDAS hourly data in nldasDay, resample to the template raster and clip to each
--watershed where NLDAS is rated the best via an INNER JOIN and the WHERE in nldasDay
nldas as (
SELECT cov.*,met.tsendtime,
st_clip(st_resample(met.rast,rt.rast), cov.dh_geofield_geom) as rast
FROM usgsCoverage as cov
INNER JOIN nldasDay as met
on cov.hydroid = met.hydroid
LEFT JOIN (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
),
--For each feature in usgsCoverage, find the non-NULL summary dataset (which will represent the best rated dataset)
amalgamate as (
select cov.*, COALESCE(prismMet.rast,daymetMet.rast,nldasMet.rast) as rast
FROM usgsCoverage as cov
LEFT JOIN nldas as nldasMet
on cov.hydroid = nldasMet.hydroid
LEFT JOIN prism as prismMet
on cov.hydroid = prismMet.hydroid
LEFT JOIN daymet as daymetMet
on cov.hydroid = daymetMet.hydroid
),
--Union the best rated datasets together. Since the data is sorted by drainage area,
--upstream areas will populate after larger, downstream coverages
amalgamateUnion as (
SELECT ST_union(amalgamate.rast) as rast
FROM usgsCoverage as cov
LEFT JOIN amalgamate as amalgamate
on cov.hydroid = amalgamate.hydroid
)
--Use a full union to create a column with the amalgamateUnion raster and the nldasFullDayResamp raster
--Then, union the rasters to get a raster in which the "background" is the nldasFullDayResamp and everything else is
--the best fit raster
SELECT ST_union(fullUnion.rast) as rast
FROM (
SELECT rast FROM nldasFullDayResamp
UNION ALL
SELECT rast FROM amalgamateUnion
) as fullUnion;
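To persist the result rather than just select it, the final SELECT above could become an INSERT; a minimal sketch, assuming a varkey such as 'amalgamate_daily' (defined later in this thread) and that `dh_timeseries_weather` accepts this column list:

```sql
-- Sketch only: swap this in for the final SELECT of the query above.
-- The column list for dh_timeseries_weather is an assumption.
INSERT INTO dh_timeseries_weather (featureid, varid, tsendtime, rast)
SELECT :covid,
       (SELECT hydroid FROM dh_variabledefinition WHERE varkey = 'amalgamate_daily'),
       :'tsendin',
       ST_Union(fullUnion.rast)
FROM (
  SELECT rast FROM nldasFullDayResamp
  UNION ALL
  SELECT rast FROM amalgamateUnion
) as fullUnion;
```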
Testing the above raster, using random numbers to assign the best fit dataset (for now).

First, let's find the mean precipitation in this watershed using the resampled PRISM data (see the collapsed "Check `dh_timeseries_weather` PRISM data" output, with columns featureid | obs_date | yr | mo | da | hr | precip_in). Now let's find the mean precipitation in this watershed using the amalgamated data (see the collapsed "Check `amalgamate` data" output, same columns). These different values actually make sense: this is a larger watershed and will thus change after amalgamation as smaller watersheds are unioned on top. We instead need to find a headwater watershed, which should remain unchanged. For instance,
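For reference, a sketch of that mean-precip check, assuming the amalgamated raster is materialized (e.g., in a temp table like the tmp_amalgamate built later in this thread) and using the same 0.0393701 mm-to-inch factor as the point queries below:

```sql
-- Sketch: mean precip (inches) over one watershed from the amalgamated
-- raster. The hydrocode is a placeholder for the watershed under test.
SELECT f.hydrocode,
       to_timestamp(:'tsendin') AS obs_date,
       0.0393701 * (ST_SummaryStats(
         ST_Clip(met.rast, ST_SetSRID(fgeo.dh_geofield_geom, 4326))
       )).mean AS precip_in
FROM dh_feature AS f
JOIN field_data_dh_geofield AS fgeo
  ON fgeo.entity_id = f.hydroid
 AND fgeo.entity_type = 'dh_feature'
JOIN tmp_amalgamate AS met
  ON ST_Intersects(ST_ConvexHull(met.rast), ST_SetSRID(fgeo.dh_geofield_geom, 4326))
WHERE f.hydrocode = 'usgs_ws_02035000';
```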
This is great progress @COBrogan, thanks for pushing it along! I was today days old when I learned the… I had a couple of questions and speculations (which may reveal my misunderstanding):
The headwaters are revealing the same data as the randomly selected dataset. I created some temp tables to test this out, but that code isn't here (I have it on OneDrive if I need to refer to it again).
Absolutely! Those rasters will reduce run time by at least 33%, I think.
That's what I'm hoping! I tried to put a comment block above each 'WITH' to help it be a little more legible, because the WITHs are difficult to read.
'fullCoverage' is the union of NLDAS for the whole CB extent. It's unioned at the end to serve as the "background" data if no better data exists.
To get the base raster. There may be a better way when we finish with temporal disaggregation.
Hopefully we speed it up! I'll be working on confirming that the larger watersheds are being overwritten by smaller headwaters using ST_Point next week.
Thinking about steps
@COBrogan Putting a little thought into how ratings could be included as part of the WITH section. Here it is in SQL form:
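One possible shape for that ratings piece, as a sketch: a hypothetical met_ratings table (featureid, best_varid) stands in for ratings loaded via REST, and a CTE attaches the winning varid to each coverage:

```sql
-- Sketch only: met_ratings(featureid, best_varid) is a hypothetical table.
-- This CTE slots into the WITH chain of the main query above.
ratings AS (
  SELECT cov.hydroid,
         r.best_varid
  FROM usgsCoverage AS cov
  LEFT JOIN met_ratings AS r
    ON r.featureid = cov.hydroid
)
```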
Then, we could collapse the WITH statements for prism and daymet into a single generic query where we simply select varid joined to ratings.best_varid, like so:
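A sketch of that collapsed CTE, building on the hypothetical ratings CTE above:

```sql
-- Sketch only: one CTE replaces the separate prism/daymet CTEs by joining
-- met.varid to ratings.best_varid instead of filtering on a varkey literal.
bestDaily AS (
  SELECT cov.*, met.featureid, met.tsendtime,
         st_clip(st_resample(met.rast, rt.rast), cov.dh_geofield_geom) AS rast
  FROM usgsCoverage AS cov
  JOIN ratings AS r
    ON r.hydroid = cov.hydroid
  JOIN dh_timeseries_weather AS met
    ON met.varid = r.best_varid
   AND met.featureid = :covid
   AND met.tsendtime = :'tsendin'
   AND ST_Intersects(ST_ConvexHull(met.rast), cov.dh_geofield_geom)
  LEFT JOIN (SELECT rast FROM raster_templates WHERE varkey = :'resample_varkey') AS rt
    ON 1 = 1
)
```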
For more information on how I think we might go about creating those pre-amalgamated ratings for each coverage, see #96
@mwdunlap2004 @rburghol Using
In each case, I found that the amalgamated raster was working. I think the amalgamation is working as intended! See here for a modified amalgamate code that creates temp tables for the ratings and amalgamate raster for later exploration. I used this to test amalgamate in the three scenarios above, but please note that the ratings are still randomly assigned, so each time you run this you will need to check the data sources selected in scenarios 1 and 2:

Temp Table Amalgamate

--Given we have a file that indicates which dataset performed best, we should have
--three different datasets defined via WITH, grabbing and clipping coverages based on the file
--Arbitrarily pick a tsendtime to practice with, here September 16, 1981.
--Note that with NLDAS this will pick an arbitrary hour and we need the full 24-hour set
\set tsendin '369489600'
\set resample_varkey 'daymet_mod_daily'
-- sets all integer feature and varid with query
select hydroid as covid from dh_feature where hydrocode = 'cbp6_met_coverage' \gset
create temp table tmp_usgsCoverage as (
SELECT f.*,
--Join in ratings. Until ratings are put in db via REST, let's
--use a random integer between 0 (NLDAS) and 2 (daymet)
floor(random() * (2-0+1) + 0)::int as dataID,
--Add area of watershed/coverage for reference and to order from downstream to upstream
ST_AREA(ST_MakeValid(fgeo.dh_geofield_geom)) as covArea,
fgeo.dh_geofield_geom as dh_geofield_geom
FROM dh_feature as f
LEFT JOIN field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
WHERE f.bundle = 'watershed' AND f.ftype = 'usgs_full_drainage'
ORDER BY covArea DESC
);
create temp table tmp_amalgamate as (
--Grab the USGS full drainage geometries/coverages and assign ratings to indicate best
--performing precip dataset
WITH usgsCoverage as (
SELECT *
FROM tmp_usgsCoverage
),
--Get the geometry and feature fields for the full coverage based on the covid variable gset above.
--This will be used to create a resampled NLDAS for the day
fullCoverage as (
SELECT f.*,fgeo.dh_geofield_geom
FROM dh_feature as f
LEFT JOIN field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
WHERE f.hydroid = :'covid'
),
--Where PRISM is the best performing dataset, grab the appropriate
--daily raster from dh_timeseries_weather and resample to target resolution
--and then clip to watershed boundaries
prism as (
SELECT cov.*,
met.featureid,met.tsendtime,
st_clip(st_resample(met.rast,rt.rast), cov.dh_geofield_geom) as rast
FROM usgsCoverage as cov
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey='prism_mod_daily'
and met.featureid = :covid
and met.tsendtime = :'tsendin'
) AS met
ON ST_Intersects(ST_ConvexHull(met.rast),cov.dh_geofield_geom)
LEFT JOIN (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
WHERE cov.dataID = 1
),
--Where daymet is the best performing dataset, grab the appropriate
--daily raster from dh_timeseries_weather and resample to target resolution
--and then clip to watershed boundaries
daymet as (
SELECT cov.*,
met.featureid,met.tsendtime,
st_clip(st_resample(met.rast,rt.rast), cov.dh_geofield_geom) as rast
FROM usgsCoverage as cov
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey='daymet_mod_daily'
and met.featureid = :covid
and met.tsendtime = :'tsendin'
) AS met
ON ST_Intersects(ST_ConvexHull(met.rast),cov.dh_geofield_geom)
LEFT JOIN (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
WHERE cov.dataID = 2
),
--Union all NLDAS rasters for the day to get the sum of NLDAS for the day
nldasFullDay AS (
SELECT st_union(met.rast,'sum') as rast
FROM (
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey='nldas2_precip_hourly_tiled_16x16'
and met.featureid = :covid
and extract(year from to_timestamp(met.tsendtime)) = extract(year from to_timestamp(:'tsendin'))
and extract(month from to_timestamp(met.tsendtime)) = extract(month from to_timestamp(:'tsendin'))
and extract(day from to_timestamp(met.tsendtime)) = extract(day from to_timestamp(:'tsendin'))
) AS met
),
nldasFullDayResamp AS (
select st_resample(met.rast,rt.rast) as rast
FROM fullCoverage as f
JOIN nldasFullDay as met
ON ST_ConvexHull(met.rast) && f.dh_geofield_geom
LEFT JOIN (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
),
--Union all NLDAS rasters for the day, intersecting by the usgsCoverage geometries
--to leverage the tiled NLDAS rasters. The end result is a raster for each coverage
--where NLDAS is the most highly rated that is of the full day's dataset,
--but clipped to only intersecting tiles
nldasDay as (
SELECT cov.hydroid, cov.hydrocode,
cov.ftype, cov.bundle, cov.name,
:'tsendin' as tsendtime,
st_union(met.rast,'sum') as rast
FROM usgsCoverage as cov
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey='nldas2_precip_hourly_tiled_16x16'
and met.featureid = :covid
and extract(year from to_timestamp(met.tsendtime)) = extract(year from to_timestamp(:'tsendin'))
and extract(month from to_timestamp(met.tsendtime)) = extract(month from to_timestamp(:'tsendin'))
and extract(day from to_timestamp(met.tsendtime)) = extract(day from to_timestamp(:'tsendin'))
) AS met
ON ST_Intersects(ST_ConvexHull(met.rast),cov.dh_geofield_geom)
WHERE cov.dataID = 0
GROUP BY cov.hydroid, cov.hydrocode, cov.ftype,
cov.bundle, cov.name
),
--Now, using the union of NLDAS hourly data in nldasDay, resample to the template raster and clip to each
--watershed where NLDAS is rated the best via an INNER JOIN and the WHERE in nldasDay
nldas as (
SELECT cov.*,met.tsendtime,
st_clip(st_resample(met.rast,rt.rast), cov.dh_geofield_geom) as rast
FROM usgsCoverage as cov
INNER JOIN nldasDay as met
on cov.hydroid = met.hydroid
LEFT JOIN (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
),
--For each feature in usgsCoverage, find the non-NULL summary dataset (which will represent the best rated dataset)
amalgamate as (
select cov.*,
COALESCE(prismMet.rast,daymetMet.rast,nldasMet.rast) as rast
FROM usgsCoverage as cov
LEFT JOIN nldas as nldasMet
on cov.hydroid = nldasMet.hydroid
LEFT JOIN prism as prismMet
on cov.hydroid = prismMet.hydroid
LEFT JOIN daymet as daymetMet
on cov.hydroid = daymetMet.hydroid
),
--Union the best rated datasets together. Since the data is sorted by drainage area,
--upstream areas will populate after larger, downstream coverages
amalgamateUnion as (
SELECT ST_union(amalgamate.rast) as rast
FROM usgsCoverage as cov
LEFT JOIN amalgamate as amalgamate
on cov.hydroid = amalgamate.hydroid
)
--Use a full union to create a column with the amalgamateUnion raster and the nldasFullDayResamp raster
--Then, union the rasters to get a raster in which the "background" is the nldasFullDayResamp and everything else is
--the best fit raster
SELECT ST_union(fullUnion.rast) as rast
FROM (
SELECT rast FROM nldasFullDayResamp
UNION ALL
SELECT rast FROM amalgamateUnion
) as fullUnion
);

In my case, I found that daymet was randomly selected to be the "rating" for usgs_ws_02035000:
PRISM was selected as the random "rating" for usgs_ws_02030000:
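To re-check which datasets a given run drew, a quick sketch against the temp table (0 = NLDAS, 1 = PRISM, 2 = daymet, per the comments in the query above):

```sql
-- Sketch: inspect the randomly assigned ratings for the two test watersheds.
SELECT hydrocode, dataID
FROM tmp_usgsCoverage
WHERE hydrocode IN ('usgs_ws_02035000', 'usgs_ws_02030000');
```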
First, let's check what value the amalgamated raster has in scenario 1 above (at a point in usgs_ws_02035000 that does not overlap any other watershed). Here, we should see that the amalgamated raster is equal to the raw daymet data:

\set hydrocode 'cbp6_met_coverage'
\set band '1'
\set tsendin '369489600'
\set varkey 'daymet_mod_daily'
\set resample_varkey 'daymet_mod_daily'
--Set point to extract
\set latitude 37.792314
\set longitude -78.157391
-- sets all integer feature and varid with query
select hydroid as covid from dh_feature where hydrocode = 'cbp6_met_coverage' \gset

WITH usgs_features AS (
--SELECT *
--FROM dh_feature as f
--WHERE f.bundle = 'watershed' AND f.ftype = 'usgs_full_drainage'
SELECT *
FROM dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
WHERE f.hydrocode = :'hydrocode'
),
testPoint as (
SELECT ST_SetSRID(ST_Point( :'longitude', :'latitude'),4326) as testPoint
),
met as (
Select f.hydrocode as hydrocode, to_timestamp(:'tsendin') as obs_date,
extract(year from to_timestamp(:'tsendin')) as yr,
extract(month from to_timestamp(:'tsendin')) as mo,
extract(day from to_timestamp(:'tsendin')) as da,
extract(hour from to_timestamp(:'tsendin')) as hr,
st_value(met.rast, :'band', testPoint.testPoint) as stats
FROM usgs_features as f
JOIN tmp_amalgamate AS met
ON ST_Intersects(ST_ConvexHull(met.rast),ST_SetSRID(f.dh_geofield_geom,4326))
JOIN testPoint as testPoint
ON ST_INTERSECTS(testPoint.testPoint,ST_SetSRID(f.dh_geofield_geom,4326))
)
select hydrocode, obs_date, yr, mo, da, hr,
0.0393701 * stats precip_in
from met
order by met.obs_date;
We compare this to the value for daymet from `dh_timeseries_weather`:

WITH usgs_features AS (
SELECT *
FROM dh_feature as f
left outer join field_data_dh_geofield as fgeo
on (
fgeo.entity_id = f.hydroid
and fgeo.entity_type = 'dh_feature'
)
WHERE f.hydrocode = :'hydrocode'
),
testPoint as (
SELECT ST_SetSRID(ST_Point( :'longitude', :'latitude'),4326) as testPoint
),
metUnion as (
Select
st_union(met.rast,'sum') as rast
FROM usgs_features as f
JOIN testPoint as testPoint
ON 1 = 1
JOIN(
select *
from dh_timeseries_weather as met
left outer join dh_variabledefinition as b
on (met.varid = b.hydroid)
where b.varkey=:'varkey'
and met.featureid = :covid
and extract(year from to_timestamp(met.tsendtime)) = extract(year from to_timestamp(:'tsendin'))
and extract(month from to_timestamp(met.tsendtime)) = extract(month from to_timestamp(:'tsendin'))
and extract(day from to_timestamp(met.tsendtime)) = extract(day from to_timestamp(:'tsendin'))
) AS met
ON ST_INTERSECTS(testPoint.testPoint, met.bbox)
),
met as (
Select f.hydrocode, to_timestamp(:'tsendin') as obs_date,
extract(year from to_timestamp(:'tsendin')) as yr,
extract(month from to_timestamp(:'tsendin')) as mo,
extract(day from to_timestamp(:'tsendin')) as da,
extract(hour from to_timestamp(:'tsendin')) as hr,
st_value(st_resample(met.rast,rt.rast), :'band', testPoint.testPoint) as stats
FROM usgs_features as f
JOIN metUnion AS met
ON ST_Intersects(ST_ConvexHull(met.rast),ST_SetSRID(f.dh_geofield_geom,4326))
JOIN testPoint as testPoint
ON ST_INTERSECTS(testPoint.testPoint,ST_SetSRID(f.dh_geofield_geom,4326))
left join (select rast from raster_templates where varkey = :'resample_varkey') as rt
ON 1 = 1
)
select hydrocode, obs_date, yr, mo, da, hr,
0.0393701 * stats precip_in
from met
order by met.obs_date;
Success! We have matching values. Now, what about a point in hydrocode usgs_ws_02030000?

\set latitude 37.923872
\set longitude -78.587890
\set varkey 'prism_mod_daily'
We now rerun the query against
Success! The smaller watershed was correctly amalgamated on top of the larger, downstream watershed. With our last case, we want to ensure that the "background" raster exists where there is no USGS coverage. In the amalgamated raster, this should be:

\set latitude 40.299638
\set longitude -76.909790
\set varkey 'nldas2_precip_hourly_tiled_16x16'
Now let's compare to the tiled nldas data in
Once again, we see the same value. So NLDAS2 is successfully being amalgamated as the "background" precip values.
@rburghol I like your idea of how to add in the ratings and the "best" varid. I think that will help clean up our query, because that combined PRISM/daymet statement will work well. This is a great way to structure how we deal with the ratings files, and it will make the query more applicable to future datasets (currently, we'd have to keep adding to the
Moved to issue "Amalgamate Data Model" #96
New variable definitions for ratings files with varkeys:

--Variable definition for rating files
insert into dh_variabledefinition(varname,vardesc, vocabulary, varunits, varkey,datatype,varcode,varabbrev,multiplicity)
VALUES
('Best fit met ratings','Best fit ratings timeseries derived from meteorology',
'met_rating', '', 'met_rating','value','met_rating','met_rating','');
insert into dh_variabledefinition(varname,vardesc, vocabulary, varunits, varkey,datatype,varcode,varabbrev,multiplicity)
VALUES
('Best fit precip raster','Best fit precip raster data amalgamated from rating scenarios',
'amalgamate_daily', '', 'amalgamate_daily','value','amalgamate_daily','amalgamate_daily','');
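A quick sanity check that the new definitions landed, and to grab their hydroids (using only columns already referenced in this thread):

```sql
SELECT hydroid, varkey, varname
FROM dh_variabledefinition
WHERE varkey IN ('met_rating', 'amalgamate_daily');
```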
Added two steps to
/opt/model/meta_model/run_model raster_met simple_lm_PRISM usgs_ws_02038000 auto geo import
/opt/model/meta_model/run_model raster_met simple_lm_daymet usgs_ws_02038000 auto geo import
Test on hydrocode
Get all ratings for a model scenario:

SELECT modelProp.featureid as featureid,
CASE
WHEN ts.tsvalue IS NOT NULL THEN scenProp.pid
ELSE NULL
END as pid,
ts.tstime,
ts.tsendtime,
ts.tsvalue as tsvalue
FROM dh_properties as modelProp
LEFT JOIN dh_properties as scenProp
ON scenProp.featureid = modelProp.pid
LEFT JOIN dh_timeseries as ts
ON ts.featureid = scenProp.pid
WHERE modelProp.propcode = 'met-1.0'
AND scenProp.propname IN ('simple_lm_nldas2','simple_lm_daymet','simple_lm_PRISM')
AND modelProp.featureid = :coveragefeatureid
ORDER BY ts.tstime;

Now, we need to:
-- Ratings in each column
SELECT tstime,tsendtime,
MAX(CASE WHEN varkey = 'rating_daymet' THEN tsvalue END) as rating_daymet,
MAX(CASE WHEN varkey = 'rating_prism' THEN tsvalue END) as rating_prism,
MAX(CASE WHEN varkey = 'rating_nldas2' THEN tsvalue END) as rating_nldas
FROM (
SELECT to_timestamp(ts.tstime) as tstime,
to_timestamp(ts.tsendtime) as tsendtime,
ts.tsvalue,
v.varkey
FROM dh_properties as modelProp
LEFT JOIN dh_properties as scenProp
ON scenProp.featureid = modelProp.pid
AND scenProp.entity_type = 'dh_properties'
LEFT JOIN dh_timeseries as ts
ON ts.featureid = scenProp.pid
LEFT JOIN dh_variabledefinition as v
ON v.hydroid = ts.varid
WHERE modelProp.propcode = 'met-1.0'
AND scenProp.propname = 'lm_simple'
) AS ratings
GROUP BY tstime,tsendtime;
WITH maxRating AS (
SELECT modelProp.featureid as featureid,
ts.tstime as tstime,
ts.tsendtime as tsendtime,
max(ts.tsvalue) as maxtsvalue
FROM dh_properties as modelProp
LEFT JOIN dh_properties as scenProp
ON scenProp.featureid = modelProp.pid
LEFT JOIN dh_timeseries as ts
ON ts.featureid = scenProp.pid
WHERE modelProp.propcode = 'met-1.0'
AND scenProp.propname = 'lm_simple'
GROUP BY modelProp.featureid,
ts.tstime,
ts.tsendtime
)
SELECT modelProp.featureid AS featureid,
to_timestamp(ts.tstime) as tstime,
to_timestamp(ts.tsendtime) as tsendtime,
ts.tsvalue,
v.varkey
FROM dh_properties as modelProp
LEFT JOIN dh_properties as scenProp
ON scenProp.featureid = modelProp.pid
LEFT JOIN dh_timeseries as ts
ON ts.featureid = scenProp.pid
LEFT JOIN dh_variabledefinition as v
ON v.hydroid = ts.varid
INNER JOIN maxRating
ON ts.tstime = maxRating.tstime
AND ts.tsendtime = maxRating.tsendtime
AND ts.tsvalue = maxRating.maxtsvalue
AND modelProp.featureid = maxRating.featureid
WHERE modelProp.propcode = 'met-1.0'
AND scenProp.propname = 'lm_simple'
ORDER BY modelProp.featureid,ts.tstime;

Note that we see some date inconsistencies in 1980 due to the weekly expansion. We can get around this by modifying our obs_date in the coverage precip file or by using monthly ratings only:
But they do not persist:
@COBrogan I am eager to discuss this with you. It has been a while, but I think ideally, if we have references to the best dataset for each coverage stored as time series attached to those scenario properties as you suggest, we should be able to roll up all gage rasters into the single domain-wide coverage. But we've never been so close to actually getting it done… Let's get together ASAP and discuss our proposals. Maybe we can put a few queries together and then see where the best place would be to alter the workflow?
@rburghol amalgamate
It'd be interesting to have @mwdunlap2004 visualize these with the USGS gage boundaries to see if he gets the same thing in mapserver. I had to try a few different combinations of inputs for
The tid for this raster is

R Code for image (slow):

library(raster)
library(sf)
#Get the watershed coverage from the server
watershedGeo <- read.csv("http://deq1.bse.vt.edu:81/met/usgsGageWatershedGeofield.csv")
#Get the gage numbers as their own field and store a copy of the data
gageWatershed <- watershedGeo
gageWatershed$gage <- gsub(".*_(\\d+)","\\1",gageWatershed$hydrocode)
gageSF <- st_as_sf(gageWatershed,wkt = 'wkt',crs = 4326)
gageSF$area <- st_area(gageSF$wkt)
gageSF <- gageSF[order(gageSF$area),]
#Get the amalgamate test raster
aTest <- raster("http://deq1.bse.vt.edu:81/met/amalgamateTest.tiff")
crs(aTest) <- "+init=epsg:4326" #ensure the raster is tagged as WGS84
pal <- c("steelblue2","darkgreen","grey50","grey50","grey50","grey50","maroon4")
plot(aTest, axes = TRUE,
#plot with defined breaks
col = pal,
ylim = c(36,40.5),xlim = c(-84,-77))
plot(gageSF$wkt,col = NA,add = TRUE)
Super interesting @COBrogan!! Maybe we can get a primer on the edge phenomena you describe on Monday? I am not sure that I know what I'm seeing. Still awesome progress!
@COBrogan I just had an idea. It might be really slow, or it might be surprisingly fast; I haven't the faintest notion. Rather than calling the routine multiple times and saving the intermediates as we discussed, perhaps it could be accomplished with WITH RECURSIVE. So as it is recursively called, it merges the product of the last merger with the next mask that you create? Example from the issue where I was developing the approach: HARPgroup/hydro-tools#435
@rburghol I'm still trying to figure out
For now, under amalgamate -> import 03, I have the FOR loop set up to iterate through
@rburghol The amalgamation appears to be working for both
Check it out! Here is the varid raster from step 2 (red = NLDAS2, green = daymet, blue = PRISM):
And here is the amalgamated best fit precip data for that same day in mm! One item that became important was that the
The slowest part of the workflow is step 2: the varid key raster takes ~10-20 seconds to create. The amalgamated raster forms quickly, typically within a few seconds. So, I would estimate about 25-30 seconds per day, or roughly 121 hours for the entire model time period. Not ideal, but with slurm that's likely closer to 121 / 6 = 20 hours, so it would likely need to be run in chunks or over the weekend.
I think it would be good to QC this method to make sure all the correct days were selected. Given how complicated the timestamps became (Fiji time vs EST), it would be good to trace data from start to finish: look at a cell in the raw data, find out what rating that watershed ultimately gets, make sure it is higher than the other datasets (and is thus selected as the "best"), and make sure that that same data ultimately makes it into the amalgamated raster, i.e. that the correct day is selected. I want to spend some time working on the main issue on this page to flesh out some plausible next steps and to maybe better document some of the raster tricks we've employed across these issues (or maybe I'll add to #54 and #55).
@COBrogan This is fantastic!! And I am eager to know how you generated that raster image :). Regardless of the "how", the "what" is interesting to me in terms of how organic it looks -- at least qualitatively, there is no discernible artifacting at this zoom level. Can't wait to dive more deeply!
@COBrogan excited to see this too. I'll have to learn what slurm is…
@COBrogan - to your earlier questions about how to RECURSIVE on this, I still have only a vague idea, and may need some fancy footwork, as aggregate functions like ST_UNION are apparently tricky to use in recursive statements (see here), but it goes like this:
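A speculative sketch of that recursive shape, using the two-raster form of ST_Union (aggregates are restricted inside the recursive term) and the temp tables from earlier in this thread:

```sql
-- Speculative sketch: fold each coverage's best-fit raster onto a running
-- union, downstream -> upstream, so headwaters overwrite larger coverages.
WITH RECURSIVE ordered AS (
  SELECT rast, row_number() OVER (ORDER BY covArea DESC) AS rn
  FROM tmp_amalgamate
),
merged AS (
  SELECT rast, rn FROM ordered WHERE rn = 1
  UNION ALL
  SELECT ST_Union(m.rast, o.rast) AS rast, o.rn
  FROM merged AS m
  JOIN ordered AS o ON o.rn = m.rn + 1
)
SELECT rast FROM merged ORDER BY rn DESC LIMIT 1;
```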
Actually @COBrogan, this extension adds array math capabilities to postgresql, and I suspect that it is intended for use on rasters, since the contributors to the project are Paul Ramsey and Regina Obe, who are primary postGIS developers/maintainers: https://github.com/pramsey/pgsql-arraymath. In my mind, if the step where we create holes in the dataset is set to make the holes where the data is NOT the best, then we should be able to simply add the 3 (or more) rasters together. In fact, if we can set "not the best" cells to zero, maybe that fixes the whole thing and allows us to use
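A sketch of that zeroing idea with plain map algebra (no extension needed), assuming a per-day key raster of winning varids like the bestVarIDRaster described in this workflow; the varid literal and the prismResampled/bestVarIDRaster relations are placeholders:

```sql
-- Sketch: zero out cells where this dataset is NOT the best, so the masked
-- rasters from all datasets can simply be summed. 1234 stands in for
-- prism's varid; prismResampled and bestVarIDRaster are hypothetical.
SELECT ST_Union(masked.rast, 'sum') AS rast
FROM (
  SELECT ST_MapAlgebra(
           met.rast,   -- this dataset's resampled precip
           key.rast,   -- key raster: winning varid per cell
           'CASE WHEN [rast2] = 1234 THEN [rast1] ELSE 0 END',
           '32BF'
         ) AS rast
  FROM prismResampled AS met
  CROSS JOIN bestVarIDRaster AS key
) AS masked;
```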
@rburghol @mwdunlap2004 We can now generate plots automatically for both the variable ID (key) rasters and the amalgamated precip rasters using the following. These are stored under the amalgamate scenario with their tsendtime, e.g. http://deq1.bse.vt.edu:81/met/amalgamate_storm_vol/plots/
The following dates failed to amalgamate, all with the same error that happens in the first step of
2021-01-01

It appears the issue is caused by some very large NLDAS2 rasters. Looking at
Compared to 2023-12-30

It should also be noted that two distinct rasters were selected for 2023-12-30 due to how the daily fractions were developed. Both fixes will involve ensuring a consistent bounding box across the daily fractions and consistent times:

It appears the overly large rasters were a result of bad daily and hourly NLDAS2 fractions. Rerunning these steps in met #64 seems to have fixed this issue and created only one daily fraction per day, but some additional QC may be warranted. The hourly and daily fractions should be rerun across the entire time period to fix any other entries that may be causing issues that were not caught in the slurm reports, e.g.
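A QC sketch for hunting any remaining oversized tiles: the 16x16 expectation comes from the varkey itself, and treating anything wider or taller as suspect is an assumption (the daily/hourly fraction varkeys may differ):

```sql
-- Sketch: list NLDAS2 tiles larger than the expected 16x16 footprint.
SELECT met.featureid,
       to_timestamp(met.tsendtime) AS obs_date,
       ST_Width(met.rast) AS width,
       ST_Height(met.rast) AS height
FROM dh_timeseries_weather AS met
JOIN dh_variabledefinition AS v
  ON v.hydroid = met.varid
WHERE v.varkey = 'nldas2_precip_hourly_tiled_16x16'
  AND (ST_Width(met.rast) > 16 OR ST_Height(met.rast) > 16)
ORDER BY obs_date;
```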
This will produce a full domain coverage raster of amalgamated daily best-fit rasters, and store it in the database for later use by coverage model time series input routines in workflow `wdm`.

Data Model
Steps
Info
- `varkey` indicating the name of coverage best fit records to amalgamate for a given model scenario
- `geo -> import -> 04`: store this raster as 8-bit integer in `dh_timeseries_weather` under the amalgamation scenario property
- `amalgamate.sh`, which takes a given varid and sets all values in the bestVarIDRaster to NULL before unioning in the resampled raw precip data to create an amalgamated raster
- `plotRaster.R`, called by `plotRaster.sh`

Run amalgamate for `simple_lm` should be in the form:
There is no need to insert a coverage. The coverage details will be included in the amalgamate config and will be for the model extent. Best fit ratings should be extracted and combined for all coverages.

Run amalgamate for `storm_vol` should be in the form:
Batch Run for entire model period (process only):