Raster Disk Storage Factors #77

Open
Tracked by #54
rburghol opened this issue Aug 5, 2024 · 0 comments
rburghol commented Aug 5, 2024

Raster Disk Storage Factors

Tiling

  • Smaller raster tile sizes could lead to more efficient disk loads.
  • Tiling should not increase storage overhead by much
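A hedged sketch of how retiling could look in PostGIS (the target table name and the 128x128 tile size are assumptions for illustration; dh_timeseries_weather and its rast/varid columns come from the queries below in this issue):

```sql
-- Hypothetical retiling sketch: split each stored raster into 128x128 tiles.
-- ST_Tile is a set-returning function, so each source row expands into many
-- smaller tile rows; partial edge tiles are padded with nodata.
CREATE TABLE dh_timeseries_weather_tiled AS
SELECT varid,
       ST_Tile(rast, 128, 128, TRUE) AS rast
FROM dh_timeseries_weather;
```

Whether this pays off would need to be measured: smaller tiles mean more rows and per-row overhead, but queries touching a small area would load far less raster data per row.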

Numeric Data Type

  • 32BF (32-bit floating point) -- with rainfall data in millimeters, 64-bit should not be needed.
    • Note: nldas2 is 64-bit, but prism and daymet already use 32-bit, so no savings there.
    • In reality, a 16-bit floating point type would capture our instrument range and precision without any issue.
  • Unsigned 8-bit integer ('8BUI'::text)
    • This would be an ideal candidate for amalgamated daily (or weekly) rasters that, rather than holding a rainfall value, hold an integer pointing to the index of the dataset in question.
      • Adding a serial integer index column to raster_templates to map each varkey to an 8-bit number gives us up to 256 different variables to reference in the amalgamated rasters.
      • These will also plot nicely: we can assign a separate color to each integer index so the data source is visible in the amalgamations.

Change Data Type

  • Fractional rasters change from 64 to 32 bit
  • Before type change: /dev/nvme1n1p1 3.6T 3.3T 183G 95% /data
UPDATE dh_timeseries_weather SET rast = ST_MapAlgebraExpr(rast, '32BF', '[rast]') 
WHERE varid in (select hydroid from dh_variabledefinition where varkey = 'nldas2_precip_daily_frac');

  • After type change: /dev/nvme1n1p1 XXX
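
One way to confirm the conversion took effect and to measure the new footprint from inside the database (standard PostGIS/Postgres functions; table and varkey names come from the UPDATE above):

```sql
-- Check the band pixel type after conversion (expect '32BF')
SELECT DISTINCT ST_BandPixelType(rast)
FROM dh_timeseries_weather
WHERE varid IN (SELECT hydroid FROM dh_variabledefinition
                WHERE varkey = 'nldas2_precip_daily_frac');

-- Total on-disk size of the table, including TOAST and indexes
SELECT pg_size_pretty(pg_total_relation_size('dh_timeseries_weather'));
```

Note the table may not shrink on disk until a VACUUM FULL (or similar rewrite) reclaims the space freed by the UPDATE.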

Resampling

Data Storage

  • Data Storage Capacity (note: dbase2 has 188GB of RAM)
  • Resampling could have a negative effect on total disk storage (and lead to slower loads into database memory)
    • Storage for hi-res rasters can become challenging. See https://postgis.net/docs/ST_MemSize.html
    • daymet total (query below): totgeomsum = 60 GB
      • 1.4 Terabytes would be needed to store the entire daymet source as 24-hour rasters.
      • We cannot store 24 hourly rasters at 1 km x 1 km resolution: it would take 1.4 TB, which is 75% of our disk. Thus, temporal disaggregation must happen just in time for spatial averaging over the land segments that we generate precip for.
      • Flip side: We CAN store daily daymet values resampled for PRISM in our permanent database tables if it is advantageous, but not for NLDAS2 (we could for daily-summed NLDAS2 if it is advantageous).
      • However: at a watershed scale, both temporal disaggregation and spatial resampling can happen simultaneously, since the amounts of memory involved will be much smaller.
        • Example: the James River basin is about 10% of the size of the full meteorology domain, which would be about 140 GB of data.
      • Also, we could store a watershed's disaggregated and resampled intermediate in a database table for a short(ish) duration during a workflow.
    • Does resampling/processing order affect precip disaggregation values? See Reversibility of Spatial Resample and Temporal Disaggregate Steps #79
      Query 1: Get the sum of raster storage needs for the daymet dataset.
SELECT pg_size_pretty(SUM(ST_MemSize(rast))) as totgeomsum from dh_timeseries_weather where varid in (select hydroid from dh_variabledefinition where varkey = 'daymet_mod_daily');
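
At the watershed scale, the just-in-time approach could be sketched as a clip before disaggregation (the watershed_boundaries table, its name/geom columns, and the tstime column are assumptions for illustration; the rest follows Query 1):

```sql
-- Hypothetical: clip each daily raster to a single watershed polygon before
-- temporal disaggregation, so only ~10% of the domain is held in memory.
SELECT w.tstime,
       ST_Clip(w.rast, b.geom, TRUE) AS rast  -- crop to the polygon's extent
FROM dh_timeseries_weather w
JOIN watershed_boundaries b ON b.name = 'James River'
WHERE w.varid IN (SELECT hydroid FROM dh_variabledefinition
                  WHERE varkey = 'daymet_mod_daily');
```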