Week of 2022/06/27 HARP hdf5 data-mining, running hsp2, data models and REST #239

Open
20 tasks done
rburghol opened this issue Jun 24, 2022 · 0 comments

rburghol commented Jun 24, 2022

Topics:

  • See below "As of Last Week" for previously covered topics.
  • Goal for the week:
  • Data mining with hdf5 (h5dump: h5dump #235; rhdf5: rhdf5 #207; Python: Python hdf5 reader #236)
    • R:
      • Create an issue for the problem handling timestamps in rhdf5, and migrate the relevant comments from last week's issues so as to preserve (and review) our work. See Handling timestamps in .h5 files with rhdf5 #241
      • Annotate data sources found last week in the data dictionary (Data Dictionary #237)
      • Begin data mining landseg h5 demo files and annotate in dictionary
    • Terminal:
      • Is the R module just leveraging the h5dump command line (terminal) tools?
      • Can we do an update in there?
    • Python:
      • Basic Python h5 tools also need the 1000000000 divisor for timestamps.
      • How about the hsp2 package? What do their demos do?
      • Is this a bug in hsp2 hdf5 storage, or is it just a seldom-used convention?
  • Learning hsp2
  • Work through vahydro land and river features via REST (some info outlined in 2022/06/13 HARP Work Session :VAHydro Model Components Overview #206).
  • Review basic models in vahydro -- Intake, and channel/impoundment primarily.
  • Mapping in R: @jdkleiner, when you think this is ready for prime time let's schedule a 30+ minute tutorial, and ideally, the 3 of us will come up with a goal for a "fact sheet" type Rmd to do a map, a hydro analysis and withdrawal summary table (using components from VWP where applicable).
  • Running demo of hsp2: I have 2 demo datasets already: landseg and riverseg. These demos use the basic HSP2 command line tools, and are simple and straightforward. However, the install is not yet globally usable, so I still have to enable that.
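
The 1000000000 divisor mentioned in the Python items above would be consistent with a time index stored as integer nanoseconds since the Unix epoch. That is an assumption here, not something confirmed against the hsp2 files. Under that assumption, a minimal sketch of the conversion using only the standard library might look like this; the `raw_index` values and the `ns_to_datetime` helper are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000  # the divisor mentioned above
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns):
    """Convert integer nanoseconds since 1970-01-01 UTC to a datetime.

    Assumes the stored value is an epoch offset in nanoseconds.
    Sub-microsecond precision is dropped because datetime cannot hold it.
    """
    seconds, remainder_ns = divmod(ns, NS_PER_SECOND)
    return EPOCH + timedelta(seconds=seconds, microseconds=remainder_ns // 1000)


# Hypothetical raw index values: two hourly steps starting 1984-01-01 00:00 UTC
raw_index = [441763200 * NS_PER_SECOND, 441766800 * NS_PER_SECOND]
converted = [ns_to_datetime(ns) for ns in raw_index]

for raw, stamp in zip(raw_index, converted):
    print(raw, "->", stamp.isoformat())
```

If the values read from a real file do not land inside the model run period after this conversion, the nanosecond assumption is probably wrong for that file.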

Project Management/Medium-Term Challenges

  • Ultimately, we will be running hsp2 models via the same command line tools as the cbp hspf, not the hsp2 commands by themselves (demo above), since we will enable the cbp tools to use hsp2 or hspf. However, as we have yet to integrate hsp2 into the cbp commands, and also because these are good background, I think this is a useful pursuit.
  • Accessing data from hdf5 database format: this is the central goal of the summer. There are 2 ways: R and cmd line tools (h5dump). R is easier, but there is a potential problem with timestamps. h5dump seems to work fine, but I don't know how to control its formatting, so there will need to be development to make it usable.
  • File sizes: there already seem to be some immediate drawbacks to HSP2. The HDF5 data format, as used by HSP2, is absolutely enormous. One land segment/land-use element generates a gigabyte of data for the 35-year model run period. Note: there are between 35 and 50 land-use/land segment elements for every river segment in the model, so obviously this can't work. We will need to look at different ways to address this: maybe seeing if gzip can be used to compress, or maybe being very aggressive about cleanup after model runs, that is, exporting only those data components that we know we will need from the HDF5 file and then deleting it to save space. Nothing to figure out today, but we need to have this in our minds so we can come up with an ideal solution.
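
To put rough numbers on the file-size concern above, here is a back-of-the-envelope sketch. The 1 GB per element and the 35 to 50 elements per river segment come from the note above; the river segment count `N_RIVERSEGS` and the `storage_gb` helper are hypothetical and only there to show how the total scales.

```python
GB_PER_ELEMENT = 1  # from the note above: about 1 GB per land segment/land-use element
ELEMENTS_PER_RIVERSEG = (35, 50)  # range quoted in the note above


def storage_gb(n_riversegs, elements_per_riverseg, gb_per_element=GB_PER_ELEMENT):
    """Estimate total HDF5 storage in GB for a set of river segments."""
    return n_riversegs * elements_per_riverseg * gb_per_element


N_RIVERSEGS = 100  # hypothetical count, not taken from the model

low = storage_gb(N_RIVERSEGS, ELEMENTS_PER_RIVERSEG[0])
high = storage_gb(N_RIVERSEGS, ELEMENTS_PER_RIVERSEG[1])
print(f"{N_RIVERSEGS} river segments: {low} to {high} GB")
```

Any compression or post-run cleanup would effectively lower `gb_per_element`, so the same function can be reused to compare options once real per-element sizes are measured.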