Python hdf5 reader #236

nicoledarling · 2022-06-23T17:17:42Z

Exploring how to use Python to find timestamps in .h5 files.
See also:

Recommended method to walk HDF5 data tree? h5py/h5py#406
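One way to walk the whole tree, instead of calling keys() level by level, is h5py's visititems() method. It calls a function once for every group and dataset below the starting group, passing each object's path and the object itself. This is a minimal sketch; the helper name walk_h5 is hypothetical:

```python
import h5py


def walk_h5(path):
    """Return (name, kind, shape) for every group and dataset below the root of an HDF5 file."""
    found = []

    def visit(name, obj):
        # visititems() calls this once per object with its path and the object;
        # returning None (the default) tells it to keep walking
        if isinstance(obj, h5py.Dataset):
            found.append((name, "dataset", obj.shape))
        elif isinstance(obj, h5py.Group):
            found.append((name, "group", None))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Calling walk_h5('OR1_7700_7980.h5') should return paths such as TIMESERIES/TS011/table without needing to know the layout in advance.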

gcambridge · 2022-06-23T18:04:11Z

I was using the h5py documentation for the data dictionary issue (#237), and it should be helpful for this issue too.

A preliminary reading of the documentation suggests that the element hierarchy (group > dataset > attribute) is more clearly defined, with clearer relationships between each level. The functions for setting and getting attribute data also have clearer syntax (this might just be personal preference).

The following h5py functions and syntax are useful in the conversion from R to Python:

Opening .h5 files:

f = h5py.File('myfile.hdf5','r')
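where 'r' is the mode, which determines whether the file is opened for reading or for writing. The accepted modes are:
r - read only (the file must exist)
r+ - read/write (the file must exist)
w - create the file, erasing it first if it already exists
w- or x - create the file, failing if it already exists
a - read/write if the file exists, create it otherwise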
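The following is a minimal sketch of these modes in practice. It uses a with block so each file is closed automatically. The helper name write_then_read and the dataset names values and more are placeholders:

```python
import h5py
import numpy as np


def write_then_read(path):
    # "w" creates the file, erasing it first if it already exists
    with h5py.File(path, "w") as f:
        f.create_dataset("values", data=np.array([1.0, 2.0, 3.0]))

    # "a" opens the existing file read/write, so new objects can be added
    with h5py.File(path, "a") as f:
        f.create_dataset("more", data=np.array([4.0]))

    # "r" is read-only: reading works, but creating or modifying objects raises an error
    with h5py.File(path, "r") as f:
        return sorted(f.keys()), f["values"][:].tolist()
```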

Datasets:

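list the members (groups and datasets) at the top level of the file using list(f.keys())
access a member using dset = f['dset_id'], where dset_id is one of the names returned by the previous line. The result is a Dataset object if the member holds data, or a Group object if it contains further members.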
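Because f[...] can return either kind of object, it can help to check which one you got. This is a minimal sketch; the helper name describe is hypothetical:

```python
import h5py


def describe(path, member):
    """Report whether `member` in the HDF5 file at `path` is a group or a dataset."""
    with h5py.File(path, "r") as f:
        obj = f[member]
        if isinstance(obj, h5py.Group):
            # Groups behave like dictionaries: their keys are member names
            return ("group", list(obj.keys()))
        # Datasets behave like NumPy arrays: shape and dtype describe the stored data
        return ("dataset", obj.shape, str(obj.dtype))
```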

Attributes

Attributes of a dataset (or group) can be listed using list(dset.attrs.keys())
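The following is a short sketch of setting and getting attributes through the .attrs mapping. The dataset name flow and the attribute units = "cfs" are made up for the example:

```python
import h5py
import numpy as np


def attrs_demo(path):
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("flow", data=np.zeros(3))
        # .attrs behaves like a dictionary attached to the dataset
        dset.attrs["units"] = "cfs"           # set
        names = list(dset.attrs.keys())      # list
        units = dset.attrs["units"]          # get (string attributes come back as str)
    return names, units
```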

Closing .h5 files:

f.close() (File.close()) results in a "strong close": all objects obtained from the file (groups, datasets, attributes) become unusable.
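The following is a minimal sketch of what the strong close means in practice. A Dataset handle taken from the file becomes invalid once the file is closed, and h5py reports this as a false truth value. Data copied into memory with [:] before closing remains usable. The helper name close_demo is hypothetical:

```python
import h5py
import numpy as np


def close_demo(path):
    f = h5py.File(path, "w")
    dset = f.create_dataset("x", data=np.arange(3))
    data = dset[:]            # copy into a NumPy array while the file is open
    valid_before = bool(dset)
    f.close()                 # strong close: dset can no longer be used
    valid_after = bool(dset)
    return data.tolist(), valid_before, valid_after
```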

gcambridge · 2022-06-24T16:00:21Z

6/24/2022: Successfully located and converted timestamp using Python

The following steps locate the timestamps and then manually convert them to UTC.

1. Ensure h5py is properly installed using pip install h5py
2. Open Python 3 on the deq server by running python3 from the terminal
3. Import h5py using import h5py
4. Open the target file using f = h5py.File('file_name', 'r'), where file_name is the file you want to open
5. View the top-level groups in the file using list(f.keys())
6. Create a TIMESERIES Group object using dset = f['TIMESERIES']
7. View its members (TS011, TS012, etc.) with list(dset.keys())
8. Create a TIMESERIES/target Group object using dset_target = f['TIMESERIES/target'], where target is the time series you want to open (e.g. TS011 or TS012)
9. Each time series contains two tables, ['_i_table', 'table'], and we only want table. Select it using dset_target_table = f['TIMESERIES/target/table'] (again, target is the desired time series)
10. From this table, pull the rows in a given range using dset_target_table[a:z], where a is the first row index and z is one past the last (Python slices exclude the end index)
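The steps above can be collected into one function. This is a minimal sketch that assumes the layout described in this thread (TIMESERIES/<target>/table). The function name read_ts_rows is hypothetical:

```python
import h5py


def read_ts_rows(path, target, a, z):
    """Return rows a..z-1 of TIMESERIES/<target>/table as a NumPy structured array."""
    with h5py.File(path, "r") as f:
        ts = f["TIMESERIES"]               # a Group; list(ts.keys()) gives TS011, TS012, ...
        if target not in ts:
            raise KeyError(f"{target!r} not found; available: {list(ts.keys())}")
        table = ts[target]["table"]        # the Dataset we want (not '_i_table')
        return table[a:z]                  # slicing reads only these rows into memory
```

For example, read_ts_rows('OR1_7700_7980.h5', 'TS011', 0, 5) would return the first five rows of TS011.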

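The timestamp values are nanoseconds since the Unix epoch (1970-01-01), so divide them by 1x10^9 to get seconds. Then pass the result to the Python function fromtimestamp():

import datetime
dt = datetime.datetime.fromtimestamp(unix_timestamp)  # unix_timestamp: seconds, as a number (not a string)
dt_utc = datetime.datetime.fromtimestamp(unix_timestamp, tz=datetime.timezone.utc)  # always UTC

Without tz, fromtimestamp() returns the computer's local time. The outputs below happen to match UTC, but passing tz=datetime.timezone.utc guarantees UTC on any machine.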
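The following is a small helper for the conversion. It assumes the stored values are nanoseconds since 1970-01-01 UTC. It uses integer division (divmod) rather than /1000000000 so large values are not rounded through a float. The name ns_to_utc is hypothetical:

```python
import datetime


def ns_to_utc(ns):
    """Convert an integer count of nanoseconds since the Unix epoch to a UTC datetime."""
    seconds, remainder_ns = divmod(int(ns), 1_000_000_000)
    dt = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    # datetime resolution is microseconds, so sub-microsecond digits are dropped
    return dt + datetime.timedelta(microseconds=remainder_ns // 1000)


print(ns_to_utc(441766800000000000))  # 1984-01-01 01:00:00+00:00
```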

The next step is to streamline the conversion process with a Python script.

Note: Each row of TIMESERIES/TS011/table is returned as a NumPy structured record that displays like a tuple, (#, #). The dtype shows two fields: 'index', which is the timestamp, and 'values', whose significance we are not yet sure of.
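Because each row is a structured record, its parts can be read by field name instead of by unpacking the tuple. A field name applied to a slice returns a whole column at once. The following sketch rebuilds, by hand, the four rows printed in the session that follows, using the same dtype:

```python
import numpy as np

# Same dtype as TIMESERIES/TS011/table: int64 'index' and float64 'values'
rows = np.array(
    [(441766800000000000, 8.82848454), (441770400000000000, 8.81089973),
     (441774000000000000, 8.79396248), (441777600000000000, 8.77740383)],
    dtype=[("index", "<i8"), ("values", "<f8")],
)

first = rows[0]
print(first["index"], first["values"])   # one record, accessed by field name
print(rows.dtype.names)                  # ('index', 'values')
print(rows["values"])                    # the whole 'values' column as a float array
```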

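Simple (and inefficient) example of converting a timestamp to a date, to be continued. Python indexing starts at 0, so dset_TS011table[1] is the table's second row and dset_TS011table[0] is the first: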

>>> dset_TS011table[1:5]
array([(441766800000000000, 8.82848454), (441770400000000000, 8.81089973),
       (441774000000000000, 8.79396248), (441777600000000000, 8.77740383)],
      dtype=[('index', '<i8'), ('values', '<f8')])
>>> tuple1 = dset_TS011table[1]
>>> tuple1
(441766800000000000, 8.82848454)
>>> timestamp1, value1 = tuple1
>>> timestamp1
441766800000000000
>>> timestamp1_ = timestamp1/1000000000
>>> timestamp1_
441766800.0
>>> datetime.datetime.fromtimestamp(timestamp1_)
datetime.datetime(1984, 1, 1, 1, 0)
>>> tuple_last = dset_TS011table[324359]
>>> timestamp_last, value_last = tuple_last
>>> timestamp_last_ = timestamp_last/1000000000
>>> datetime.datetime.fromtimestamp(timestamp_last_)
datetime.datetime(2020, 12, 31, 23, 0)
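Converting the whole column at once avoids the per-row unpacking above. This sketch assumes, as in the session, that 'index' holds nanoseconds since 1970-01-01 UTC. NumPy's datetime64[ns] type uses that same unit, so astype() is enough. The helper name index_to_datetimes is hypothetical:

```python
import numpy as np


def index_to_datetimes(rows):
    """Convert the int64 'index' field (ns since the Unix epoch) to datetime64[ns]."""
    # The same 8-byte integers, reinterpreted as nanosecond timestamps (no time zone attached)
    return rows["index"].astype("datetime64[ns]")


rows = np.array(
    [(441766800000000000, 8.82848454), (441770400000000000, 8.81089973)],
    dtype=[("index", "<i8"), ("values", "<f8")],
)
times = index_to_datetimes(rows)
print(times[0], times[-1])  # 1984-01-01T01:00:00.000000000 1984-01-01T02:00:00.000000000
```

On the real file, index_to_datetimes(dset_TS011table[:]) converts every row. Indexing the result with [0] and [-1] gives the first and last timestamps without needing to know the table length.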

The original UCI from #211 had the end date as 2019/12/31, but Python showed that the last timestamp in the HDF5 file is 2020/12/31 23:00. Is this a problem?

megpritch · 2022-06-24T17:12:14Z

Helpful Source for HDF5 Files in Python:

https://pythonnumericalmethods.berkeley.edu/notebooks/chapter11.05-HDF5-Files.html

Helped me understand the syntax more clearly (e.g. that keys are the names of groups, and groups can contain subgroups and datasets)
The commands & their explanations made it much easier to access the time series datasets.

megpritch · 2022-06-24T18:31:38Z

Source for Creating Python Script in Command Line

https://www.jcchouinard.com/create-python-script-from-terminal/

gcambridge · 2022-06-24T19:09:43Z

Python script for converting timestamps from Unix time to datetime format

A script for timestamp conversion has been committed to the Harp Archive folder.

Log into the deq server and enter the directory with the OR1_7700_7980.h5 file.
Use vim to create and edit the file: vim h5py_ts_convert
Enter editing mode by pressing "i"
Right-click and paste the contents of the GitHub file into the vim editor
Save and exit by typing :wq
Run the script by entering python3 h5py_ts_convert in the terminal
Follow the prompts and enter the timeseries table you wish to convert
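The committed script is not reproduced in this thread. As an illustration only, the following hypothetical function converts every row of one table in the same way. It assumes the TIMESERIES/<target>/table layout and the nanosecond timestamps described above:

```python
import datetime

import h5py


def convert_table(path, target):
    """Yield (UTC datetime, value) for every row of TIMESERIES/<target>/table."""
    with h5py.File(path, "r") as f:
        rows = f[f"TIMESERIES/{target}/table"][:]   # read the whole table into memory
    for ns, value in zip(rows["index"].tolist(), rows["values"].tolist()):
        seconds, rem = divmod(ns, 1_000_000_000)    # nanoseconds -> whole seconds
        dt = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
        yield dt + datetime.timedelta(microseconds=rem // 1000), value
```

Usage: for dt, value in convert_table('OR1_7700_7980.h5', 'TS011'): print(dt.isoformat(), value)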

rburghol · 2022-11-29T14:29:40Z

Trying on my local Windows machine
Downloaded: http://deq1.bse.vt.edu:81/files/cbp/OR1_7700_7980.h5
Followed some more details here: https://github.com/respec/HSPsquared/blob/8176be9236be9ee6e55422d01531591a769a8df7/examples/archive/Demo3.ipynb
NOTE: as of 12.20.2022, this code failed at the line uhydro = h5py.read_hdf(fpath , '/RCHRES/HYDR') with AttributeError: module 'h5py' has no attribute 'read_hdf'. read_hdf belongs to pandas, not h5py, so the code below calls pd.read_hdf instead. pandas also needs the PyTables package (pip install tables) to read HDF5 files.

import h5py
import pandas as pd
from pandas import DataFrame, read_hdf, HDFStore

fpath = 'C:/Workspace/modeling/cbp/hsp2/river/OR1_7700_7980.h5'
f = h5py.File(fpath,'r')

rchres = f['RCHRES']
list(rchres.keys())
# ['ADCALC', 'GENERAL', 'HYDR']
hydr = rchres['HYDR']
list(hydr.keys())
f.close()  # release the h5py handle before pandas (PyTables) opens the same file
uhydro = pd.read_hdf(fpath, '/RCHRES/HYDR')  # read_hdf is a pandas function, not an h5py one
# uhydro[0:]
#           VOL  ICAT  COLIN1  COLIN2  COLIN3  COLIN4  COLIN5  OUTDG1  OUTDG2  ...  CFRAC3  CTAG4  CFRAC4 CTAG5 CFRAC5  CTAG6 CFRAC6  CTAG7 CFRAC7
# R001  7.07001  -1.0     4.0     4.0     4.0     4.0     4.0     0.0     0.0  ...     0.0      0     0.0     0    0.0      0    0.0      0    0.0

uhydro.index
# Index(['R001'], dtype='object')
uhydro.index[0]
# 'R001'
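Because read_hdf comes from pandas, pandas' HDFStore can also be used to list which keys pandas is able to load before reading one. This is a minimal sketch that requires pandas with PyTables installed. The function name load_pandas_key is hypothetical:

```python
import pandas as pd


def load_pandas_key(path, key):
    """Load one pandas object from an HDF5 file, listing the available keys if it is missing."""
    if not key.startswith("/"):
        key = "/" + key  # HDFStore.keys() reports keys with a leading '/'
    with pd.HDFStore(path, mode="r") as store:
        keys = store.keys()
        if key not in keys:
            raise KeyError(f"{key!r} not found; pandas-readable keys: {keys}")
        return store[key]
```

For example, load_pandas_key(fpath, 'RCHRES/HYDR') should return the same DataFrame as pd.read_hdf(fpath, '/RCHRES/HYDR').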

rburghol · 2022-11-29T16:55:38Z

See use of hdf5 to create new datasets for the Equation type: HARPgroup/HSPsquared#25 (comment)

nicoledarling assigned rburghol, gcambridge, glenncampagna, megpritch, nicoledarling and juliabruneau Jun 23, 2022

nicoledarling mentioned this issue Jun 23, 2022: 2022/06/21-24 HARP hdf5 knowledge transfer, prep for data mining #233 (Open)

rburghol mentioned this issue Jun 24, 2022: Week of 2022/06/27 HARP hdf5 data-mining, running hsp2, data models and REST #239 (Open)

rburghol changed the title from "Python" to "Python hdf5 reader" Jun 27, 2022

rburghol mentioned this issue Jun 27, 2022: Create issue for using Python #238 (Closed)

rburghol mentioned this issue Oct 5, 2022: Detailed Work Plan 2022 #209 (Open)

