Python hdf5 reader #236

Open
nicoledarling opened this issue Jun 23, 2022 · 7 comments

@nicoledarling
Contributor

nicoledarling commented Jun 23, 2022

Exploring how to use Python to locate timestamps in .h5 files.
See also:

@gcambridge
Contributor

gcambridge commented Jun 23, 2022

I was using the h5py documentation for the data dictionary issue (#237), and it should be helpful for this issue too.

A preliminary reading of the documentation suggests that in h5py the element hierarchy (group > dataset > attribute) is more clearly defined, with clearer relationships between each level. The syntax for setting and getting attribute data is also clearer (this might just be personal preference).

The following are some h5py functions and syntax that are useful when converting from R to Python:

Opening .h5 files:

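f = h5py.File('myfile.hdf5', 'r')
where 'r' is the mode, which determines whether the file is opened for reading or writing. The accepted modes are:
r - read only (the file must exist)
r+ - read/write (the file must exist)
w - create the file, truncating it if it already exists
w- or x - create the file, failing if it already exists
a - read/write if the file exists, create it otherwise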
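A minimal sketch of how the main modes behave, assuming h5py and NumPy are installed. It writes a throwaway file to a temporary directory (the file name demo.h5 and the dataset names x, y and z are made up for illustration) and shows that 'a' keeps existing data while 'w' truncates it:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical throwaway file; not one of the project's .h5 files
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# 'w': create the file, truncating it if it already exists
with h5py.File(path, "w") as f:
    f.create_dataset("x", data=np.arange(3))

# 'a': read/write; existing contents are kept (the file is created if missing)
with h5py.File(path, "a") as f:
    f.create_dataset("y", data=np.ones(2))

# 'r': read only; the file must already exist
with h5py.File(path, "r") as f:
    names = sorted(f.keys())
print(names)  # ['x', 'y']

# Reopening with 'w' wipes the file, so only the new dataset remains
with h5py.File(path, "w") as f:
    f.create_dataset("z", data=np.zeros(1))
with h5py.File(path, "r") as f:
    after_truncate = sorted(f.keys())
print(after_truncate)  # ['z']
```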

Datasets:

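  • list the top-level members of the file (groups or datasets) using list(f.keys())
  • get a member object using dset = f['dset_id'], where dset is the object and dset_id is one of the names returned above. This returns a Group if the name refers to a group, or a Dataset if it refers to a dataset.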
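Because f.keys() only shows the top level, it can help to walk the whole hierarchy and label each member as a group or a dataset. A sketch using h5py's visititems(), run against a small hypothetical file laid out like the TIMESERIES/TS011/table structure described later in this issue (the row contents are made up):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "layout_demo.h5")

# Hypothetical file imitating TIMESERIES/TS011/table; intermediate groups
# are created automatically from the path
rows = np.array([(441763200000000000, 8.8)],
                dtype=[("index", "<i8"), ("values", "<f8")])
with h5py.File(path, "w") as f:
    f.create_dataset("TIMESERIES/TS011/table", data=rows)

found = {}

def describe(name, obj):
    # visititems calls this once for every group and dataset below the root
    kind = "Dataset" if isinstance(obj, h5py.Dataset) else "Group"
    found[name] = kind
    print(f"{kind:8} {name}")

with h5py.File(path, "r") as f:
    top_level = list(f.keys())  # ['TIMESERIES'] (a Group, not a Dataset)
    f.visititems(describe)
```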

Attributes:

  • Attributes attached to a group or dataset are listed using list(dset.attrs.keys()), read with dset.attrs['name'], and set with dset.attrs['name'] = value
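A short sketch of the attribute get/set syntax, using a hypothetical dataset named flow with made-up units and station attributes. The attribute names come from dset.attrs, not from dset itself:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("flow", data=np.arange(5.0))
    # Set attributes (hypothetical names and values for illustration)
    dset.attrs["units"] = "cfs"
    dset.attrs["station"] = "TS011"

with h5py.File(path, "r") as f:
    dset = f["flow"]
    attr_names = sorted(dset.attrs.keys())  # attribute names live on .attrs
    units = dset.attrs["units"]             # get a single attribute
    everything = dict(dset.attrs.items())   # all attributes as a plain dict
print(attr_names, units)
```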

Closing .h5 files:

  • f.close() performs a "strong close": all open objects belonging to the file (groups, datasets, attributes) become invalid and can no longer be used.
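An alternative to calling close() by hand is to open the file in a with block, which closes it on exit even if an error occurs. This sketch (hypothetical file and dataset names) also shows that data copied out with dset[:] stays usable after the strong close, while the h5py objects themselves do not:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "close_demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=np.arange(4))

# The with block closes the file on exit, so there is no f.close() to forget
with h5py.File(path, "r") as f:
    dset = f["data"]
    values = dset[:]  # copy the data into a NumPy array while the file is open

# After the strong close, the file and dataset objects are invalid...
file_open = bool(f)
dset_open = bool(dset)
# ...but the NumPy array read above is an independent copy
total = int(values.sum())
print(file_open, dset_open, total)
```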

@gcambridge
Contributor

gcambridge commented Jun 24, 2022

6/24/2022: Successfully located and converted timestamps using Python

The following are the steps for locating the timestamps and then manually converting them to UTC. (fromtimestamp() returns the machine's local time, which appears to be UTC on the deq server.)

  1. Ensure h5py is properly installed using pip install h5py
  2. Start Python on the deq server by running python3 from the terminal
  3. Import h5py using import h5py
  4. Open the target file using f = h5py.File('file_name', 'r'), where file_name is the file you want to open
  5. View the top-level groups in the file using list(f.keys())
  6. Get the TIMESERIES group using dset = f['TIMESERIES']
  7. View its members (TS011, TS012, etc.) with list(dset.keys())
  8. Get the group for one timeseries using dset_target = f['TIMESERIES/target'], where target is the timeseries you want to open (e.g. TS011, TS012)
  9. Each timeseries group contains two members, '_i_table' and 'table'. We only want 'table', so select it with dset_target_table = f['TIMESERIES/target/table'] (again, target is the desired timeseries)
  10. From this table, read rows a through z-1 with dset_target_table[a:z] (Python slices exclude the end index)
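Steps 3 to 10 can be collected into one function. This is a sketch, not the script that was later committed: read_ts_rows is a hypothetical name, and the demo file only imitates the TIMESERIES/TS011/table layout with made-up values (it has no '_i_table' member):

```python
import os
import tempfile

import h5py
import numpy as np

def read_ts_rows(path, ts_name, a, z):
    """Return rows a..z-1 of TIMESERIES/<ts_name>/table as a NumPy structured array."""
    with h5py.File(path, "r") as h5:                      # steps 3-4
        print(list(h5.keys()))                            # step 5
        print(list(h5["TIMESERIES"].keys()))              # step 7: TS011, TS012, ...
        print(list(h5[f"TIMESERIES/{ts_name}"].keys()))   # real files: ['_i_table', 'table']
        table = h5[f"TIMESERIES/{ts_name}/table"]         # step 9
        return table[a:z]                                 # step 10 (copies rows into memory)

# Hypothetical stand-in for the real file: three hourly rows from 1984-01-01 00:00 UTC
demo = os.path.join(tempfile.mkdtemp(), "demo_ts.h5")
rows = np.array(
    [(441763200000000000 + i * 3_600_000_000_000, 8.8 - i * 0.01) for i in range(3)],
    dtype=[("index", "<i8"), ("values", "<f8")],
)
with h5py.File(demo, "w") as h5:
    h5.create_dataset("TIMESERIES/TS011/table", data=rows)

subset = read_ts_rows(demo, "TS011", 1, 3)  # rows 1 and 2
```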

The timestamps are stored as nanoseconds since the Unix epoch, so each value must be divided by 10^9 to get seconds, which can then be passed to Python's fromtimestamp() function as follows:

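  1. import datetime
  2. dt = datetime.datetime.fromtimestamp(unix_timestamp), where unix_timestamp is the number of seconds (a number, not a string). Without a tz argument the result is in the machine's local time; pass tz=datetime.timezone.utc to get UTC explicitly.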
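A sketch of the conversion that avoids passing a string and makes the time zone explicit. ns_to_utc is a hypothetical helper; it splits the nanosecond count with integer arithmetic rather than dividing by 1e9, because a float cannot hold all 18 digits of a nanosecond timestamp (this only matters when a timestamp has a sub-second part). The two inputs are the first value converted in the console session in this comment and the nanosecond equivalent of its 2020-12-31 23:00 result:

```python
from datetime import datetime, timezone

def ns_to_utc(ns):
    """Convert nanoseconds since the Unix epoch to a timezone-aware UTC datetime."""
    # Whole seconds stay exact; the leftover nanoseconds become microseconds
    seconds, rem_ns = divmod(int(ns), 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=rem_ns // 1000)

first = ns_to_utc(441766800000000000)
last = ns_to_utc(1609455600000000000)
print(first)  # 1984-01-01 01:00:00+00:00
print(last)   # 2020-12-31 23:00:00+00:00
```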

Next steps are to streamline the conversion process using a Python script.

Note: each row read from TIMESERIES/TS011/table is a record displayed like a tuple, (#, #). The dtype shows two fields: 'index', which holds the timestamp, and 'values', whose significance we are not yet sure of.
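Since each row is a NumPy structured record, the fields can be pulled out by name for the whole array at once instead of unpacking tuples. A sketch using two rows copied from the console session in this comment; casting the int64 nanosecond counts to datetime64[ns] gives naive (time zone free) times counted from the Unix epoch, i.e. UTC:

```python
import numpy as np

# Two rows copied from the session in this comment, with the same dtype
rows = np.array(
    [(441766800000000000, 8.82848454), (441770400000000000, 8.81089973)],
    dtype=[("index", "<i8"), ("values", "<f8")],
)

print(rows.dtype.names)        # ('index', 'values')
stamps_ns = rows["index"]      # all timestamps at once, int64 nanoseconds
vals = rows["values"]          # the companion 'values' column

# Reinterpret the nanosecond counts as datetime64[ns]
times = stamps_ns.astype("datetime64[ns]")
print(times[0])                # 1984-01-01T01:00:00.000000000
```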

  • Simple (and inefficient) example of converting individual timestamps to dates (to be continued). Python indexing starts at 0, so dset_TS011table[1] below is the second row of the table:
>>> dset_TS011table[1:5]
array([(441766800000000000, 8.82848454), (441770400000000000, 8.81089973),
       (441774000000000000, 8.79396248), (441777600000000000, 8.77740383)],
      dtype=[('index', '<i8'), ('values', '<f8')])
>>> tuple1 = dset_TS011table[1]
>>> tuple1
(441766800000000000, 8.82848454)
>>> timestamp1, value1 = tuple1
>>> timestamp1
441766800000000000
>>> timestamp1_ = timestamp1/1000000000
>>> timestamp1_
441766800.0
>>> datetime.datetime.fromtimestamp(timestamp1_)
datetime.datetime(1984, 1, 1, 1, 0)
>>> tuple_last = dset_TS011table[324359]
>>> timestamp_last, value_last = tuple_last
>>> timestamp_last_ = timestamp_last/1000000000
>>> datetime.datetime.fromtimestamp(timestamp_last_)
datetime.datetime(2020, 12, 31, 23, 0)
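The row-by-row approach above can be replaced by one read and one vectorized conversion. A sketch, assuming pandas is available; table_to_frame is a hypothetical helper, and the four rows printed above stand in for a full read such as dset_TS011table[:]. With the whole table in a DataFrame, the first and last timestamps (e.g. for comparison with the UCI end date) are simply df.index[0] and df.index[-1]:

```python
import numpy as np
import pandas as pd

def table_to_frame(rows):
    """Turn a structured array with 'index' (ns since epoch) and 'values' fields
    into a DataFrame indexed by UTC timestamps (hypothetical helper)."""
    times = pd.to_datetime(rows["index"], unit="ns", utc=True)
    return pd.DataFrame({"values": rows["values"]}, index=times)

# Stand-in for dset_TS011table[:]: the four rows printed in the session above
rows = np.array(
    [(441766800000000000, 8.82848454), (441770400000000000, 8.81089973),
     (441774000000000000, 8.79396248), (441777600000000000, 8.77740383)],
    dtype=[("index", "<i8"), ("values", "<f8")],
)
df = table_to_frame(rows)
first_time, last_time = df.index[0], df.index[-1]
print(first_time)  # 1984-01-01 01:00:00+00:00
```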

The original UCI from #211 had the end date as 2019/12/31, but Python showed that the last timestamp from the hdf5 was 2020/12/31... is this a problem?

@megpritch
Collaborator

megpritch commented Jun 24, 2022

Helpful Source for HDF5 Files in Python:

https://pythonnumericalmethods.berkeley.edu/notebooks/chapter11.05-HDF5-Files.html

  • Helped me understand the syntax more clearly (e.g. that keys are the groups and then there are subgroups inside them)
  • The commands & their explanations made it much easier to access the time series datasets.

@megpritch
Collaborator

Source for creating a Python script from the command line:

https://www.jcchouinard.com/create-python-script-from-terminal/

@gcambridge
Contributor

gcambridge commented Jun 24, 2022

Python script for converting timestamps from Unix time to datetime format

A script for timestamp conversion has been committed to the Harp Archive folder.

  • Log into the deq server and go to the directory containing the OR1_7700_7980.h5 file.
  • Use vim to create and edit the file: vim h5py_ts_convert
  • Enter insert mode by pressing "i"
  • Right click and paste the contents of the GitHub file into the vim editor
  • Press Esc, then save and exit by typing :wq
  • Run the script by entering python3 h5py_ts_convert in the terminal
  • Follow the prompts and enter the timeseries table you wish to convert
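The committed script is not reproduced in this thread, so this is only a hypothetical sketch of what an h5py_ts_convert-style script could look like, assuming the TIMESERIES/<name>/table layout described above. convert_table and main are made-up names, and the default file name and five-row preview are illustrative choices:

```python
import sys
from datetime import datetime, timezone

import h5py

def convert_table(path, ts_name, start=0, stop=None):
    """Return (UTC datetime, value) pairs for rows start..stop-1 of
    TIMESERIES/<ts_name>/table. Hypothetical helper, not the committed script."""
    with h5py.File(path, "r") as h5:
        rows = h5[f"TIMESERIES/{ts_name}/table"][start:stop]  # copy rows into memory
    out = []
    for ns, value in zip(rows["index"], rows["values"]):
        # Timestamps are nanoseconds since the Unix epoch
        seconds, rem_ns = divmod(int(ns), 1_000_000_000)
        when = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=rem_ns // 1000)
        out.append((when, float(value)))
    return out

def main():
    # Hypothetical default: file name from the command line, else the file used in this issue
    path = sys.argv[1] if len(sys.argv) > 1 else "OR1_7700_7980.h5"
    try:
        ts_name = input("Timeseries table to convert (e.g. TS011): ").strip()
    except EOFError:
        return  # no interactive input available
    for when, value in convert_table(path, ts_name, 0, 5):  # preview the first five rows
        print(when.isoformat(), value)

if __name__ == "__main__":
    main()
```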

@rburghol rburghol changed the title from "Python" to "Python hdf5 reader" on Jun 27, 2022
@rburghol
Contributor

rburghol commented Nov 29, 2022

import h5py
import pandas as pd

fpath = 'C:/Workspace/modeling/cbp/hsp2/river/OR1_7700_7980.h5'

# Browse the group structure read-only with h5py
f = h5py.File(fpath, 'r')
rchres = f['RCHRES']
list(rchres.keys())
# ['ADCALC', 'GENERAL', 'HYDR']
hydr = rchres['HYDR']
list(hydr.keys())
# Close before handing the file to pandas; reopening an already-open file in 'a' mode can fail
f.close()

# h5py has no read_hdf(); pandas does (it requires the PyTables package: pip install tables)
uhydro = pd.read_hdf(fpath, '/RCHRES/HYDR')
# uhydro[0:]
#           VOL  ICAT  COLIN1  COLIN2  COLIN3  COLIN4  COLIN5  OUTDG1  OUTDG2  ...  CFRAC3  CTAG4  CFRAC4 CTAG5 CFRAC5  CTAG6 CFRAC6  CTAG7 CFRAC7
# R001  7.07001  -1.0     4.0     4.0     4.0     4.0     4.0     0.0     0.0  ...     0.0      0     0.0     0    0.0      0    0.0      0    0.0

uhydro.index
# Index(['R001'], dtype='object')
uhydro.index[0]
# 'R001'

@rburghol
Copy link
Contributor

rburghol commented Nov 29, 2022

See use of hdf5 to create new datasets for the Equation type: HARPgroup/HSPsquared#25 (comment)
