Handling timestamps in .h5 files with rhdf5 #241

Open
glenncampagna opened this issue Jun 27, 2022 · 5 comments
@glenncampagna
Collaborator

  • Summary of last week's findings about the timestamp issue in R
  • We used H5Dread with the bit64conversion argument to get usable values, since base R has no native 64-bit integer type and cannot hold the raw 64-bit timestamps directly
library(rhdf5)
fid <- H5Fopen("OR1_7700_7980.h5")
did <- H5Dopen(fid, "RESULTS/RCHRES_R001/HYDR/table")
# read the 64-bit integer columns as doubles so R can represent them
rchres1 <- H5Dread(did, bit64conversion = "double")
head(rchres1)
         index       DEP     IVOL O1 O2       O3 OVOL1 OVOL2     OVOL3 PRSUPY
1 4.417668e+17 0.2415072 8.847264  0  0 2.055183     0     0 0.1229398      0
2 4.417704e+17 0.3002159 8.828485  0  0 3.175832     0     0 0.2161576      0
3 4.417740e+17 0.3486096 8.810900  0  0 4.282218     0     0 0.3081839      0
4 4.417776e+17 0.3905506 8.793962  0  0 5.374578     0     0 0.3990412      0
5 4.417812e+17 0.4279465 8.777404  0  0 6.453111     0     0 0.4887475      0
6 4.417848e+17 0.4619085 8.761090  0  0 7.517995     0     0 0.5773184      0
        RO     ROVOL     SAREA        TAU     USTAR      VOL VOLEV
1 2.055183 0.1229398  67.47366 0.02344255 0.1099862 15.79433     0
2 3.175832 0.2161576  83.87602 0.02914127 0.1226281 24.40666     0
3 4.282218 0.3081839  97.39652 0.03383873 0.1321425 32.90938     0
4 5.374578 0.3990412 109.11423 0.03790982 0.1398658 41.30430     0
5 6.453111 0.4887475 119.56212 0.04153978 0.1464090 49.59296     0
6 7.517995 0.5773184 129.05061 0.04483640 0.1521076 57.77673     0
  • Now converting the timestamp to a date-time
origin <- "1970-01-01"
# index is divided by 1e9 (one billion) because as.POSIXct expects seconds
rchres1$index <- as.POSIXct(rchres1$index / 1e9, origin = origin, tz = "UTC")
head(rchres1)
                index       DEP     IVOL O1 O2       O3 OVOL1 OVOL2     OVOL3
1 1984-01-01 01:00:00 0.2415072 8.847264  0  0 2.055183     0     0 0.1229398
2 1984-01-01 02:00:00 0.3002159 8.828485  0  0 3.175832     0     0 0.2161576
3 1984-01-01 03:00:00 0.3486096 8.810900  0  0 4.282218     0     0 0.3081839
4 1984-01-01 04:00:00 0.3905506 8.793962  0  0 5.374578     0     0 0.3990412
5 1984-01-01 05:00:00 0.4279465 8.777404  0  0 6.453111     0     0 0.4887475
6 1984-01-01 06:00:00 0.4619085 8.761090  0  0 7.517995     0     0 0.5773184
  PRSUPY       RO     ROVOL     SAREA        TAU     USTAR      VOL VOLEV
1      0 2.055183 0.1229398  67.47366 0.02344255 0.1099862 15.79433     0
2      0 3.175832 0.2161576  83.87602 0.02914127 0.1226281 24.40666     0
3      0 4.282218 0.3081839  97.39652 0.03383873 0.1321425 32.90938     0
4      0 5.374578 0.3990412 109.11423 0.03790982 0.1398658 41.30430     0
5      0 6.453111 0.4887475 119.56212 0.04153978 0.1464090 49.59296     0
6      0 7.517995 0.5773184 129.05061 0.04483640 0.1521076 57.77673     0

The same process should work for converting the timestamps of any data table in the .h5 file to date-times.
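As a minimal, self-contained sketch of the conversion above, the snippet below applies the same divide-by-1e9 step to a few hard-coded values that match the first rows of the index column. The helper name ns_to_posixct is hypothetical (it is not part of rhdf5), and the sketch assumes the index holds nanoseconds since 1970-01-01 UTC.

```r
# Hypothetical helper: convert nanoseconds since the Unix epoch to POSIXct (UTC).
ns_to_posixct <- function(ns) {
  as.POSIXct(ns / 1e9, origin = "1970-01-01", tz = "UTC")
}

# First three index values from the table above (4.417668e+17, ...), written out in full.
ns <- c(441766800e9, 441770400e9, 441774000e9)

ts <- ns_to_posixct(ns)
stamps <- format(ts, "%Y-%m-%d %H:%M:%S")
print(stamps)
# [1] "1984-01-01 01:00:00" "1984-01-01 02:00:00" "1984-01-01 03:00:00"
```

The same helper could be applied to the index column of any other table read with H5Dread, provided that column uses the same nanosecond encoding.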

@glenncampagna changed the title from "Handing timestamps in .h5 files with rhdf5" to "Handling timestamps in .h5 files with rhdf5" Jun 27, 2022
@glenncampagna
Collaborator Author

glenncampagna commented Jun 27, 2022

  • Writing a dataset from the .h5 to a .csv:
  • Make sure the output of H5Dread has been assigned to a variable and is a data frame; mine is 'rchres1'
  • write.table(rchres1, file = "/Users/glenncampagna/Documents/HARPteam22/rchres.csv", sep = ",", row.names = FALSE) produces a populated .csv file in my HARPteam22 folder
  • Note: this only worked in my local R installation; on the deq1 server it would need a different file path so that the file can be downloaded
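The export step can be sketched without a machine-specific path as follows. This is only an illustration: the small data frame stands in for rchres1, the output location (a temporary directory) is an assumption, and the timestamp column is formatted as text first so the .csv always shows the full date and time.

```r
# Stand-in for the data frame returned by H5Dread (values copied from the table above).
rchres_demo <- data.frame(
  index = as.POSIXct(c(441766800, 441770400, 441774000),
                     origin = "1970-01-01", tz = "UTC"),
  DEP   = c(0.2415072, 0.3002159, 0.3486096)
)

# Format the timestamps explicitly so every row is written as "YYYY-MM-DD HH:MM:SS".
out <- rchres_demo
out$index <- format(out$index, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Assumed output location; swap in a path that is valid on the machine or server in use.
csv_path <- file.path(tempdir(), "rchres.csv")
write.table(out, file = csv_path, sep = ",", row.names = FALSE)

# Read the file back to confirm it was populated.
back <- read.csv(csv_path, stringsAsFactors = FALSE)
print(back)
```

Building the path with file.path keeps the call itself unchanged between a local machine and a server; only the directory needs to differ.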

@rburghol
Contributor

Ok so the discovery of bit64conversion is exciting -- great work! Now if we can just figure out why we have to divide by a billion.

@nicoledarling
Contributor

nicoledarling commented Jun 28, 2022

@rburghol I believe that since the Unix timestamps are in nanoseconds and normal dates are in seconds, we need to divide by a billion to convert. There is more on this in answer 5 at the link below:
https://discuss.dizzycoding.com/convert-numpy-datetime64-to-string-object-in-python

@juliabruneau
Contributor

juliabruneau commented Jun 28, 2022

@rburghol in addition to what @nicoledarling found, this converter website gives the definition of the Unix epoch timeseries that the .h5 files use:

The Unix epoch (or Unix time or POSIX time or Unix timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT), not counting leap seconds (in ISO 8601: 1970-01-01T00:00:00Z). Literally speaking the epoch is Unix time 0 (midnight 1/1/1970), but 'epoch' is often used as a synonym for Unix time. Some systems store epoch dates as a signed 32-bit integer, which might cause problems on January 19, 2038 (known as the Year 2038 problem or Y2038). The converter on this page converts timestamps in seconds (10-digit), milliseconds (13-digit) and microseconds (16-digit) to readable dates.

Since they state that "Some systems store epoch dates as a signed 32-bit integer", might this be the reason why we had to convert the data first with bit64conversion?

From: https://www.epochconverter.com/
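To put rough numbers on the 32-bit question, the sketch below compares a timestamp from the table with the largest value R's integer type can hold. It uses base R only; the sample value is the first entry of the index column, and treating it as nanoseconds is the assumption being discussed in this thread.

```r
# Largest value of R's native (32-bit signed) integer type.
int_max <- .Machine$integer.max   # 2147483647

# First index value from the table, assumed to be nanoseconds since 1970-01-01 UTC.
ns_value  <- 441766800 * 1e9
sec_value <- ns_value / 1e9       # the same instant expressed in seconds

# The nanosecond value is far too large for a 32-bit integer ...
ns_too_big <- ns_value > int_max
# ... while the seconds value would still fit.
sec_fits <- sec_value <= int_max

# Doubles represent every integer exactly only up to 2^53, so a nanosecond
# timestamp read as a double may not be exact to the last nanosecond.
beyond_exact_double <- ns_value > 2^53

print(c(ns_too_big = ns_too_big, sec_fits = sec_fits,
        beyond_exact_double = beyond_exact_double))
```

For hourly data such as this table, any rounding at the nanosecond level has no visible effect once the values are divided down to seconds.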

@glenncampagna glenncampagna self-assigned this Jun 28, 2022
@rburghol
Contributor

rburghol commented Jun 28, 2022

@nicoledarling ahh, this is a good catch. I would clarify, though, that Unix timestamps are, I think, in seconds, and it is numpy timestamps that are in nanoseconds. numpy is a Python library, and perhaps there is some reason that it stores these data as nanoseconds. Maybe it is the way our Chesapeake Bay UCI files are formatted? I think we should look at the output of the test UCI/h5 files that come with the hsp2 package.
Download and run these through hsp2:

@rburghol rburghol mentioned this issue Jul 5, 2022
16 tasks