HDF5 Support #41
Comments
Hi Andrew, I don't think hdf5 lends itself well to having multiple interleaved streams of data. I might be wrong, but I don't think it can have multiple datasets that are all able to grow arbitrarily in size after they are created. Can it? I know it can have one, but I seem to recall facing a problem when I tried to have more than one. Even if it can have multiple resizable datasets, HDF5 is just a container. There is no universal specification for how the metadata, datasets, timestamps, etc. are organized. The only thing that would improve (over xdf) from a user perspective is that they could use any widely available hdf5 importer to get the data into memory, but they would still have to write a custom, layout-specific importer to get an intuitive representation of the data. This is arguably worse than using the existing xdf importers we have. So then we would need to specify a data layout, and create (and maintain) xdfh5 tools. There are some nice tools for hdf5 (e.g. Dask) that have some great features when loading large data, so I see the value in hdf5 support. However, I think this value can be more easily obtained with an xdf to hdf5 converter. You can already do this pretty easily with NeuroPype and not really lose any information, but you still encounter the problem of having to know the NeuroPype h5 layout when you want to load it with tools other than NeuroPype. I think NeuroPype can also write to nwb. If it doesn't yet, then it should be able to soon. I also have the opinion that importing arbitrary h5 files in Matlab kind of sucks, and I don't see it as an improvement over the xdf-Matlab importer.
This is an interesting argument about declarative vs. imperative ways of storing data. I see what you're saying regarding not being explicit enough and losing the meaning of the data. That said, we can use metadata, so I'm not sure you'd lose any specification info. I don't think there's an issue with multiple resizable datasets. On the other hand, you'd be able to use the format nearly everywhere and in whatever language you'd like, which would open us up to more sophisticated tooling than Matlab. Perhaps ubiquity might be a good option over specialization? But it's an old, ongoing argument in the tech world.
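The question of whether one HDF5 file can hold several datasets that all keep growing can be checked directly. The following is a minimal sketch using h5py, not anything LabRecorder does: the dataset names `eeg` and `markers`, the channel count, and the chunk sizes are made up for illustration. It only shows that the API allows interleaved appends to more than one extensible dataset; it says nothing about file size, write performance, or the layout question raised above.

```python
import h5py
import numpy as np


def append_rows(dset, rows):
    """Grow a resizable dataset along axis 0 and write `rows` at the end."""
    rows = np.asarray(rows, dtype=dset.dtype)
    start = dset.shape[0]
    dset.resize(start + rows.shape[0], axis=0)
    dset[start:] = rows


def record_interleaved(path):
    """Alternate appends between two hypothetical streams in one file."""
    with h5py.File(path, "w") as f:
        # maxshape=(None, ...) makes axis 0 unlimited; this needs chunked storage.
        eeg = f.create_dataset("eeg", shape=(0, 8), maxshape=(None, 8),
                               chunks=(256, 8), dtype="float32")
        markers = f.create_dataset("markers", shape=(0,), maxshape=(None,),
                                   chunks=(64,), dtype="int32")
        for i in range(5):
            append_rows(eeg, np.full((100, 8), i))  # 100 samples of 8 channels
            append_rows(markers, [i])               # one marker per block
        return eeg.shape, markers.shape
```

Calling `record_interleaved("demo.h5")` leaves a file with two datasets that were both extended after creation, in alternation.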
We'd still have to agree on the specifications. What will the field names be exactly? What is the type and layout of the information they hold? etc. But actually, maybe it's just better to create and use a specification that is an extension of NWB, e.g. NWB:X? (I would suggest NWB:N, but I don't think that holds all of the information we need for different modalities. The LSL ecosystem records more than just EEG.) I would be interested to see a pull request wherein LabRecorder users have the option of saving either to XDF or to NWB. But that seems like a lot of work, and the value added isn't a whole lot more than what you would get with the comparatively easier work of writing a lightweight pyxdf --> pynwb converter, if for some reason you aren't happy with the conversion options available in NeuroPype or MNE-Python. This tool could also come with an anonymization feature and a BIDS validator. I think there's a lot of value here. There's a much bigger problem that I forgot to mention before. So, if we were to write to another format, we would have to either:
So once again, a converter is a much better option.
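To make the converter idea concrete, here is a rough sketch of an xdf to hdf5 conversion with pyxdf and h5py. The HDF5 layout (`stream_0`, `stream_1`, ... groups with `time_stamps` and `time_series` datasets plus `name` and `type` attributes) is invented for this example and is not the NeuroPype layout or any agreed specification, which is exactly the gap discussed above. It copies only the stream name and type, not the full stream header.

```python
import h5py
import numpy as np


def write_streams(streams, h5_path):
    """Write pyxdf-style stream dicts to HDF5 using a made-up layout."""
    with h5py.File(h5_path, "w") as f:
        for i, stream in enumerate(streams):
            info = stream["info"]
            grp = f.create_group(f"stream_{i}")
            # pyxdf wraps header fields in single-element lists.
            grp.attrs["name"] = info["name"][0]
            grp.attrs["type"] = info["type"][0]
            grp.create_dataset(
                "time_stamps",
                data=np.asarray(stream["time_stamps"], dtype="float64"),
            )
            series = stream["time_series"]
            if isinstance(series, list):
                # String (marker) streams are returned as lists of lists.
                grp.create_dataset(
                    "time_series",
                    data=np.asarray(series, dtype=object),
                    dtype=h5py.string_dtype(),
                )
            else:
                grp.create_dataset("time_series", data=series)


def convert(xdf_path, h5_path):
    """Load an XDF file with pyxdf and write it out with write_streams."""
    import pyxdf  # imported here so write_streams works without pyxdf

    streams, _header = pyxdf.load_xdf(xdf_path)
    write_streams(streams, h5_path)
```

A call such as `convert("recording.xdf", "recording.h5")` would produce a file that any HDF5 reader can open, though a reader still has to know this particular layout to make sense of it.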
I agree fully with Chadwick. HDF is not suitable for continuous recording. So we would need a converter first. But a format like xdf will be more compact, and we already have xdf loaders for many languages. So where is the benefit of hdf?
My use case is that I'm interested in far more than neuro data (just to fill in the background). It's interesting about doing the clock offsets in post-processing. I totally understand why you like xdf now. Perhaps that is something we could improve in the underlying LSL protocol? I wonder if we can include sync and buffering while writing in real time. We use our own format in neuromore.com studio, but I am pushing the team more towards open source, so I'm interested to see what best practices we can develop with the community. NWB looks like what I might be after. Super interesting, thanks @cboulay. Not sure what you are on about, @agricolab.
In my limited experience, manipulating or extending an hdf5 file usually results in bloated files. See https://support.hdfgroup.org/HDF5/doc/H5.user/Performance.html
Interesting, @agricolab. I haven't come across that issue myself, as I don't remove chunks from the data I'm capturing. From what I read, though, if you do get bloat you can rewrite the file and it will free up the space. Astronomers use h5, so I'm sure it can capture streaming data sufficiently at massive bitrates. I'm a fan of using standards; that way you benefit from so much extra tooling and platform support.
Hi there,
Just wondering if you have thought about adding support for hdf5, the more widely used format, and whether there are any issues against it?
Thanks
Andrew