Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace deepdish #416

Merged
merged 11 commits into from
Oct 31, 2023
Merged

Replace deepdish #416

merged 11 commits into from
Oct 31, 2023

Conversation

ejolly
Copy link
Collaborator

@ejolly ejolly commented May 8, 2023

Ok @ljchang I took a crack at replacing deepdish for saving and loading .h5 Brain_Data and Adjacency objects, which addresses #415. Feel free to play around with this PR and if nothing too strange is happening we can marge it in!

@ljchang
Copy link
Member

ljchang commented Oct 3, 2023

Seems to be working for me. let's go ahead and merge. (also not sure why my previous comments aren't showing up to do the same thing).

@ejolly
Copy link
Collaborator Author

ejolly commented Oct 11, 2023

Ok @ljchang I did some formal testing about trying to read in h5 files made with the current main version of nltools and this new proposed h5 fix, and it doesn't work 🙁 So if we merge this in then any h5 files created with previous versions of nltools will need old versions of nltools to be loaded, rendering them basically useless over time.

The reason seems to be that the way the deepdish does saving/loading is very different and more complicated from how h5py does which is pretty much like opening any other file: with H5File('brain.h5', 'r'):

As a result this new PR isn't backwards compatible. How do you want to proceed? A few thoughts:

  • Try to reverse engineer deepdish's file loading logic and hope it still works on new versions of numpy without having to introduce our own hacks (lack of staying current with numpy is why this PR is trying to drop deepdish in the first place)
  • Don't support backwards compatibility at all
  • Find a different high-level h5 library in python thats interoperable with deepdish

@ljchang
Copy link
Member

ljchang commented Oct 11, 2023

That's a bummer @ejolly . What specifically isn't working? I have tried loading older hdf5 files from the FNL dataset and they seem to be loading okay for me. Though I don't have any .X or .Y attributes stored in them.

@ejolly
Copy link
Collaborator Author

ejolly commented Oct 11, 2023

Yea it's primarily the X and Y attributes. We can't load those easily.

@ljchang
Copy link
Member

ljchang commented Oct 11, 2023

ok, I think that is because they are using pytables to deal with the pandas stuff.

I have a couple of thoughts.

  1. how hard does it look to try and emulate this aspect of deepdish?
  2. how big of a deal is it if we don't support older hdf5 files that contain .X or .Y info? pickle has not done this (super annoying BTW). I think the pure numpy ones should be fine. We could just throw an error or warning if .X or .Y info is detected on the newer version that you will need to load this using an older version of nltools. I have no idea how often people even use this functionality. I personally never do because I'm usually just trying to make a faster version of a nifti file and those don't store this type of metadata anyway.

@ejolly
Copy link
Collaborator Author

ejolly commented Oct 20, 2023

@ljchang I was able to solve this now and verified it works across different nltools version! Brain_Data now accepts a legacy_h5 kwarg that when set to True will use pytables to mimic how deepdish was loading files. Otherwise it will use h5py which is the new default going forward.

The only thing we lose when loading h5 files from older versions of nltools is the column-type data for the X and Y attributes. Everything becomes at object type but I think that's probably acceptable and the best we can do.

I added a test for this and the last thing left to do when I get a chance is add support for legacy Adjacency hdf5 files.

@ljchang
Copy link
Member

ljchang commented Oct 23, 2023

awesome! nice work!

@ejolly
Copy link
Collaborator Author

ejolly commented Oct 27, 2023

Legacy h5 support for Adjacency is now implemented. One thing to note is that new hdf5 files (for brain or adjacency) created with these updates are a bit larger than those created with deepdish because of the default compression (zlib) deepdish was using. h5py supports gzip natively which I have implemented as the default, but it might be worth looking into this helper library for further compression support: https://pypi.org/project/hdf5plugin/

Tests are only failing on python 3.7 which is fine because we're deprecating support for that version (and adding up to 3.11)

@ejolly ejolly merged commit 3624a5a into master Oct 31, 2023
@ejolly ejolly mentioned this pull request Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants