-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parameterize dtype for h5path with SlideData
constructor
#335
base: dev
Are you sure you want to change the base?
Conversation
SlideData
constructor
@jacob-rosenthal Do I remember you saying that certain transforms expect float16? If you remember which we should tag and document these |
Thanks @tddough98 , this looks great! The reason we had to add the dtype parameter in the first place was as a workaround because some of our collaborators were running into problems with images where the dtype from the metadata didn't match the dtype of the image data itself. So the parameter was added so that people could read images in e.g. float16 even if the image metadata said int8. From what I can tell looking at the code, this PR will still keep that behavior? And yes, @ryanccarelli I think we did run into problems with some of the image transforms being picky about dtypes, and found that float16 worked well usually. I think since float16 remains the default, this should still be fine, since things would only break if people manually change dtype themselves So from what I can tell, this PR seems like strictly an improvement, keeping existing behavior but making it cleaner & more flexible |
I reverted storing all masks as boolean. I was focused on masks made by |
Ah yes that is a good catch. I think that should have been caught by the tests? But looks like the tests haven't run for this PR yet, I guess because it is a PR that isn't merging into dev or master (the only branches that trigger tests). |
Currently, PathML stores all images with
float16
, forcing all image inputs to be upcast or downcast to this data type, which increases storage size or loses information. There already is adtype
parameter in theSlideData
constructor, but it's only used to assist theBioFormatsBackend
in loading images correctly. This repurposes that parameter to control what dtype h5py uses when writing image data.I also changed masks to stored as ENUM and use the strongest compression setting as boolean masks are highly compressible and easily compressed. The compression made a huge difference in file size, and using (HDFView)[https://www.hdfgroup.org/downloads/hdfview/] showed a compression ratio of 100-200x for masks. The ENUM data type is stored as an 8-bit integer (https://docs.h5py.org/en/stable/special.html#enumerated-types) but at least this is less than using
float16
.