Reduce initialization time for plates #128

Merged
merged 1 commit into from
May 17, 2024

Conversation

melissalinkert

Fixes #124.

This iterates over the plate map once to find the paths corresponding to each series index, instead of looking at the plate map once per series. For plates mentioned in #124, I see that total conversion time with this change is faster than just the initialization time (before progress bars appear) with 0.7.0.


@sbesson sbesson left a comment


Tested with one of the BBBC017 plates mentioned in the original issue, converting it to OME-Zarr using bioformats2raw 0.9.2. The plate was then converted to OME-TIFF using

raw2ometiff NIRHTa-001.zarr NIRHTa-001.ome.tiff --log-level INFO > conversion.log 2>&1

both with raw2ometiff 0.7.0 and raw2ometiff 0.8.0-SNAPSHOT built locally with this PR included.

In both cases, the conversion completed successfully and the writing time was comparable between the two versions:

sbesson@Sebastiens-MacBook-Pro-3 Downloads % grep writeIFD conversion*             
conversion_0.7.0.log:2024-04-17 11:40:02,894 [main] INFO  org.perf4j.TimingLogger - start[1713350373140] time[29755] tag[writeIFDs]
conversion_0.8.0-SNAPSHOT.log:2024-04-17 10:56:36,699 [main] INFO  org.perf4j.TimingLogger - start[1713347768644] time[28054] tag[writeIFDs]
sbesson@Sebastiens-MacBook-Pro-3 Downloads % grep convert conversion*             
conversion_0.7.0.log:2024-04-17 11:40:02,894 [main] INFO  org.perf4j.TimingLogger - start[1713350318725] time[84170] tag[convertToPyramid]
conversion_0.8.0-SNAPSHOT.log:2024-04-17 10:56:36,699 [main] INFO  org.perf4j.TimingLogger - start[1713347712019] time[84680] tag[convertToPyramid]

What changes significantly with this PR is the initialization time, which dropped from roughly 41 min to under 10 s:

sbesson@Sebastiens-MacBook-Pro-3 Downloads % grep initialize conversion*   
conversion_0.7.0.log:2024-04-17 11:38:38,724 [main] INFO  org.perf4j.TimingLogger - start[1713347834359] time[2484365] tag[initialize]
conversion_0.8.0-SNAPSHOT.log:2024-04-17 10:55:12,018 [main] INFO  org.perf4j.TimingLogger - start[1713347702721] time[9297] tag[initialize]
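
For anyone who wants to check the arithmetic, here is a minimal Python sketch that extracts the `time[...]` value (milliseconds) from the two `initialize` lines above and converts it to readable units. The regular expression and the helper name `elapsed_ms` are my own and are not part of perf4j or raw2ometiff.

```python
import re

# Matches the "time[<ms>] tag[<name>]" suffix of a perf4j TimingLogger line.
TIMING = re.compile(r"time\[(\d+)\] tag\[(\w+)\]")


def elapsed_ms(line):
    """Return (tag, elapsed milliseconds) for one TimingLogger line."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError(f"not a timing line: {line!r}")
    return match.group(2), int(match.group(1))


# The two "initialize" lines quoted above, without the grep file prefix.
old_line = ("2024-04-17 11:38:38,724 [main] INFO  org.perf4j.TimingLogger - "
            "start[1713347834359] time[2484365] tag[initialize]")
new_line = ("2024-04-17 10:55:12,018 [main] INFO  org.perf4j.TimingLogger - "
            "start[1713347702721] time[9297] tag[initialize]")

old_tag, old_ms = elapsed_ms(old_line)
new_tag, new_ms = elapsed_ms(new_line)

print(f"0.7.0:          {old_ms / 60000:.1f} min")   # 41.4 min
print(f"0.8.0-SNAPSHOT: {new_ms / 1000:.1f} s")      # 9.3 s
print(f"speedup:        {old_ms / new_ms:.0f}x")     # 267x
```
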

Looking at the associated changes, the previous code performed a plate lookup once per series. This had no noticeable impact on the digital pathology use case, where typically only 1-3 series exist, but it caused significant performance issues in the HCS use case, where the number of series is typically in the 1-10K range or even higher.
The proposed changes should perform the lookup only once and satisfy the requirements of both domains, while still discovering the number of resolutions associated with each series to convert.
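
To illustrate the difference in cost, here is a rough Python sketch of the two strategies. It is not the raw2ometiff code: the plate dictionary layout, the assumption that the series index follows traversal order, and all function names are hypothetical and only serve to show why one scan per series scales poorly compared with a single pass.

```python
def iter_image_paths(plate):
    """Yield one path per field of view, in well order (hypothetical layout)."""
    for well in plate["wells"]:
        for image in well["images"]:
            yield f'{well["path"]}/{image["path"]}'


def paths_per_series_lookup(plate, series_count):
    """Old pattern: rescan the plate map for every series, O(series * fields)."""
    paths = []
    for series in range(series_count):
        for index, path in enumerate(iter_image_paths(plate)):
            if index == series:
                paths.append(path)
                break
    return paths


def paths_single_pass(plate):
    """New pattern: walk the plate map once, O(fields)."""
    return list(iter_image_paths(plate))


# Tiny made-up plate: two wells, three fields in total.
plate = {
    "wells": [
        {"path": "A/1", "images": [{"path": "0"}, {"path": "1"}]},
        {"path": "A/2", "images": [{"path": "0"}]},
    ]
}

series_paths = paths_single_pass(plate)
print(series_paths)  # ['A/1/0', 'A/1/1', 'A/2/0']
```

Both functions return the same list. With 1-3 series the extra scans are negligible, but with thousands of series the per-series version walks the plate map thousands of times, which matches the slowdown reported for HCS data.
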

@sbesson sbesson merged commit deaa3ea into glencoesoftware:master May 17, 2024
4 checks passed
Development

Successfully merging this pull request may close these issues.

Initialize function performance issues on HCS datasets
3 participants