-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inplace conversion to Starfish format seems slow #1388
Comments
What does |
Basically it reads in our metadata format and organizes it so that starfish can query it to fetch tiles. Then it calls def format_survey_data(input_manifest_path: Path, output_dir: Path,
aux_probes: set = {'DAPI'}):
survey_manifest = SurveyManifest(manifest_path=input_manifest_path,
aux_probes=aux_probes)
aux_tfs = {aux_type: SurveyAuxTileFetcher(survey_manifest, aux_type)
for aux_type in aux_probes}
tile_fetchers = {'primary': SurveyPrimaryTileFetcher(survey_manifest),
**aux_tfs}
inplace.enable_inplace_mode()
write_irregular_experiment_json(
path=str(output_dir.resolve()),
tile_format=ImageFormat.TIFF,
image_tile_coordinates=survey_manifest.expt_dims,
tile_fetchers=tile_fetchers,
postprocess_func=add_codebook,
fov_path_generator=fov_path_generator,
tile_opener=inplace.inplace_tile_opener
) |
It's totally possible that conversion is as fast as it can be. I'm just curious whether it could be faster. So I should probably change the subject name. |
I verified that your tile fetchers don't need to return any sane data. That should shave 900+ seconds from your profile. As long as it returns correctly shaped data, the actual data is discarded. I'm codifying this expectation into a test. |
Oh, so instead of the |
Yes, any np array of the correct shape and dtype should work. |
Hmm... I wonder if that will give significant time savings. Even if you pass bogus data, it will still get sha256 hashed |
I am somewhat skeptical that the extra sha computation will cost that much time. It's far more likely that the extra encoding into tiff costs way more time. |
In any case, can you just give it a whirl? Should be an easy change. :) |
Bizarrely, things got worse (swear I wasn't running or doing anything else!)? Here's what I did:
zipped cProfile output file: format_survey_data.py:62 is just:
|
… rely on TileFetcher data When we construct an experiment in-place, the source data comes from the tiles already on disk. Therefore, whatever image data we provide the TileFetcher that we pass into `write_experiment_json` should be discarded. This verifies that the data from the tiles on the disk are what gets constituted into the experiment, and that the data returned by TileFetcher is ignored. Test plan: passed the test. Fixes #1388
… rely on TileFetcher data (#1389) When we construct an experiment in-place, the source data comes from the tiles already on disk. Therefore, whatever image data we provide the TileFetcher that we pass into `write_experiment_json` should be discarded. This verifies that the data from the tiles on the disk are what gets constituted into the experiment, and that the data returned by TileFetcher is ignored. Test plan: passed the test. Fixes #1388
@ttung Hey Tony, I had tried your suggestion, but it did not appear to make things any faster... see: #1388 (comment) |
Is this the same data set? I'm a bit suspicious since the imsave portion should be the same, but the new profile is 670s while the old profile is only 339s. |
Yeah exact same dataset. Very odd, I know... |
A 2 round, 5 channel, 29 fov, 44 zplane mini-experiment takes about 30 minutes to convert to starfish format. This is likely to be problematic for our pipeline especially if we want to be converting + analyzing data on a per round basis as soon as it comes off the microscope.
It seems like the majority of time is still being taken up by
write
calls byslicedimage
even for inplace conversion.zipped cProfile output file:
starfish_conversion.prof.tar.gz
Visualization with snakeviz:
CC: @berl
The text was updated successfully, but these errors were encountered: