
Inplace conversion to Starfish format seems slow #1388

Closed
njmei opened this issue Jun 3, 2019 · 13 comments · Fixed by #1389


njmei commented Jun 3, 2019

A 2-round, 5-channel, 29-FOV, 44-z-plane mini-experiment takes about 30 minutes to convert to starfish format. This is likely to be problematic for our pipeline, especially if we want to convert and analyze data on a per-round basis as soon as it comes off the microscope.

It seems like the majority of the time is still taken up by slicedimage write calls, even for in-place conversion.

zipped cProfile output file:
starfish_conversion.prof.tar.gz

Visualization with snakeviz:
[snakeviz screenshot: Screenshot from 2019-06-03 12-03-08]

CC: @berl

@shanaxel42 shanaxel42 added this to the 0.2.0 milestone Jun 3, 2019
@shanaxel42 shanaxel42 added the bug An issue with an existing feature label Jun 3, 2019

ttung commented Jun 3, 2019

What does format_survey_data.py do?


njmei commented Jun 3, 2019

Basically it reads in our metadata format and organizes it so that starfish can query it to fetch tiles. Then it calls write_irregular_experiment_json. The source for it is:

def format_survey_data(input_manifest_path: Path, output_dir: Path,
                       aux_probes: set = {'DAPI'}):
    # Parse our metadata format so that starfish can query it to fetch tiles.
    survey_manifest = SurveyManifest(manifest_path=input_manifest_path,
                                     aux_probes=aux_probes)

    # One TileFetcher per auxiliary image type, plus one for the primary images.
    aux_tfs = {aux_type: SurveyAuxTileFetcher(survey_manifest, aux_type)
               for aux_type in aux_probes}
    tile_fetchers = {'primary': SurveyPrimaryTileFetcher(survey_manifest),
                     **aux_tfs}

    # In-place mode: the experiment JSON should reference the tiles already
    # on disk rather than re-encoding the image data.
    inplace.enable_inplace_mode()

    write_irregular_experiment_json(
        path=str(output_dir.resolve()),
        tile_format=ImageFormat.TIFF,
        image_tile_coordinates=survey_manifest.expt_dims,
        tile_fetchers=tile_fetchers,
        postprocess_func=add_codebook,
        fov_path_generator=fov_path_generator,
        tile_opener=inplace.inplace_tile_opener
    )
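
For reference, a hypothetical invocation of this entry point (the paths here are made up, not taken from the thread):

    from pathlib import Path

    # Hypothetical paths; substitute the real manifest and output locations.
    format_survey_data(
        input_manifest_path=Path('/data/survey/manifest.json'),
        output_dir=Path('/data/survey/starfish'),
    )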


njmei commented Jun 3, 2019

It's totally possible that conversion is as fast as it can be. I'm just curious whether it could be faster, so I should probably change the issue title.

@njmei njmei changed the title from "Inplace conversion to Starfish format is slow" to "Inplace conversion to Starfish format seems slow" Jun 3, 2019

ttung commented Jun 3, 2019

I verified that your tile fetchers don't need to return any sane data; as long as they return correctly shaped data, the actual pixel values are discarded. That should shave 900+ seconds off your profile.

I'm codifying this expectation into a test.


njmei commented Jun 3, 2019

Oh, so instead of the tile_data FetchedTile method returning skimage.io.imread(path_of_tiff) (like it currently does), just have it return a dummy of some sort (e.g., np.empty())?


ttung commented Jun 3, 2019

Yes, any np array of the correct shape and dtype should work.
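
A minimal self-contained sketch of such a stub (the class and attribute names here are hypothetical, not starfish's API; the real change would go in the FetchedTile subclass's tile_data):

    import numpy as np

    class DummyTile:
        """Hypothetical stand-in for a FetchedTile subclass used during
        in-place conversion, where the pixel data is discarded anyway."""

        def __init__(self, y: int, x: int, dtype=np.uint16):
            self.shape = (y, x)
            self.dtype = dtype  # must match the real tiles' dtype

        def tile_data(self) -> np.ndarray:
            # Correct shape and dtype, but nothing is read from disk.
            return np.zeros(self.shape, dtype=self.dtype)

Note that np.zeros defaults to float64 when no dtype is given, so passing the right dtype matters here.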


njmei commented Jun 3, 2019

Hmm... I wonder if that will give significant time savings. Even if you pass bogus data, it will still get sha256 hashed num_rounds * num_fovs * num_chs times.
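
For scale, with the dimensions from the opening comment (whether each z-plane is stored as its own tile file is an assumption here, not something the thread states):

    # Rough count of per-tile sha256 computations for the mini-experiment above.
    n_rounds, n_chs, n_fovs, n_zplanes = 2, 5, 29, 44

    hashes_stated = n_rounds * n_chs * n_fovs   # 290, as counted above
    hashes_per_z = hashes_stated * n_zplanes    # 12,760 if each z-plane is a tile
    print(hashes_stated, hashes_per_z)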


ttung commented Jun 3, 2019

I am somewhat skeptical that the extra sha computation will cost that much time. It's far more likely that the extra encoding into TIFF costs way more.
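
Neither cost is obvious from the profile alone; a quick micro-benchmark along these lines could settle it (a sketch only: the tile size, dtype, and use of skimage here are assumptions):

    import hashlib
    import os
    import tempfile
    import time

    import numpy as np
    from skimage.io import imsave

    tile = np.random.rand(2048, 2048).astype(np.float32)  # assumed tile size

    # Cost of hashing the raw pixel bytes once.
    t0 = time.perf_counter()
    hashlib.sha256(tile.tobytes()).hexdigest()
    t1 = time.perf_counter()

    # Cost of encoding the same array to TIFF on disk, as conversion would.
    fd, path = tempfile.mkstemp(suffix='.tiff')
    os.close(fd)
    imsave(path, tile)
    t2 = time.perf_counter()
    os.remove(path)

    print(f"sha256: {t1 - t0:.3f}s  tiff encode+write: {t2 - t1:.3f}s")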


ttung commented Jun 3, 2019

In any case, can you just give it a whirl? Should be an easy change. :)


njmei commented Jun 3, 2019

Bizarrely, things got worse (I swear I wasn't running or doing anything else!).

Here's what I did:

    @functools.lru_cache(maxsize=1)
    def tile_data(self) -> np.ndarray:
        return np.zeros((self.tile.shape.y, self.tile.shape.x))

zipped cProfile output file:
starfish_conversion_2.prof.tar.gz

snakeviz:
[snakeviz screenshot: Screenshot from 2019-06-03 16-49-37]

format_survey_data.py:62 is just:

    @property
    def sha256(self) -> str:
        # Reads each tile file from disk in full and hashes it.
        hasher = hashlib.sha256()
        with open(self.tile.full_filepath, 'rb') as fh:
            hasher.update(fh.read())
        return hasher.hexdigest()
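
If that full-file read ever becomes a memory concern for large tiles, a chunked variant is the standard alternative (a sketch, not what the thread settled on):

    import hashlib

    def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
        # Stream the file in 1 MiB chunks instead of reading it whole.
        hasher = hashlib.sha256()
        with open(path, 'rb') as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b''):
                hasher.update(chunk)
        return hasher.hexdigest()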

ttung pushed a commit that referenced this issue Jun 12, 2019
… rely on TileFetcher data

When we construct an experiment in-place, the source data comes from the tiles already on disk. Therefore, whatever image data we provide to the TileFetcher that we pass into `write_experiment_json` should be discarded. This verifies that the data from the tiles on disk is what gets constituted into the experiment, and that the data returned by the TileFetcher is ignored.

Test plan: passed the test.
Fixes #1388
ttung pushed a commit that referenced this issue Jun 19, 2019
… rely on TileFetcher data (#1389)

Fixes #1388

njmei commented Jun 19, 2019

@ttung Hey Tony, I tried your suggestion, but it didn't appear to make things any faster...

see: #1388 (comment)


ttung commented Jun 19, 2019

Is this the same dataset? I'm a bit suspicious, since the imsave portion should be the same, but the new profile shows 670s while the old profile shows only 339s.


njmei commented Jun 19, 2019

Yeah exact same dataset. Very odd, I know...
