Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize loading from S3 #677

Closed
tischi opened this issue Apr 23, 2022 · 13 comments
Closed

Optimize loading from S3 #677

tischi opened this issue Apr 23, 2022 · 13 comments

Comments

@tischi
Copy link
Contributor

tischi commented Apr 23, 2022

Bottlenecks:

MergedGridSource

MergedGridSource: this.type = Util.getTypeFromInterval( referenceSource.getSource( 0, 0 ) );

image

getMask()

MoBIEHelper: getMask():

image

@tischi
Copy link
Contributor Author

tischi commented Apr 23, 2022

Ideas:

  1. Add field for allSourceSameDimensions for the GridSources, and then do not load all of them.
  2. Already touch the sources during the loading such that the information is cached in a multi-threaded way.

tischi added a commit that referenced this issue Apr 23, 2022
…s not slow down initial loading too much, looks like a big improvement
@tischi
Copy link
Contributor Author

tischi commented Apr 23, 2022

Adding sourceAndConverter.getSpimSource().getSource( 0,0 ) ) to the initial loading seems to help a lot later and does not seem to harm too much while doing it.

Adding however Util.getTypeFromInterval( sourceAndConverter.getSpimSource().getSource( 0,0 ) ); to the initial opening also seems to help later, but also takes a lot longer during the initial loading. I think here the point is that we do not need to do this for all sources.

Question to @KateMoreva : Can we get possibly get rid of the "doesObjectExist()" call, which appears to take halve of the time?
image

@tischi
Copy link
Contributor Author

tischi commented Apr 23, 2022

See here: mobie/mobie-io#91

@constantinpape
Copy link
Contributor

Adding however Util.getTypeFromInterval( sourceAndConverter.getSpimSource().getSource( 0,0 ) ); to the initial opening also seems to help later, but also takes a lot longer during the initial loading. I think here the point is that we do not need to do this for all sources.

Is this to get the data type? If so, I think we probably only need to do this once, data types shouldn't be mixed in the same grid...

Question to @KateMoreva : Can we get possibly get rid of the "doesObjectExist()" call, which appears to take halve of the time?

Yeah, I think it's better to do EAFP (easier to ask for permission) rather than LBYL (look before you leap) here to improve performance when accessing things on s3.

@tischi
Copy link
Contributor Author

tischi commented Apr 23, 2022

Is this to get the data type? If so, I think we probably only need to do this once, data types shouldn't be mixed in the same grid...

I think I do it only once, but already that is very slow. This is a recurring issue and maybe something will change here at some point: imglib/imglib2#312. I'll keep on debugging here as much as I can.

I am making very good progress on other fronts, though. Things should get significantly faster now.

@constantinpape
Copy link
Contributor

I think I do it only once, but already that is very slow. This is a recurring issue and maybe something will change here at some point: imglib/imglib2#312. I'll keep on debugging here as much as I can.

If you just want to get the datatype you can also get it from the metadata, e.g. the .zattrs. That is probably much faster than doing it the imglib way.

@tischi
Copy link
Contributor Author

tischi commented Apr 23, 2022

If you just want to get the datatype you can also get it from the metadata, e.g. the .zattrs

Very good point...
The issue with imglib2 is that it actually needs an access to the image data, which loads a whole bunch of pixels, which are not needed.

But I am not sure how to implement your idea, because I need the imglib2 type variable in the end! I will have a think...

@constantinpape
Copy link
Contributor

But I am not sure how to implement your idea, because I need the imglib2 type variable in the end! I will have a think...

You'll probably need to write a function that translate the type you get from the metadata (which will be a string?!) to the corresponding imglib2 type.

@tischi
Copy link
Contributor Author

tischi commented Apr 23, 2022

Turns out that

Util.getTypeFromInterval( sourceAndConverter.getSpimSource().getSource( 0,0 ) );

was nonsense, because there is:

sourceAndConverter.getSpimSource().getType()

which apparently is already initialized and thus returns almost instantaneous instead of taking 500 ms 🥳 .

@tischi
Copy link
Contributor Author

tischi commented Apr 23, 2022

To better see what is happening, I worked a lot on the logging as well.

I also introduced a variable (currently 100ms), and I am only logging what takes longer than this, in order not to clutter the log window, makes sense?

Based on the few measurements below (this is from home, for my Mac, which has 4 CPU) I think I will keep 16 IO threads for now as the default.

Clearly, multi-threading during IO helps.

Any ideas how to do this more systematically?

Should we expose the numIOThreads in the UI?

1 IO threads

# MoBIE
 
Opening project: https://github.com/mobie/covid-if-project
 
Opening view: default
 
Opening (1/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0007.ome.zarr
Opening (3/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0000.ome.zarr
Opening (5/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0001.ome.zarr
Opening (9/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0005.ome.zarr
Opening (13/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0005.ome.zarr
Opening (17/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0003.ome.zarr
Opening (21/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0001.ome.zarr
Opening (25/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0004.ome.zarr
Opening (29/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0001.ome.zarr
Opening (33/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0001.ome.zarr
Opening (37/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0005.ome.zarr
Opening (41/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0007.ome.zarr
Opening (45/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0005.ome.zarr
Opening (49/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0003.ome.zarr
Opening (53/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE07_0006.ome.zarr
Opening (57/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE06_0004.ome.zarr
Opening (61/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE06_0008.ome.zarr
Opening (65/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0000.ome.zarr
Opening (69/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE06_0000.ome.zarr
Opening (73/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE06_0004.ome.zarr
Opening (77/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE07_0005.ome.zarr
Opening (81/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE07_0001.ome.zarr
Opening (85/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0003.ome.zarr
Opening (89/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE07_0007.ome.zarr
Opened 90 image(s)  in 108946 ms, using 1 thread(s).
 
Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0000/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0004/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0001/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0002/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0006/default.tsv
Fetched 18 tables(s)  in 7114 ms, using 1 thread(s).
 
Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0002/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0006/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0003/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0001/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0000/default.tsv
Fetched 18 tables(s)  in 6821 ms, using 1 thread(s).
 
Opened view: default, in 124170 ms.

4 IO threads

# MoBIE
 
Opening project: https://github.com/mobie/covid-if-project
 
Opening view: default
 
Opening (1/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0007.ome.zarr
Opening (3/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0000.ome.zarr
Opening (6/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0001.ome.zarr
Opening (9/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0005.ome.zarr
Opening (17/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0003.ome.zarr
Opening (33/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0001.ome.zarr
Opening (50/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0003.ome.zarr
Opening (65/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0000.ome.zarr
Opening (81/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE07_0001.ome.zarr
Opened 90 image(s)  in 30796 ms, using 4 thread(s).
 
Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0001/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0007/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0008/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0000/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0000/default.tsv
Fetched 18 tables(s)  in 958 ms, using 4 thread(s).
 
Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0001/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0000/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0007/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0007/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0008/default.tsv
Fetched 18 tables(s)  in 683 ms, using 4 thread(s).
 
Opened view: default, in 33467 ms.

16 IO threads

# MoBIE
 
Opening project: https://github.com/mobie/covid-if-project
 
Opening view: default
 
Opening (1/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0007.ome.zarr
Opening (9/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0005.ome.zarr
Opening (5/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0001.ome.zarr
Opening (3/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0000.ome.zarr
Opening (18/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0003.ome.zarr
Opening (33/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0001.ome.zarr
Opening (65/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0000.ome.zarr
Opened 90 image(s)  in 12171 ms, using 16 thread(s).
 
Opening (2/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0008/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0005/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0000/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0006/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0000/default.tsv
Fetched 18 tables(s)  in 1143 ms, using 16 thread(s).
 
Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0003/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0002/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0005/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0001/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0000/default.tsv
Fetched 18 tables(s)  in 900 ms, using 16 thread(s).
 
Opened view: default, in 15195 ms.

32 IO threads

# MoBIE
 
Opening project: https://github.com/mobie/covid-if-project
 
Opening view: default
 
Opening (1/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0007.ome.zarr
Opening (17/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE07_0003.ome.zarr
Opening (9/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0005.ome.zarr
Opening (5/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/cell_segmentation_WellE06_0001.ome.zarr
Opening (3/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/serumIgG_WellE07_0000.ome.zarr
Opening (34/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nucleus_segmentation_WellE06_0001.ome.zarr
Opening (49/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/marker_tophat_WellE07_0003.ome.zarr
Opening (65/90): https://s3.embl.de/covid-if-project/20200406_164555_328/images/ome-zarr/nuclei_WellE06_0000.ome.zarr
Opened 90 image(s)  in 10863 ms, using 32 thread(s).
 
Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0006/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0001/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0002/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE06_0008/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/cell_segmentation_WellE07_0008/default.tsv
Fetched 18 tables(s)  in 916 ms, using 32 thread(s).
 
Opening (1/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0005/default.tsv
Opening (5/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0001/default.tsv
Opening (3/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0008/default.tsv
Opening (9/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE06_0001/default.tsv
Opening (17/18): https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/nucleus_segmentation_WellE07_0004/default.tsv
Fetched 18 tables(s)  in 1316 ms, using 32 thread(s).
 
Read in 117 ms: https://raw.githubusercontent.com/mobie/covid-if-project/main/data/20200406_164555_328/tables/sites/default.tsv

Opened view: default, in 14683 ms.

@constantinpape
Copy link
Contributor

I also introduced a variable (currently 100ms), and I am only logging what takes longer than this, in order not to clutter the log window, makes sense?

Yeah, I think that makes sense.

Based on the few measurements below (this is from home, for my Mac, which has 4 CPU) I think I will keep 16 IO threads for now as the default.

16 threads sounds reasonable; in any case, I think it makes sense to have more threads than cores here since these things are usually not compute bound. I am not sure if there's a better way to determine the number of threads, so I think 16 sounds ok for now.

@constantinpape
Copy link
Contributor

I checked with the new MoBIE-beta now and it works much better already. I can load the full grid from s3 now in a reasonable amount of time.

@tischi
Copy link
Contributor Author

tischi commented May 17, 2022

I am closing this as I think we exhausted the options discussed in this issue.
There is probably more we can do but I would rather open a new issue.

@tischi tischi closed this as completed May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants