Use (or generalize) cloud_ilastik architecture for my implementations of scalable algorithms #70

Open · constantinpape opened this issue Apr 27, 2020 · 0 comments

To really get the discussion we started a few weeks ago going:
It would be beneficial to re-use the cloud_ilastik architecture for running jobs on different target systems (local, slurm, etc.) in my implementations of scalable (3D) segmentation and image analysis algorithms, currently available at https://github.com/constantinpape/cluster_tools.

Briefly, my current implementation has three issues:

  1. To implement a task for a given target, I use a mixin pattern. For example, an ilastik slurm prediction task looks like class IlastikPredictionSlurm(IlastikPredictionBase, SlurmTask), see this for details (a minimal sketch of the pattern follows this list). The drawback of this approach is that it does not scale well to new computation targets: every existing task needs an additional mixin subclass for each new target.
  2. Monitoring and logging are convoluted (it's fine for me, because I know what's happening, but it's not easily usable for anyone else). This is not really tied to 1, but it would be great to implement a clean solution once and re-use it.
  3. Re-running a partially failed job is very cumbersome and it's usually easier to delete the (intermediate) result and rerun the whole job.
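
For illustration, a minimal sketch of the mixin pattern from point 1 (the class bodies are simplified stand-ins, not the actual cluster_tools code):

```python
class IlastikPredictionBase:
    """Target-independent task logic (parameters, blocking, etc.)."""
    def run_task(self):
        ...


class SlurmTask:
    """Target-specific logic: submit jobs via slurm."""
    def submit(self):
        ...


class LocalTask:
    """Target-specific logic: run jobs locally."""
    def submit(self):
        ...


# One subclass is needed per (task, target) combination, which is why
# this does not scale: N tasks and M targets require N * M subclasses.
class IlastikPredictionSlurm(IlastikPredictionBase, SlurmTask):
    pass


class IlastikPredictionLocal(IlastikPredictionBase, LocalTask):
    pass
```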

The advantage of using the cloud_ilastik implementation: issue 1 is already solved more elegantly there.
I don't know how/if you have tackled 2 and 3 already, but at least moving to a more common code base would reduce redundant work. This would also allow cloud_ilastik to use the scalable algorithms I have already implemented.

This came up in the context of our more recent project for processing high-throughput screening data, where @Tomaz-Vieira had a closer look at the implementation: sciai-lab/batchlib#5. Since then, I have simplified that design, because we don't really need a multi-target solution there. But in general this issue is relevant for batch processing of 2D images as well. Also, for this project I have implemented a solution to issue 3 that works well for images and could probably be extended to nD chunked data, see this for details; the idea is sketched below.
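
For reference, a rough sketch of the idea behind that solution (the function below is illustrative and assumes one output file per input image; it is not the actual batchlib API):

```python
import os


def remaining_inputs(input_paths, output_dir, ext=".h5"):
    """Return the inputs whose output file does not exist yet.

    With one output file per input image, a partially failed job can be
    resumed by processing only this subset, instead of deleting all
    (intermediate) results and rerunning everything.
    """
    todo = []
    for path in input_paths:
        name = os.path.splitext(os.path.basename(path))[0]
        out_path = os.path.join(output_dir, name + ext)
        if not os.path.exists(out_path):
            todo.append(path)
    return todo
```

For nD chunked data, the same existence check could presumably be done per chunk instead of per file.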

More concretely, the questions I would like to explore:

  • How can we integrate cloud_ilastik and the algorithms in cluster_tools? Can I just use cloud_ilastik as is, or is it better to implement a common parent library?
  • Are there existing solutions / libraries we can offload some work to? (I will open a follow-up issue on this soon.)