Add image segmentation docs (#1821)
Added tutorials and how-tos related to cell segmentation
mattcai authored Mar 12, 2020
1 parent 3a2c0b5 commit 1a47383
Showing 12 changed files with 583 additions and 25 deletions.
113 changes: 100 additions & 13 deletions docs/source/creating_an_image_processing_pipeline/index.rst
@@ -31,8 +31,8 @@ and running it on an experiment.
Loading Data
------------

* Tutorial: :ref:`Formatting your data <data_conversion_examples>`
* Tutorial: :ref:`Using formatted example data <datasets>`

.. _section_manipulating_images:

@@ -43,8 +43,8 @@ Sometimes it can be useful to subset the images by, for example, excluding out-of-focus images or
cropping out edge effects. For sparse data, it can be useful to project the z-volume into a single
image, as this produces a much faster processing routine.

* Tutorial: :ref:`Cropping <tutorial_cropping>`
* Tutorial: :ref:`Projecting <tutorial_projection>`
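The projection step mentioned above is, at its core, a reduction along the z-axis. A minimal numpy sketch of a maximum-intensity projection (the projecting tutorial covers the equivalent operation on a starfish ImageStack; the array shape here is illustrative):

```python
import numpy as np

# Synthetic image volume with shape (z, y, x).
volume = np.random.rand(12, 64, 64)

# Maximum-intensity projection: collapse the z-axis by keeping the
# brightest value observed at each (y, x) position.
mip = volume.max(axis=0)
```

Because sparse data rarely has overlapping spots along z, the projection preserves spot signal while reducing the volume to a single, faster-to-process plane.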

.. _section_correcting_images:

@@ -56,11 +56,11 @@ handling or microfluidics that are involved in capturing the images. These steps are
*independent* of the assay. *Starfish* enables the user to design a pipeline that matches their
imaging system.

* Tutorial: :ref:`Illumination Correction <tutorial_illumination_correction>`
* Tutorial: :ref:`Chromatic Aberration <tutorial_chromatic_aberration>`
* Tutorial: :ref:`Deconvolution <tutorial_deconvolution>`
* Tutorial: :ref:`Image Registration <tutorial_image_registration>`
* Tutorial: :ref:`Image Correction Pipeline <tutorial_image_correction_pipeline>`
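As one concrete illustration of an instrument-dependent correction, uneven illumination can be estimated with a heavy blur and divided out. This is a generic scipy sketch, not starfish's illumination-correction class, and the `sigma` value is an arbitrary assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def flatfield_correct(image, sigma=10):
    """Divide out a smooth illumination profile estimated by a large Gaussian blur."""
    profile = gaussian_filter(image, sigma)
    profile = profile / profile.mean()  # keep overall brightness unchanged
    return image / profile

# A flat field imaged under a left-to-right illumination gradient.
illumination = np.linspace(0.5, 1.5, 128)[None, :]
observed = np.ones((128, 128)) * illumination
corrected = flatfield_correct(observed, sigma=10)
```

After correction the gradient is largely removed; the other corrections listed above (chromatic aberration, deconvolution, registration) each need their own, instrument-specific treatment.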

.. _section_improving_snr:

@@ -72,7 +72,7 @@ some level of autofluorescence which causes cellular compartments to have more background than
intracellular regions. This can confound spot finders, which look for local intensity differences.
These approaches ameliorate these problems.

* Tutorial: :ref:`Removing Autofluorescence <tutorial_removing_autoflourescence>`
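One standard way to suppress smooth background fluorescence while keeping small bright spots is a white top-hat filter. The sketch below uses scipy rather than starfish's filters, and the structuring-element `size` is an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import white_tophat

# Slowly varying background (autofluorescence) plus one bright spot.
background = np.linspace(0.0, 1.0, 64)[None, :] * np.ones((64, 1))
image = background.copy()
image[32, 32] += 1.0  # a single bright RNA spot

# White top-hat subtracts the morphological opening, which removes
# features smaller than the structuring element from the background estimate.
cleaned = white_tophat(image, size=5)
```

The spot survives nearly unchanged while the slowly varying background is flattened toward zero, which is exactly what local-intensity spot finders need.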

.. _section_normalizing_intensities:

@@ -116,22 +116,28 @@ How to normalize
How to normalize depends on your data and a key assumption. There are two approaches for
normalizing images in starfish:

Normalizing Intensity Distributions
"""""""""""""""""""""""""""""""""""

If you know a priori that image volumes acquired for every channel and/or every round should have
the same distribution of intensities then the intensity *distributions* of image volumes can be
normalized with :py:class:`.MatchHistograms`. Typically this means the number of spots and amount of
background autofluorescence in every image volume is approximately uniform across channels and/or
rounds.

* Tutorial: :ref:`Normalizing Intensity Distributions<tutorial_normalizing_intensity_distributions>`
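The idea behind :py:class:`.MatchHistograms` can be sketched as quantile mapping with numpy. This toy function operates on plain 2D arrays rather than ImageStacks and is not starfish's implementation:

```python
import numpy as np

def match_histograms(source, reference):
    """Remap `source` intensities so their distribution matches `reference`."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical CDFs of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    matched_vals = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched_vals[src_idx].reshape(source.shape)

# Example: make a dim round match a bright reference round.
rng = np.random.default_rng(0)
dim = rng.uniform(0.0, 0.2, (64, 64))
reference = rng.uniform(0.0, 1.0, (64, 64))
matched = match_histograms(dim, reference)
```

Applied per channel and/or round, this forces every image volume to share the reference's intensity distribution, which is only valid under the uniform-spots assumption described above.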

Normalizing Intensity Values
""""""""""""""""""""""""""""

In most data sets, differences in gene expression lead to too much variation in the number of
spots between channels and rounds. Normalizing intensity distributions would incorrectly skew the
intensities. Instead you can use :py:class:`.Clip`, :py:class:`.ClipPercentileToZero`, and
:py:class:`.ClipValueToZero` to normalize intensity *values* by clipping extreme values and
rescaling.

* Tutorial: :ref:`Normalizing Intensity Values <tutorial_normalizing_intensity_values>`
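In the same spirit as :py:class:`.ClipPercentileToZero`, the sketch below clips to percentiles, shifts the floor to zero, and rescales. The percentile values are illustrative assumptions, not starfish defaults:

```python
import numpy as np

def clip_percentile_to_zero(image, p_min=80.0, p_max=99.9):
    """Clip intensities to the [p_min, p_max] percentiles, shift the
    floor to zero, and rescale the result to [0, 1]."""
    v_min, v_max = np.percentile(image, [p_min, p_max])
    clipped = np.clip(image, v_min, v_max) - v_min
    return clipped / (v_max - v_min)

rng = np.random.default_rng(42)
image = rng.uniform(0.0, 1.0, (64, 64))
normalized = clip_percentile_to_zero(image, p_min=10.0, p_max=90.0)
```

Clipping extreme values this way tames hot pixels and dim background without forcing every image to share the same distribution.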

.. _section_finding_and_decoding:

Finding and Decoding Spots
@@ -142,6 +148,88 @@ Finding and Decoding Spots
Segmenting Cells
----------------

Unlike single-cell RNA sequencing, image-based transcriptomics methods do not physically separate
cells before acquiring RNA information. Therefore, in order to characterize cells, the RNA must be
assigned to single cells by partitioning the image volume. Accurate unsupervised cell segmentation
is an `open problem <https://www.kaggle.com/c/data-science-bowl-2018>`_ for all biomedical imaging
disciplines ranging from digital pathology to neuroscience.

The challenge of segmenting cells depends on the structural complexity of the sample and quality
of images available. For example, a sparse cell mono-layer with a strong cytosol stain would be
trivial to segment but a dense heterogeneous population of cells in 3D tissue with only a DAPI stain
can be impossible to segment perfectly. On the experimental side, selecting good cell stains and
acquiring images with low background will make segmenting a more tractable task.

There are many approaches for segmenting cells from image-based transcriptomics assays. Below are
a few methods that are implemented in or integrated with starfish to output a
:py:class:`.BinaryMaskCollection`, which represents a collection of labeled objects. If you do not
know which segmentation method to use, a safe bet is to start with thresholding and watershed. On
the other hand, if you can afford to manually define ROI masks there is no better way to
guarantee accurate segmentation.

.. note::
   While there is no "ground truth" for cell segmentation, the closest approximation is manual
   segmentation by an expert in the tissue of interest.

Thresholding and Watershed
^^^^^^^^^^^^^^^^^^^^^^^^^^

The traditional method for segmenting cells in fluorescence microscopy images is to threshold the
image into foreground pixels and background pixels and then label connected foreground as
individual cells. Common issues that affect thresholding such as background noise can be corrected
by preprocessing images before thresholding and filtering connected components after. There are
`many automated image thresholding algorithms <https://imagej.net/Thresholding>`_ but currently
starfish requires manually selecting a global threshold value in :py:class:`.ThresholdBinarize`.
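The threshold-then-label-then-filter sequence can be sketched with scipy; :py:class:`.ThresholdBinarize` and starfish's morphology filters play the analogous roles on real ImageStacks, and the threshold and area values below are arbitrary:

```python
import numpy as np
from scipy import ndimage as ndi

def threshold_and_filter(image, threshold, min_area):
    """Global threshold -> connected components -> drop small noise specks."""
    binary = image > threshold
    labeled, _ = ndi.label(binary)
    areas = np.bincount(labeled.ravel())  # index 0 is background
    keep = areas >= min_area
    keep[0] = False  # never keep background
    cleaned, n_cells = ndi.label(keep[labeled])
    return cleaned, n_cells

# One real "cell" plus a one-pixel noise speck.
image = np.zeros((32, 32))
image[8:20, 8:20] = 1.0
image[2, 2] = 1.0
labels, n_cells = threshold_and_filter(image, threshold=0.5, min_area=10)
```

The area filter after labeling is what makes a simple global threshold usable in practice: isolated noise pixels survive thresholding but not the size check.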

When overlapping cells are labeled as one connected component, they are typically segmented by
using a `distance transformation followed by the watershed algorithm <https://www.mathworks.com/company/newsletters/articles/the-watershed-transform-strategies-for-image-segmentation.html>`_.
Watershed is a classic image processing algorithm for separating objects in images and
can be applied to all types of images. Pairing it with a distance transform is particularly
useful for segmenting convex shapes like cells.
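The distance-transform-plus-watershed strategy above can be sketched with scipy on a toy image of two overlapping cells. This is a generic illustration (using `scipy.ndimage.watershed_ift`), not starfish's :py:class:`.Watershed` implementation, and the 0.7 peak threshold is an arbitrary assumption:

```python
import numpy as np
from scipy import ndimage as ndi

# Two overlapping disks standing in for touching cells.
yy, xx = np.mgrid[0:80, 0:80]
img = (((xx - 28) ** 2 + (yy - 40) ** 2) < 14 ** 2).astype(float)
img += ((xx - 52) ** 2 + (yy - 40) ** 2) < 14 ** 2
img = np.clip(img, 0, 1)

# 1) Global threshold -> binary foreground (one connected component).
binary = img > 0.5

# 2) Distance transform: pixels far from background peak at cell centers.
distance = ndi.distance_transform_edt(binary)

# 3) Markers at the distance peaks, one marker per cell.
peaks = distance > 0.7 * distance.max()
markers, n_markers = ndi.label(peaks)

# 4) Watershed on the inverted distance, then mask out the background.
cost = (255 * (1 - distance / distance.max())).astype(np.uint8)
labels = ndi.watershed_ift(cost, markers.astype(np.int16))
labels[~binary] = 0
```

The watershed splits the single connected component at the neck between the two distance peaks, which is why the distance transform's convexity assumption matters: non-convex cells produce multiple peaks and over-segment.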

A segmentation pipeline that consists of thresholding, connected component analysis, and watershed
is simple and fast to implement, but its accuracy is highly dependent on image quality.
The signal-to-noise ratio of the cell stain must be high enough to minimize errors after
thresholding and binary operations, and the nuclei or cell shapes must be convex to meet the
assumptions of the distance transform, or else it will over-segment. Starfish includes the basic
functions to build a watershed segmentation pipeline and a predefined :py:class:`.Watershed`
segmentation class that uses the :term:`primary images<Primary Images>` as the cell stain.

* Tutorial: :ref:`Ways to segment by thresholding and watershed<tutorial_watershed_segmentation>`

Manually Defining Cells
^^^^^^^^^^^^^^^^^^^^^^^

The most accurate but time-consuming approach is to manually segment images using a tool such as
`ROI manager <https://imagej.net/docs/guide/146-30.html#fig:The-ROI-Manager>`_ in FIJI (ImageJ). It
is a straightforward process that starfish supports by allowing ROI set binaries to be imported as a
:py:class:`.BinaryMaskCollection`. These masks can then be integrated into the pipeline for
visualization and assigning spots to cells.

* Tutorial: :ref:`Loading ImageJ ROI set<tutorial_manual_segmentation>`

Machine-Learning Methods
^^^^^^^^^^^^^^^^^^^^^^^^

Besides the two classic cell segmentation approaches mentioned above, there are machine-learning
methods that aim to replicate the accuracy of manual cell segmentation while reducing the labor
required. Machine-learning algorithms for segmentation are continually improving but there is no
perfect solution for all image types yet. These methods require training data (e.g. stained
images with manually defined labels) to train a model to predict cell or nuclei locations in test
data. There are `exceptions that don't require training on your specific data <http://www.cellpose.org/>`_,
but generally training the model is something to consider when evaluating how much time
each segmentation approach will require.

Starfish currently has built-in functionality to support `ilastik <https://www.ilastik.org/>`_, a
segmentation toolkit that leverages machine learning. Ilastik has a Pixel Classification
workflow that performs semantic segmentation of the image, returning probability maps for each
label such as cells and background. To transform the images of pixel probabilities to binary
masks, you can use the same thresholding and watershed methods in starfish that are used for
segmenting images of stained cells.

* Tutorial: :ref:`Using ilastik in starfish<tutorial_ilastik_segmentation>`

.. _section_assigning_spots:

Assigning Spots to Cells
@@ -170,4 +258,3 @@ origin. At this point, it's trivial to create a cell x gene matrix.
* :ref:`Spot Decoding <tutorial_spot_decoding>`
* :ref:`Segmenting Cells <tutorial_segmenting_cells>`
* :ref:`Assigning Spots to Cells <tutorial_assigning_spots_to_cells>`

Binary file added examples/RoiSet.zip
Binary file added examples/dapi_Probabilities.h5
146 changes: 146 additions & 0 deletions examples/ilastik_segmentation.py
@@ -0,0 +1,146 @@
"""
.. _tutorial_ilastik_segmentation:

Using ilastik in starfish
=========================

In order to create a cell by gene expression matrix from image-based transcriptomics data, RNA
spots must be assigned to cells. Segmenting a microscopy image into single cells is either
time-consuming when done manually or inaccurate when done with thresholding and watershed. The
advent of deep learning models for image segmentation promises a solution that is both fast and
accurate.

Starfish currently has built-in functionality to support `ilastik <https://www.ilastik.org/>`_,
a segmentation toolkit that leverages machine learning. Ilastik has a `Pixel Classification
workflow <https://www.ilastik.org/documentation/pixelclassification/pixelclassification/>`_
that performs semantic segmentation of the image, returning probability maps for each
label (e.g. cells). To use ilastik in starfish, first install ilastik locally and follow the
Pixel Classification workflow using the GUI to train a classifier, which is saved in an ilastik
project file.

There are two options for using ilastik in starfish. The first is running the trained pixel
classifier on images outside of starfish to generate a probability map that can then be loaded in
starfish as an :py:class:`.ImageStack`. The second is to run the pixel classifier from within
starfish by calling out to ilastik and passing it the trained classifier. The results will be the
same.

The probability map is an nd-array with the same size as the input :py:class:`.ImageStack`. The
value of each element in the nd-array equals the probability that the corresponding pixel belongs to
a label. In ilastik, you can train the pixel classifier to predict the probability for multiple
labels such as nuclei, cells, and background. However, starfish will only load the probability
map of the first label, so you should use the first label for cells or nuclei only.

This tutorial will first describe how you can run a trained ilastik classifier from within a
starfish process. The second section demonstrates how to load an ilastik
probability map hdf5 file as an :py:class:`.ImageStack`. The last section transforms the
:py:class:`.ImageStack` into a :py:class:`.BinaryMaskCollection` that can be used by
:py:class:`.Label` to assign spots to cells.
"""

###################################################################################################
# Calling Out to ilastik Trained Pixel Classifier
# ===============================================
#
# Running a trained ilastik pixel classifier from within starfish allows a pipeline to take
# advantage of ilastik's machine-learning model without having to process images from every field
# of view, switch to ilastik to run the classifier, export all the probability maps, and then
# load the files back into a starfish pipeline.
#
# The requirements for calling out to ilastik from a starfish pipeline are having ilastik
# installed locally and having an ilastik project that contains a trained classifier. The
# classifier can only be run on an :py:class:`.ImageStack` with a single round and channel.
# Obviously, the :py:class:`.ImageStack` should be an image of the same type that the classifier
# was trained on for expected probability results.

# Load MERFISH data to get dapi ImageStack
import os
import matplotlib
import matplotlib.pyplot as plt

import starfish.data
from starfish.image import Filter
from starfish.types import Axes, Coordinates, Levels

matplotlib.rcParams["figure.dpi"] = 150

experiment = starfish.data.MERFISH()
fov = experiment["fov_000"]
dapi = fov.get_image("nuclei")

# Process dapi images by blurring and clipping
def preprocess(dapi):
    blur = Filter.GaussianLowPass(sigma=5)
    blurred = blur.run(dapi)
    clip = Filter.Clip(p_min=1, p_max=95, level_method=Levels.SCALE_BY_CHUNK)
    clipped = clip.run(blurred)
    return clipped

dapi = preprocess(dapi)

# Need ilastik script and ilastik project with trained classifier to instantiate filter
ilastik_exe_path = os.path.join(os.path.dirname(__file__), 'run_ilastik.sh')
ilastik_proj_path = os.path.join(os.path.dirname(__file__), 'merfish_dapi.ilp')
ipp = Filter.IlastikPretrainedProbability(ilastik_executable=ilastik_exe_path, ilastik_project=ilastik_proj_path)

# Run IlastikPretrainedProbability filter to get probabilities of 'Label 1' as ImageStack
# probabilities = ipp.run(stack=dapi)

###################################################################################################
# Loading ilastik Probability Map
# ===============================
#
# If you already have probability maps of your images from ilastik or prefer to use the ilastik
# GUI it is possible to load the exported hdf5 files into starfish. When using the 'Prediction
# Export' panel in ilastik be sure to use ``Source: Probabilities`` and ``Format: hdf5``. If you
# edit the Dataset name then it must also be passed as ``dataset_name`` to
# :py:meth:`.import_ilastik_probabilities`. The example below loads the same probability map that
# would have been generated in the first section of this tutorial.

# Need to instantiate Filter even though it is not run
ilastik_exe_path = os.path.join(os.path.dirname(__file__), 'run_ilastik.sh')
ilastik_proj_path = os.path.join(os.path.dirname(__file__), 'merfish_dapi.ilp')
ipp = Filter.IlastikPretrainedProbability(ilastik_executable=ilastik_exe_path, ilastik_project=ilastik_proj_path)

# Load hdf5 file as ImageStack
h5_file_path = os.path.join(os.path.dirname(__file__), 'dapi_Probabilities.h5')
imported_probabilities = ipp.import_ilastik_probabilities(path_to_h5_file=h5_file_path)

# View probability map next to dapi image
f, (ax1, ax2) = plt.subplots(ncols=2)
ax1.imshow(dapi.xarray.values.squeeze())
ax1.set_title("Dapi")
ax2.imshow(imported_probabilities.xarray.values.squeeze())
ax2.set_title("Probabilities")
f.tight_layout()

###################################################################################################
# Transforming Probability Map to Masks
# =====================================
#
# In order to use ilastik semantic segmentation probabilities to assign spots to cells in
# starfish they must be transformed into binary cell masks. A straightforward approach to
# transforming the :py:class:`.ImageStack` to a :py:class:`.BinaryMaskCollection` is to threshold
# the probabilities, run connected component analysis, and watershed. This process is analogous to
# segmenting a stained cell image.

# Scale probability map
clip = Filter.Clip(p_min=0, p_max=100, level_method=Levels.SCALE_BY_CHUNK)
clipped_probabilities = clip.run(imported_probabilities)

# Threshold, connected components, watershed, and filter by area
# (note: starfish.morphology.Filter shadows the starfish.image.Filter imported above)
from starfish.morphology import Binarize, Filter
prob_thresh = 0.05
min_dist = 100
min_allowed_size = 5000
max_allowed_size = 100000
binarized_probabilities = Binarize.ThresholdBinarize(prob_thresh).run(clipped_probabilities)
labeled_masks = Filter.MinDistanceLabel(min_dist, 1).run(binarized_probabilities)
masks = Filter.AreaFilter(min_area=min_allowed_size, max_area=max_allowed_size).run(labeled_masks)

# Show probability map along with cell masks
f, (ax1, ax2) = plt.subplots(ncols=2)
ax1.imshow(imported_probabilities.xarray.values.squeeze())
ax1.set_title("Probabilities")
ax2.imshow(labeled_masks.to_label_image().xarray.values.squeeze(), cmap=plt.cm.nipy_spectral)
ax2.set_title("Masks")
f.tight_layout()
72 changes: 72 additions & 0 deletions examples/manual_segmentation.py
@@ -0,0 +1,72 @@
"""
.. _tutorial_manual_segmentation:

Loading ImageJ ROI Sets
=======================

.. note::
   Starfish only supports importing 2D ROIs.

In order to create a cell by gene expression matrix from image-based transcriptomics data, RNA
spots must be assigned to cells by segmenting an image. The best quality cell segmentation
annotations are manually drawn by experts. If you have ROI sets exported with `ROI manager
<https://imagej.net/docs/guide/146-30.html#fig:The-ROI-Manager>`_ in ImageJ or FIJI, they can be
loaded into starfish as a :py:class:`.BinaryMaskCollection`. The ROI set for each field of view
must be passed with the corresponding :py:class:`.ImageStack` to :py:meth:`.from_fiji_roi_set` in
order to assign accurate ``pixel ticks`` and ``physical ticks`` to the
:py:class:`.BinaryMaskCollection`.

This tutorial demonstrates how to load an ROI set that was exported from ROI Manager as a .zip
and how to plot the resulting :py:class:`.BinaryMaskCollection`.

Brief segmentation workflow in ImageJ ROI Manager:

* Tools > ROI Manager
* Click "polygon selection" (third button from left on GUI)
* Create a polygon, then click the "+" button to finalize it
* Repeat until segmented
* Click "more >>>"
* Click "save"
* Save the ROISet
"""

import matplotlib
import matplotlib.pyplot as plt
import os

import starfish.data
from starfish import BinaryMaskCollection
from starfish.image import Filter
from starfish.types import Levels

matplotlib.rcParams["figure.dpi"] = 150

# Load MERFISH data to get dapi ImageStack
experiment = starfish.data.MERFISH()
fov = experiment["fov_000"]
dapi = fov.get_image("nuclei") # nuclei


# Preprocess MERFISH dapi imagestack
def preprocess(dapi):
    blur = Filter.GaussianLowPass(sigma=5)
    blurred = blur.run(dapi)

    clip = Filter.Clip(p_min=1, p_max=95, level_method=Levels.SCALE_BY_CHUNK)
    clipped = clip.run(blurred)
    return clipped


dapi = preprocess(dapi)

# Import RoiSet.zip as BinaryMaskCollection
roi_path = os.path.join(os.path.dirname(__file__), 'RoiSet.zip')
masks = BinaryMaskCollection.from_fiji_roi_set(path_to_roi_set_zip=roi_path, original_image=dapi)

# display BinaryMaskCollection next to original dapi image
f, (ax1, ax2) = plt.subplots(ncols=2)
ax1.imshow(dapi.xarray.values.squeeze())
ax1.set_title("Dapi")
ax2.imshow(masks.to_label_image().xarray.values.squeeze(), cmap=plt.cm.nipy_spectral)
ax2.set_title("Imported ROI")
f.tight_layout()
Binary file added examples/merfish_dapi.ilp
1 change: 1 addition & 0 deletions examples/run_ilastik.sh
@@ -0,0 +1 @@
THIS IS A FAKE SHELL SCRIPT INTENDED TO TRICK :py:class:`.IlastikPretrainedProbability`