Add image segmentation docs (#1821)
Added tutorials and how-tos related to cell segmentation
mattcai authored Mar 12, 2020
1 parent 3a2c0b5 commit 1a47383
Showing 12 changed files with 583 additions and 25 deletions.
113 changes: 100 additions & 13 deletions docs/source/creating_an_image_processing_pipeline/index.rst
@@ -31,8 +31,8 @@ and running it on an experiment.
Loading Data
------------

* Tutorial: :ref:`Formatting your data <data_conversion_examples>`
* Tutorial: :ref:`Using formatted example data <datasets>`

.. _section_manipulating_images:

@@ -43,8 +43,8 @@ Sometimes it can be useful to subset the images by, for example, excluding out-of-focus images or
cropping out edge effects. For sparse data, it can be useful to project the z-volume into a single
image, as this produces a much faster processing routine.

* Tutorial: :ref:`Cropping <tutorial_cropping>`
* Tutorial: :ref:`Projecting <tutorial_projection>`
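The projection step mentioned above is, at its core, a reduction along the z-axis. A minimal numpy sketch of a maximum-intensity projection (the projecting tutorial covers the equivalent operation on a starfish ImageStack; the array shape here is illustrative):

```python
import numpy as np

# Synthetic image volume with shape (z, y, x).
volume = np.random.rand(12, 64, 64)

# Maximum-intensity projection: collapse the z-axis by keeping the
# brightest value observed at each (y, x) position.
mip = volume.max(axis=0)
```

Because sparse data rarely has overlapping spots along z, the projection preserves spot signal while reducing the volume to a single, faster-to-process plane.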

.. _section_correcting_images:

@@ -56,11 +56,11 @@ handling or microfluidics that are involved in capturing the images. These steps are
*independent* of the assay. *Starfish* enables the user to design a pipeline that matches their
imaging system.

* Tutorial: :ref:`Illumination Correction <tutorial_illumination_correction>`
* Tutorial: :ref:`Chromatic Aberration <tutorial_chromatic_aberration>`
* Tutorial: :ref:`Deconvolution <tutorial_deconvolution>`
* Tutorial: :ref:`Image Registration <tutorial_image_registration>`
* Tutorial: :ref:`Image Correction Pipeline <tutorial_image_correction_pipeline>`
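As one concrete illustration of an instrument-dependent correction, uneven illumination can be estimated with a heavy blur and divided out. This is a generic scipy sketch, not starfish's illumination-correction class, and the `sigma` value is an arbitrary assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def flatfield_correct(image, sigma=10):
    """Divide out a smooth illumination profile estimated by a large Gaussian blur."""
    profile = gaussian_filter(image, sigma)
    profile = profile / profile.mean()  # keep overall brightness unchanged
    return image / profile

# A flat field imaged under a left-to-right illumination gradient.
illumination = np.linspace(0.5, 1.5, 128)[None, :]
observed = np.ones((128, 128)) * illumination
corrected = flatfield_correct(observed, sigma=10)
```

After correction the gradient is largely removed; the other corrections listed above (chromatic aberration, deconvolution, registration) each need their own, instrument-specific treatment.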

.. _section_improving_snr:

@@ -72,7 +72,7 @@ some level of autofluorescence which causes cellular compartments to have more background than
intracellular regions. This can confound spot finders, which look for local intensity differences.
These approaches ameliorate these problems.

* Tutorial: :ref:`Removing Autofluorescence <tutorial_removing_autoflourescence>`
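One standard way to suppress smooth background fluorescence while keeping small bright spots is a white top-hat filter. The sketch below uses scipy rather than starfish's filters, and the structuring-element `size` is an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import white_tophat

# Slowly varying background (autofluorescence) plus one bright spot.
background = np.linspace(0.0, 1.0, 64)[None, :] * np.ones((64, 1))
image = background.copy()
image[32, 32] += 1.0  # a single bright RNA spot

# White top-hat subtracts the morphological opening, which removes
# features smaller than the structuring element from the background estimate.
cleaned = white_tophat(image, size=5)
```

The spot survives nearly unchanged while the slowly varying background is flattened toward zero, which is exactly what local-intensity spot finders need.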

.. _section_normalizing_intensities:

@@ -116,22 +116,28 @@ How to normalize
How to normalize depends on your data and a key assumption. There are two approaches for
normalizing images in starfish:

Normalizing Intensity Distributions
"""""""""""""""""""""""""""""""""""

If you know a priori that image volumes acquired for every channel and/or every round should have
the same distribution of intensities then the intensity *distributions* of image volumes can be
normalized with :py:class:`.MatchHistograms`. Typically this means the number of spots and amount of
background autofluorescence in every image volume is approximately uniform across channels and/or
rounds.

* Tutorial: :ref:`Normalizing Intensity Distributions<tutorial_normalizing_intensity_distributions>`
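The idea behind :py:class:`.MatchHistograms` can be sketched as quantile mapping with numpy. This toy function operates on plain 2D arrays rather than ImageStacks and is not starfish's implementation:

```python
import numpy as np

def match_histograms(source, reference):
    """Remap `source` intensities so their distribution matches `reference`."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical CDFs of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    matched_vals = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched_vals[src_idx].reshape(source.shape)

# Example: make a dim round match a bright reference round.
rng = np.random.default_rng(0)
dim = rng.uniform(0.0, 0.2, (64, 64))
reference = rng.uniform(0.0, 1.0, (64, 64))
matched = match_histograms(dim, reference)
```

Applied per channel and/or round, this forces every image volume to share the reference's intensity distribution, which is only valid under the uniform-spots assumption described above.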

Normalizing Intensity Values
""""""""""""""""""""""""""""

In most data sets, differences in gene expression lead to too much variation in the number of
spots between channels and rounds. Normalizing intensity distributions would incorrectly skew the
intensities. Instead you can use :py:class:`.Clip`, :py:class:`.ClipPercentileToZero`, and
:py:class:`.ClipValueToZero` to normalize intensity *values* by clipping extreme values and
rescaling.

* Tutorial: :ref:`Normalizing Intensity Values <tutorial_normalizing_intensity_values>`
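In the same spirit as :py:class:`.ClipPercentileToZero`, the sketch below clips to percentiles, shifts the floor to zero, and rescales. The percentile values are illustrative assumptions, not starfish defaults:

```python
import numpy as np

def clip_percentile_to_zero(image, p_min=80.0, p_max=99.9):
    """Clip intensities to the [p_min, p_max] percentiles, shift the
    floor to zero, and rescale the result to [0, 1]."""
    v_min, v_max = np.percentile(image, [p_min, p_max])
    clipped = np.clip(image, v_min, v_max) - v_min
    return clipped / (v_max - v_min)

rng = np.random.default_rng(42)
image = rng.uniform(0.0, 1.0, (64, 64))
normalized = clip_percentile_to_zero(image, p_min=10.0, p_max=90.0)
```

Clipping extreme values this way tames hot pixels and dim background without forcing every image to share the same distribution.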

.. _section_finding_and_decoding:

Finding and Decoding Spots
@@ -142,6 +148,88 @@ Finding and Decoding Spots
Segmenting Cells
----------------

Unlike single-cell RNA sequencing, image-based transcriptomics methods do not physically separate
cells before acquiring RNA information. Therefore, in order to characterize cells, the RNA must be
assigned to single cells by partitioning the image volume. Accurate unsupervised cell segmentation
is an `open problem <https://www.kaggle.com/c/data-science-bowl-2018>`_ for all biomedical imaging
disciplines ranging from digital pathology to neuroscience.

The challenge of segmenting cells depends on the structural complexity of the sample and quality
of images available. For example, a sparse cell mono-layer with a strong cytosol stain would be
trivial to segment but a dense heterogeneous population of cells in 3D tissue with only a DAPI stain
can be impossible to segment perfectly. On the experimental side, selecting good cell stains and
acquiring images with low background will make segmenting a more tractable task.

There are many approaches for segmenting cells from image-based transcriptomics assays. Below are
a few methods that are implemented in or integrated with starfish to output a
:py:class:`.BinaryMaskCollection`, which represents a collection of labeled objects. If you do not
know which segmentation method to use, a safe bet is to start with thresholding and watershed. On
the other hand, if you can afford to manually define ROI masks there is no better way to
guarantee accurate segmentation.

.. note::
   While there is no "ground truth" for cell segmentation, the closest approximation is manual
   segmentation by an expert in the tissue of interest.

Thresholding and Watershed
^^^^^^^^^^^^^^^^^^^^^^^^^^

The traditional method for segmenting cells in fluorescence microscopy images is to threshold the
image into foreground pixels and background pixels and then label connected foreground as
individual cells. Common issues that affect thresholding such as background noise can be corrected
by preprocessing images before thresholding and filtering connected components after. There are
`many automated image thresholding algorithms <https://imagej.net/Thresholding>`_ but currently
starfish requires manually selecting a global threshold value in :py:class:`.ThresholdBinarize`.
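The threshold-then-label-then-filter sequence can be sketched with scipy; :py:class:`.ThresholdBinarize` and starfish's morphology filters play the analogous roles on real ImageStacks, and the threshold and area values below are arbitrary:

```python
import numpy as np
from scipy import ndimage as ndi

def threshold_and_filter(image, threshold, min_area):
    """Global threshold -> connected components -> drop small noise specks."""
    binary = image > threshold
    labeled, _ = ndi.label(binary)
    areas = np.bincount(labeled.ravel())  # index 0 is background
    keep = areas >= min_area
    keep[0] = False  # never keep background
    cleaned, n_cells = ndi.label(keep[labeled])
    return cleaned, n_cells

# One real "cell" plus a one-pixel noise speck.
image = np.zeros((32, 32))
image[8:20, 8:20] = 1.0
image[2, 2] = 1.0
labels, n_cells = threshold_and_filter(image, threshold=0.5, min_area=10)
```

The area filter after labeling is what makes a simple global threshold usable in practice: isolated noise pixels survive thresholding but not the size check.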

When overlapping cells are labeled as one connected component, they are typically segmented by
using a `distance transformation followed by the watershed algorithm <https://www.mathworks.com/company/newsletters/articles/the-watershed-transform-strategies-for-image-segmentation.html>`_.
Watershed is a classic image processing algorithm for separating objects in images and
can be applied to all types of images. Pairing it with a distance transform is particularly
useful for segmenting convex shapes like cells.
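The distance-transform-plus-watershed strategy above can be sketched with scipy on a toy image of two overlapping cells. This is a generic illustration (using `scipy.ndimage.watershed_ift`), not starfish's :py:class:`.Watershed` implementation, and the 0.7 peak threshold is an arbitrary assumption:

```python
import numpy as np
from scipy import ndimage as ndi

# Two overlapping disks standing in for touching cells.
yy, xx = np.mgrid[0:80, 0:80]
img = (((xx - 28) ** 2 + (yy - 40) ** 2) < 14 ** 2).astype(float)
img += ((xx - 52) ** 2 + (yy - 40) ** 2) < 14 ** 2
img = np.clip(img, 0, 1)

# 1) Global threshold -> binary foreground (one connected component).
binary = img > 0.5

# 2) Distance transform: pixels far from background peak at cell centers.
distance = ndi.distance_transform_edt(binary)

# 3) Markers at the distance peaks, one marker per cell.
peaks = distance > 0.7 * distance.max()
markers, n_markers = ndi.label(peaks)

# 4) Watershed on the inverted distance, then mask out the background.
cost = (255 * (1 - distance / distance.max())).astype(np.uint8)
labels = ndi.watershed_ift(cost, markers.astype(np.int16))
labels[~binary] = 0
```

The watershed splits the single connected component at the neck between the two distance peaks, which is why the distance transform's convexity assumption matters: non-convex cells produce multiple peaks and over-segment.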

A segmentation pipeline that consists of thresholding, connected component analysis, and watershed
is simple and fast to implement, but its accuracy is highly dependent on image quality.
The signal-to-noise ratio of the cell stain must be high enough to minimize errors after
thresholding and binary operations, and the nuclei or cell shapes must be convex to meet the
assumptions of the distance transform, or else it will over-segment. Starfish includes the basic
functions to build a watershed segmentation pipeline and a predefined :py:class:`.Watershed`
segmentation class that uses the :term:`primary images<Primary Images>` as the cell stain.

* Tutorial: :ref:`Ways to segment by thresholding and watershed<tutorial_watershed_segmentation>`

Manually Defining Cells
^^^^^^^^^^^^^^^^^^^^^^^

The most accurate but time-consuming approach is to manually segment images using a tool such as
`ROI manager <https://imagej.net/docs/guide/146-30.html#fig:The-ROI-Manager>`_ in FIJI (ImageJ). It
is a straightforward process that starfish supports by allowing ROI set binaries to be imported as a
:py:class:`.BinaryMaskCollection`. These masks can then be integrated into the pipeline for
visualization and assigning spots to cells.

* Tutorial: :ref:`Loading ImageJ ROI set<tutorial_manual_segmentation>`

Machine-Learning Methods
^^^^^^^^^^^^^^^^^^^^^^^^

Besides the two classic cell segmentation approaches mentioned above, there are machine-learning
methods that aim to replicate the accuracy of manual cell segmentation while reducing the labor
required. Machine-learning algorithms for segmentation are continually improving but there is no
perfect solution for all image types yet. These methods require training data (e.g. stained
images with manually defined labels) to train a model to predict cell or nuclei locations in test
data. There are `exceptions that don't require training on your specific data <http://www.cellpose.org/>`_,
but generally training the model is something to consider when evaluating how much time
each segmentation approach will require.

Starfish currently has built-in functionality to support `ilastik <https://www.ilastik.org/>`_, a
segmentation toolkit that leverages machine learning. Ilastik has a Pixel Classification
workflow that performs semantic segmentation of the image, returning probability maps for each
label such as cells and background. To transform the images of pixel probabilities to binary
masks, you can use the same thresholding and watershed methods in starfish that are used for
segmenting images of stained cells.

* Tutorial: :ref:`Using ilastik in starfish<tutorial_ilastik_segmentation>`

.. _section_assigning_spots:

Assigning Spots to Cells
@@ -170,4 +258,3 @@ origin. At this point, it's trivial to create a cell x gene matrix.
* :ref:`Spot Decoding <tutorial_spot_decoding>`
* :ref:`Segmenting Cells <tutorial_segmenting_cells>`
* :ref:`Assigning Spots to Cells <tutorial_assigning_spots_to_cells>`

Binary file added examples/RoiSet.zip
Binary file added examples/dapi_Probabilities.h5
146 changes: 146 additions & 0 deletions examples/ilastik_segmentation.py
@@ -0,0 +1,146 @@
"""
.. _tutorial_ilastik_segmentation:

Using ilastik in starfish
=========================

In order to create a cell by gene expression matrix from image-based transcriptomics data, RNA
spots must be assigned to cells. Segmenting a microscopy image into single cells is either
time-consuming when done manually or inaccurate when done with thresholding and watershed. The
advent of deep learning models for image segmentation promises a solution that is both fast and
accurate.

Starfish currently has built-in functionality to support `ilastik <https://www.ilastik.org/>`_,
a segmentation toolkit that leverages machine learning. Ilastik has a `Pixel Classification
workflow <https://www.ilastik.org/documentation/pixelclassification/pixelclassification/>`_
that performs semantic segmentation of the image, returning probability maps for each
label (e.g. cells). To use ilastik in starfish, first install ilastik locally and follow the
Pixel Classification workflow using the GUI to train a classifier, which is saved in an ilastik
project file.

There are two options for using ilastik in starfish. The first is running the trained pixel
classifier on images outside of starfish to generate a probability map that can then be loaded in
starfish as an :py:class:`.ImageStack`. The second is to run the pixel classifier from within
starfish by calling out to ilastik and passing it the trained classifier. The results will be the
same.

The probability map is an nd-array with the same size as the input :py:class:`.ImageStack`. The
value of each element in the nd-array equals the probability that the corresponding pixel belongs to
a label. In ilastik, you can train the pixel classifier to predict the probability for multiple
labels such as nuclei, cells, and background. However, starfish will only load the probability
map of the first label, so you should use the first label for cells or nuclei only.

This tutorial will first describe how you can run a trained ilastik classifier from within a
starfish process. The second section demonstrates how to load an ilastik
probability map hdf5 file as an :py:class:`.ImageStack`. The last section transforms the
:py:class:`.ImageStack` into a :py:class:`.BinaryMaskCollection` that can be used by
:py:class:`.Label` to assign spots to cells.
"""

###################################################################################################
# Calling Out to ilastik Trained Pixel Classifier
# ===============================================
#
# Running a trained ilastik pixel classifier from within starfish allows a pipeline to take
# advantage of ilastik's machine-learning model without having to process images from every field
# of view, switch to ilastik to run the classifier, export all the probability maps, and then
# load the files back into a starfish pipeline.
#
# The requirements for calling out to ilastik from a starfish pipeline are having ilastik
# installed locally and having an ilastik project that contains a trained classifier. The
# classifier can only be run on an :py:class:`.ImageStack` with a single round and channel.
# Obviously, the :py:class:`.ImageStack` should be an image of the same type that the classifier
# was trained on for expected probability results.

# Load MERFISH data to get dapi ImageStack
import os
import matplotlib
import matplotlib.pyplot as plt

import starfish.data
from starfish.image import Filter
from starfish.types import Axes, Coordinates, Levels

matplotlib.rcParams["figure.dpi"] = 150

experiment = starfish.data.MERFISH()
fov = experiment["fov_000"]
dapi = fov.get_image("nuclei")

# Process dapi images by blurring and clipping
def preprocess(dapi):
    blur = Filter.GaussianLowPass(sigma=5)
    blurred = blur.run(dapi)
    clip = Filter.Clip(p_min=1, p_max=95, level_method=Levels.SCALE_BY_CHUNK)
    clipped = clip.run(blurred)
    return clipped

dapi = preprocess(dapi)

# Need ilastik script and ilastik project with trained classifier to instantiate filter
ilastik_exe_path = os.path.join(os.path.dirname(__file__), 'run_ilastik.sh')
ilastik_proj_path = os.path.join(os.path.dirname(__file__), 'merfish_dapi.ilp')
ipp = Filter.IlastikPretrainedProbability(ilastik_executable=ilastik_exe_path, ilastik_project=ilastik_proj_path)

# Run IlastikPretrainedProbability filter to get probabilities of 'Label 1' as ImageStack
# probabilities = ipp.run(stack=dapi)

###################################################################################################
# Loading ilastik Probability Map
# ===============================
#
# If you already have probability maps of your images from ilastik or prefer to use the ilastik
# GUI it is possible to load the exported hdf5 files into starfish. When using the 'Prediction
# Export' panel in ilastik be sure to use ``Source: Probabilities`` and ``Format: hdf5``. If you
# edit the Dataset name then it must also be passed as ``dataset_name`` to
# :py:meth:`.import_ilastik_probabilities`. The example below loads the same probability map that
# would have been generated in the first section of this tutorial.

# Need to instantiate Filter even though it is not run
ilastik_exe_path = os.path.join(os.path.dirname(__file__), 'run_ilastik.sh')
ilastik_proj_path = os.path.join(os.path.dirname(__file__), 'merfish_dapi.ilp')
ipp = Filter.IlastikPretrainedProbability(ilastik_executable=ilastik_exe_path, ilastik_project=ilastik_proj_path)

# Load hdf5 file as ImageStack
h5_file_path = os.path.join(os.path.dirname(__file__), 'dapi_Probabilities.h5')
imported_probabilities = ipp.import_ilastik_probabilities(path_to_h5_file=h5_file_path)

# View probability map next to dapi image
f, (ax1, ax2) = plt.subplots(ncols=2)
ax1.imshow(dapi.xarray.values.squeeze())
ax1.set_title("Dapi")
ax2.imshow(imported_probabilities.xarray.values.squeeze())
ax2.set_title("Probabilities")
f.tight_layout()

###################################################################################################
# Transforming Probability Map to Masks
# =====================================
#
# In order to use ilastik semantic segmentation probabilities to assign spots to cells in
# starfish they must be transformed into binary cell masks. A straightforward approach to
# transforming the :py:class:`.ImageStack` to a :py:class:`.BinaryMaskCollection` is to threshold
# the probabilities, run connected component analysis, and watershed. This process is analogous to
# segmenting a stained cell image.

# Scale probability map
clip = Filter.Clip(p_min=0, p_max=100, level_method=Levels.SCALE_BY_CHUNK)
clipped_probabilities = clip.run(imported_probabilities)

# Threshold, connected components, watershed, and filter by area
# (note: starfish.morphology.Filter shadows the starfish.image.Filter imported above)
from starfish.morphology import Binarize, Filter
prob_thresh = 0.05
min_dist = 100
min_allowed_size = 5000
max_allowed_size = 100000
binarized_probabilities = Binarize.ThresholdBinarize(prob_thresh).run(clipped_probabilities)
labeled_masks = Filter.MinDistanceLabel(min_dist, 1).run(binarized_probabilities)
masks = Filter.AreaFilter(min_area=min_allowed_size, max_area=max_allowed_size).run(labeled_masks)

# Show probability map along with cell masks
f, (ax1, ax2) = plt.subplots(ncols=2)
ax1.imshow(imported_probabilities.xarray.values.squeeze())
ax1.set_title("Probabilities")
ax2.imshow(labeled_masks.to_label_image().xarray.values.squeeze(), cmap=plt.cm.nipy_spectral)
ax2.set_title("Masks")
f.tight_layout()
72 changes: 72 additions & 0 deletions examples/manual_segmentation.py
@@ -0,0 +1,72 @@
"""
.. _tutorial_manual_segmentation:

Loading ImageJ ROI Sets
=======================

.. note::
   Starfish only supports importing 2D ROIs.

In order to create a cell by gene expression matrix from image-based transcriptomics data, RNA
spots must be assigned to cells by segmenting an image. The best quality cell segmentation
annotations are manually drawn by experts. If you have ROI sets exported with `ROI manager
<https://imagej.net/docs/guide/146-30.html#fig:The-ROI-Manager>`_ in ImageJ or FIJI, they can be
loaded into starfish as a :py:class:`.BinaryMaskCollection`. The ROI set for each field of view
must be passed with the corresponding :py:class:`.ImageStack` to :py:meth:`.from_fiji_roi_set` in
order to assign accurate ``pixel ticks`` and ``physical ticks`` to the
:py:class:`.BinaryMaskCollection`.

This tutorial demonstrates how to load an ROI set that was exported from ROI Manager as a .zip
and how to plot the resulting :py:class:`.BinaryMaskCollection`.

Brief segmentation workflow in ImageJ ROI Manager:

* Tools > ROI Manager
* Click "polygon selection" (third button from left on GUI)
* Create a polygon, then click the "+" button to finalize it
* Repeat until segmented
* Click "more >>>"
* Click "save"
* Save the ROISet
"""

import matplotlib
import matplotlib.pyplot as plt
import os

import starfish.data
from starfish import BinaryMaskCollection
from starfish.image import Filter
from starfish.types import Levels

matplotlib.rcParams["figure.dpi"] = 150

# Load MERFISH data to get dapi ImageStack
experiment = starfish.data.MERFISH()
fov = experiment["fov_000"]
dapi = fov.get_image("nuclei") # nuclei


# Preprocess MERFISH dapi imagestack
def preprocess(dapi):
    blur = Filter.GaussianLowPass(sigma=5)
    blurred = blur.run(dapi)

    clip = Filter.Clip(p_min=1, p_max=95, level_method=Levels.SCALE_BY_CHUNK)
    clipped = clip.run(blurred)
    return clipped


dapi = preprocess(dapi)

# Import RoiSet.zip as BinaryMaskCollection
roi_path = os.path.join(os.path.dirname(__file__), 'RoiSet.zip')
masks = BinaryMaskCollection.from_fiji_roi_set(path_to_roi_set_zip=roi_path, original_image=dapi)

# display BinaryMaskCollection next to original dapi image
f, (ax1, ax2) = plt.subplots(ncols=2)
ax1.imshow(dapi.xarray.values.squeeze())
ax1.set_title("Dapi")
ax2.imshow(masks.to_label_image().xarray.values.squeeze(), cmap=plt.cm.nipy_spectral)
ax2.set_title("Imported ROI")
f.tight_layout()
Binary file added examples/merfish_dapi.ilp
1 change: 1 addition & 0 deletions examples/run_ilastik.sh
@@ -0,0 +1 @@
THIS IS A FAKE SHELL SCRIPT INTENDED TO TRICK :py:class:`.IlastikPretrainedProbability`