Vitessce Integration Design Document #1074

srivarra · 2023-10-10T23:00:24Z

This is for internal use only; if you'd like to open an issue or request a new feature, please open a bug or enhancement issue.

Instructions

This document should be filled out prior to embarking on any project that will take more than a couple of hours to complete. The goal is to make sure that everyone is on the same page for the functionality and requirements of new features. Therefore, it's important that this is detailed enough to catch any misunderstandings beforehand. For larger projects, it can be useful to first give a high-level sketch, and then go back and fill in the details. For smaller ones, filling the entire thing out at once can be sufficient.

Relevant background

Vitessce is a visualization tool for spatial single-cell data. It can be used as a standalone web application, an embedded web component, or a widget in JupyterLab. HuBMAP uses it to visualize certain datasets on their data portal.

Here is a brief overview of the components for a Vitessce HuBMAP Dataset: HBM892.CCDZ.345. One thing that's nice is we can download the Jupyter Notebook which creates this Vitessce visualization.

This visualization is composed of the following Views:

Scatter plot
Spatial
Set Views
Heatmap

Coordination values and scopes "glue" views together. For example, hovering over a point in the UMAP scatter plot will also highlight the corresponding point in the Spatial view, along with the corresponding row in the Heatmap view.

Keep in mind that Vitessce works best with a single image per visualization.

File Types and File Formats

With Vitessce, there are many ways to organize the data for ingestion.

Implementation with current Cell Table + TIFF Files

Assuming we have a finalized Cell Table with respect to a single image, we can organize the data for a Vitessce visualization as follows:

Necessary - Observation Feature Matrix (csv)
1. This is an observation by feature matrix (cell segmentation ID x Channel Intensity)
  
cell_id	CD33	MYC
cell_1	15.1	0.0
cell_2	0.0	21.4
cell_3	0.0	0.0

Necessary - Observation Sets (csv)

Maps each observation (cell id) to one or more sets.

Mainly for assigning cells to clusters, cell types, groups, lineages, etc...

cell_id	leiden	pixie	cell_type_coarse	cell_type_fine	pred_cell_type	pred_score
cell_1	1	1	Immune	B cell	B cell	0.81
cell_2	2	1	Immune	T cell	T cell	0.99
cell_3	2	3	Immune	T cell	Macrophage	0.21
cell_4	3	6	Neuron	Excitatory neuron	Inhibitory neuron	0.25

Necessary - Observation Segmentations (OME TIFF)
1. A cell segmentation mask
Necessary - Image (OME TIFF / OME Zarr)
1. Can also be a Zarr store
  1. Convert the FOV structure to a multichannel OME TIFF.
Optional - Observation Embedding (csv)
1. UMAP, TSNE embeddings for each cell (UMAP Axis 1, UMAP Axis 2, TSNE Axis 1, TSNE Axis 2, etc...)
2. Requires 2D embeddings for each cell with respect to a dimensionality reduction technique.
Optional - [Feature labels / Observation Labels] (csv)
1. Alternate labels for cell IDs, and channel names
  
cell_id	alt_cell_id
cell_1	ATGC
cell_2	GTTA
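
As a rough illustration of the file layout above, here is a minimal sketch that splits one cell table into the two required CSVs (observation-by-feature matrix and observation sets) using only the standard library. The helper name `split_cell_table` is hypothetical and not part of any existing package, and the toy columns simply mirror the example tables in this section.

```python
import csv
import io


def split_cell_table(table_csv, id_col, marker_cols, set_cols):
    """Split one cell table (CSV text) into a feature-matrix CSV and an obs-sets CSV."""
    rows = list(csv.DictReader(io.StringIO(table_csv)))

    def project(columns):
        # Keep the ID column plus the requested columns, in that order.
        fields = [id_col, *columns]
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        for row in rows:
            writer.writerow({key: row[key] for key in fields})
        return out.getvalue()

    return project(marker_cols), project(set_cols)


# Toy cell table; column names are assumptions taken from the examples above.
cell_table = """cell_id,CD33,MYC,leiden,cell_type_coarse
cell_1,15.1,0.0,1,Immune
cell_2,0.0,21.4,2,Immune
cell_3,0.0,0.0,2,Immune
"""

feature_csv, sets_csv = split_cell_table(
    cell_table, "cell_id", ["CD33", "MYC"], ["leiden", "cell_type_coarse"]
)
print(feature_csv)
print(sets_csv)
```

Both outputs keep `cell_id` as the first column so the files can be joined back on the same key.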

Implementation with AnnData + TIFF Files

In the case where we have an AnnData Zarr object, organizing the data is much easier.

Convert the FOV to a multichannel OME-TIFF.
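
A minimal sketch of that conversion step, assuming each FOV is available as one 2D array per channel. The function names are hypothetical, and `tifffile` is a swapped-in choice for illustration (the existing OME-TIFF conversion code may work differently). Pyramid generation, which large images may need for smooth viewing, is not handled here.

```python
import numpy as np


def stack_fov_channels(channels):
    """Stack {channel_name: 2D array} into a (C, Y, X) array plus ordered names."""
    names = list(channels)
    shapes = {channels[name].shape for name in names}
    if len(shapes) != 1:
        raise ValueError(f"channel images differ in shape: {shapes}")
    return np.stack([channels[name] for name in names], axis=0), names


def write_ome_tiff(path, stack, names):
    """Write a CYX stack as an OME-TIFF with channel names in the metadata."""
    import tifffile  # third-party; imported lazily so stacking works without it

    tifffile.imwrite(
        path,
        stack,
        ome=True,
        photometric="minisblack",
        metadata={"axes": "CYX", "Channel": {"Name": names}},
    )


# Toy FOV with two channels; real data would be read from the FOV folder.
fov = {
    "CD33": np.zeros((4, 4), dtype=np.float32),
    "MYC": np.ones((4, 4), dtype=np.float32),
}
stack, names = stack_fov_channels(fov)
print(stack.shape, names)
```

Calling `write_ome_tiff("fov.ome.tiff", stack, names)` would then write the file; the path is a placeholder.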

From the AnnData object, we mainly need to provide the following:

X: The observation by feature matrix (cell-by-marker)
obs: The observation metadata (cell metadata)
var: The feature metadata (marker metadata)
Optional - obsm: The observation embeddings (UMAP, TSNE, etc...)
1. Prefixed with X_, where X_umap is the UMAP embedding of X.
Optional - layers: The image data (cell segmentation mask, image, etc...)
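
To make the slot mapping concrete, here is a minimal sketch of going from a cell table to the `X`, `obs`, and `var` pieces listed above. The helper names and toy column names are assumptions for illustration; `anndata` is imported lazily so the table-splitting step can be tried without it installed.

```python
import numpy as np
import pandas as pd


def split_for_anndata(cell_table, marker_cols, id_col="cell_id"):
    """Return (X, obs, var) pieces matching the AnnData slots listed above."""
    table = cell_table.set_index(id_col)
    table.index = table.index.astype(str)
    X = table[marker_cols].to_numpy(dtype=np.float32)  # cell-by-marker matrix
    obs = table.drop(columns=marker_cols)  # remaining columns are cell metadata
    var = pd.DataFrame(index=pd.Index(marker_cols, name="marker"))
    return X, obs, var


def to_anndata(X, obs, var, umap=None):
    """Assemble an AnnData object; optionally attach a 2D UMAP as obsm['X_umap']."""
    import anndata  # third-party; imported lazily

    adata = anndata.AnnData(X=X, obs=obs, var=var)
    if umap is not None:
        adata.obsm["X_umap"] = umap
    return adata


# Toy cell table mirroring the examples earlier in this document.
cell_table = pd.DataFrame(
    {
        "cell_id": ["cell_1", "cell_2", "cell_3"],
        "CD33": [15.1, 0.0, 0.0],
        "MYC": [0.0, 21.4, 0.0],
        "leiden": ["1", "2", "2"],
        "cell_type_coarse": ["Immune", "Immune", "Immune"],
    }
)
X, obs, var = split_for_anndata(cell_table, ["CD33", "MYC"])
print(X.shape, list(obs.columns), list(var.index))
```

With `anndata` installed, `to_anndata(X, obs, var).write_zarr("fov.zarr")` would produce the Zarr store; the output path is a placeholder.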

Implementation with `SpatialData`

In the case where we have a SpatialData object, it's more straightforward than the AnnData implementation.

We just need the SpatialData object subset with a particular FOV of interest. The image is a Zarr store, and the AnnData is the Table object. The same notes about AnnData apply.

Creating Views and Coordination Values

Depending on the data and the use case, you can organize views in multiple ways.

For example, with the Bone Marrow data, we have MIBI, MALDI, and HnE images. We can create a Vitessce visualization with the following views:

Spatial
- MIBI
- MALDI
- HnE
- Create a view which syncs all three images together (where a segmentation ID in one image is highlighted in the other two images)
Scatter Plot
- UMAP / TSNE on Axis 1,2 colored by Pixie Clusters
- Layer controller for MIBI and MALDI, along with Feature List and Observation Sets for both

The user has artistic freedom to select what works best for them and the data they're trying to visualize. This is an iterative process which can be run in a Jupyter Notebook.
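
As a hedged sketch of how views can be tied together, the function below creates two scatter plots and a spatial view, then links the zoom and pan of the two scatter plots through shared coordination values. The dataset name, mapping names, and initial zoom/target values are placeholders, and no data wrappers are attached, so this only illustrates the view and coordination structure rather than a working visualization.

```python
def build_linked_config():
    """Build a config dict with two scatter plots whose zoom and pan are coordinated."""
    # Imported inside the function so the sketch can be read without vitessce installed.
    from vitessce import VitessceConfig, Component as cm, CoordinationType as ct

    vc = VitessceConfig(schema_version="1.0.16", name="Linked views sketch")
    dataset = vc.add_dataset(name="FOV")  # data wrappers omitted in this sketch

    umap = vc.add_view(cm.SCATTERPLOT, dataset=dataset, mapping="UMAP")
    tsne = vc.add_view(cm.SCATTERPLOT, dataset=dataset, mapping="TSNE")
    spatial = vc.add_view(cm.SPATIAL, dataset=dataset)

    # Share one zoom level and one pan target between the two scatter plots.
    vc.link_views(
        [umap, tsne],
        [ct.EMBEDDING_ZOOM, ct.EMBEDDING_TARGET_X, ct.EMBEDDING_TARGET_Y],
        [2, 0, 0],
    )

    # Spatial view on the left, the two scatter plots stacked on the right.
    vc.layout(spatial | (umap / tsne))
    return vc.to_dict()
```

The returned dictionary contains a `coordinationSpace` entry holding the shared values, which is the "glue" between the linked views.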

Example Code from some HuBMAP stuff in July:

from vitessce import (
    VitessceConfig,
    Component as cm,
    CoordinationType as ct,
    OmeTiffWrapper,
    OmeZarrWrapper,
    MultiImageWrapper,
)


vc = VitessceConfig(schema_version="1.0.16", name='MCMicro Bitmask Visualization', description='Segmentation + Data of Exemplar 001')
dataset = vc.add_dataset(name='MCMicro').add_object(
    MultiImageWrapper(
        image_wrappers=[
            OmeZarrWrapper(
                img_url="https://vitessce-data.storage.googleapis.com/0.0.33/main/human-lymph-node-10x-visium/human_lymph_node_10x_visium.ome.zarr", name="Image",
            ),
            # We can mix and match image types as well
            OmeTiffWrapper(img_url='https://vitessce-demo-data.storage.googleapis.com/exemplar-001/cellMask.pyramid.ome.tif', name='Mask', is_bitmask=True),
        ]
    )
)
spatial = vc.add_view(cm.SPATIAL, dataset=dataset)
status = vc.add_view(cm.STATUS, dataset=dataset)
lc = vc.add_view(cm.LAYER_CONTROLLER, dataset=dataset)

# Organize the layout of the views
vc.layout(spatial | (lc / status));

# Create the Vitessce Object
vw = vc.widget()
vw

This is a rather basic view with just the Image, segmentation mask and a layer controller for the channels.

Vitessce configurations can be applied to other images, segmentation masks and cell tables as well. So a user can develop their ideal visualization structure on one FOV, and then easily apply it to other FOVs of interest.

It is not much more effort to go from a simple view such as in the example code, to a complex view like the HuBMAP one.

Exporting and Data Hosting

Once the user is satisfied with their current Vitessce visualization, the configuration schema can be exported (along with the data) to a Zarr store and a supplementary JSON file.

This can be loaded locally, or ideally hosted on a cloud service (AWS, GCloud) for easy storage.
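
A minimal sketch of the JSON half of that export, using only the standard library. It assumes the configuration is already available as a plain dictionary (for example from `vc.to_dict(base_url=...)`, with `base_url` pointing at wherever the data is hosted). The helper names, the file name, and the placeholder dictionary are hypothetical; the placeholder is not a complete Vitessce config.

```python
import json
import tempfile
from pathlib import Path


def save_config(config, out_dir, filename="vitessce.json"):
    """Write a config dictionary to <out_dir>/<filename> and return the path."""
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(config, indent=2))
    return out_path


def load_config(path):
    """Read a config dictionary back from a JSON file."""
    return json.loads(Path(path).read_text())


# Placeholder standing in for a real exported configuration.
placeholder_config = {"version": "1.0.16", "name": "demo", "datasets": [], "layout": []}

saved_path = save_config(placeholder_config, tempfile.mkdtemp())
reloaded = load_config(saved_path)
print(saved_path.name, reloaded["name"])
```

Keeping the JSON next to the Zarr store means the same folder can be uploaded to a bucket or served locally without changing the file layout.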

Viewing the Data

When it comes to viewing the data object, we can set up a lab GitHub Pages site, or add the view to the existing lab site.

GitHub Pages Deployment

Timeline

Estimated date when a fully implemented version will be ready for review: N/A

Estimated date when the finalized project will be merged in: N/A

Resources

There are a lot of examples and resources for Vitessce.

Official Vitessce Python Notebooks:
- https://github.com/vitessce/vitessce-python/tree/main/docs/notebooks
- https://github.com/vitessce/vitessce-python-tutorial
Go to HuBMAP and take a look at Datasets with visualizations, pick one which may resemble a good starting point.

More examples out in the wild:


srivarra · 2023-10-10T23:02:44Z

@ngreenwald @camisowers @alex-l-kong @jranek Let me know your thoughts.

ngreenwald · 2023-10-11T19:54:06Z

This looks great! If we're already going to be looking into AnnData for post-clustering, seems like we should plan on using that for Vitessce as well, at least until we make the full switch over to SpatialData. Is there a way to show segmentations/masks of things that aren't cells?

For example, if we have two or three regions in a given image, is there a way to highlight which cells are in which regions? Or maybe display the regions as a layer?

If it's the case that adding more visualizations isn't much work, it would be great to take a look through what other types of visualizations people have done with Vitessce, so we can make a priority list of which stuff to implement in MVP, which to add to the queue, etc.

ngreenwald · 2023-10-11T19:55:41Z

Maybe tomorrow you, @camisowers, and @jranek can have a brief discussion about how to split up the tasks for AnnData conversion. Seems like 1 important thing to have is a notebook that takes a finalized cell table and creates an AnnData object. Are there any other tools that would be broadly useful?

srivarra · 2023-10-11T20:01:49Z

@ngreenwald Yeah, segmentation masks can be of anything and you can have multiple of them, and use layers to turn them on and off.

camisowers · 2023-10-11T20:56:31Z

Looks cool! We can take the existing OME TIFF conversion code and add it into a notebook that also does the AnnData conversion for the cell table.

alex-l-kong · 2023-10-12T20:43:46Z

Great stuff. I think we can also add something like this into the Pixie pipeline where we store the training and subsetted table as AnnData as well. It will also allow us to visualize the pixel and cell masks a lot more easily. The uns is also great for storing stuff like norm_vals that don't have a great mapping to any of the default AnnData params.

How interactive/customizable is Vitessce for something like the metacluster remapping process? If it will be easy to integrate here then, radically speaking, we could nuke the existing pipeline (it's really clunky).

srivarra · 2023-10-12T20:45:56Z

@alex-l-kong Vitessce would not be able to edit data for the metacluster remapping process; Napari, however, would be ideal for that. Vitessce is mainly "read-only", intended for data showcases for publications or projects.

srivarra added the design_doc Detailed implementation plan label Oct 11, 2023
