This repository contains the code that accompanies the paper introducing "Multiple direct RNA padlock probing in combination with in-situ sequencing (mudRapp-seq)":
Ahmad S, Gribling-Burrer AS, Schaust J, Fischer SC, Ambil UB, Ankenbrand MJ, Smyth RP. Visualizing the transcription and replication of influenza A viral RNAs in cells by multiple direct RNA padlock probing and in-situ sequencing (mudRapp-seq) (in preparation)
Warning
This repository is a work in progress, so if you notice any errors or have any suggestions, please open an issue.
This repository is archived at Zenodo
Raw data is archived independently in the BioImage Archive (BIA) with accession number S-BIAD1376 (under embargo until publication).
In order to reproduce our analyses, download the raw data from BIA and put them into the data/raw folder.
Image data acquired on the Leica DMI8 were maximum-intensity projected along the z-axis, and instant computational clearing (ICC) was applied using Leica software. These images, together with their associated metadata, constitute the raw data of our analysis. For one dataset only (seq_2nt), the images without ICC were used for cell segmentation. Except for these non-ICC images, all datasets were converted to SpaceTx format for further analysis.
All final and some intermediate results are included in the repository to facilitate additional analyses without having to re-process all files from scratch.
- cDNA_vRNA: sensitivity of cDNA vs direct vRNA probing (raw (cDNA, vRNA), raw (cDNA*), spacetx)
- specificity: specificity of PLPs with closely related strains PR8 and StPt (raw)
- plp_individual: sensitivity of direct vRNA probing with 10 individual PLPs on the HA and NA segments (raw HA, raw NA, spacetx)
- plp_cumulative: sensitivity of direct vRNA probing with increasing number of PLPs on the HA, NA, and PB1 segments (initial experiment for HA, NA, and PB1, replicates for HA and PB1, replicates for NA, more replicates for HA, spacetx)
- seq_qc: quality control experiment for the sequencing method, only a single barcode is present in each sample (raw, spacetx)
- seq_2nt: biologically relevant experiment, simultaneous detection of all viral mRNA and vRNA segments using a 2nt barcode (raw initial with ICC, raw initial no ICC, raw replicates with ICC, raw replicates no ICC, spacetx, intensity_scaled)
To re-create the Python environments with mamba, run:
mamba env create -f envs/starfish.yml # mudRapp-seq-starfish
mamba env create -f envs/cellpose.yml # mudRapp-seq-cellpose
The R environment for this project is managed via renv. A local environment is automatically created for you when you run R or Rscript for the first time in the main project directory. This happens because of the .Rprofile file.
The Jupyter notebooks are in sub-folders of code/ but assume that the kernel runs in the project root.
This is necessary to make the R kernel use the local renv, and it allows paths to be given consistently relative to the project root rather than to the specific notebook location.
In VS Code this can be achieved by changing the setting jupyter.notebookFileRoot to ${workspaceFolder}.
For JupyterLab there seems to be no simple solution at the moment (see jupyterlab#11619).
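
For Python notebooks, one possible workaround is to change the working directory to the project root in the first cell. The sketch below is an assumption and not part of this repository: the helper name `find_project_root` and the use of `.Rprofile` as the marker file are illustrative choices. It does not help the R kernel, which has to start in the project root for renv to be activated.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = ".Rprofile") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No directory containing {marker!r} at or above {start}")


# Hypothetical usage in the first notebook cell:
#   import os
#   os.chdir(find_project_root(Path.cwd()))
```

After this call, relative paths such as data/spacetx resolve against the project root regardless of where the notebook file lives.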
As starfish is used, the raw data needs to be restructured in SpaceTx format.
In order to create the formatted data in data/spacetx, run these steps in the root of the mudRapp-seq repo, using the mudRapp-seq-starfish environment:
mamba run -n mudRapp-seq-starfish python code/data_formatting/cDNA_vRNA.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/specificity.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/plp_individual.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/plp_cumulative.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/seq_qc.py
mamba run -n mudRapp-seq-starfish python code/data_formatting/seq_2nt.py
For cell segmentation in the seq_2nt dataset, the intensity of the raw images without ICC needs to be re-scaled, such that the autofluorescence within the cell is amplified:
mamba run -n mudRapp-seq-starfish python code/data_formatting/seq_2nt_scale_intensity.py
Images were separately segmented for nuclei and cell instances. Nuclei segmentation is used to separate spots based on their location into nucleus and cytoplasm. Cell segmentation is used to count spots per cell, filter infected cells and perform single cell analyses.
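
How the two kinds of masks can be combined with spot coordinates is sketched below. This is a minimal illustration, not the code used in the analysis notebooks: it assumes that masks are 2D label images (0 = background) and that spots are given as integer (row, col) pixel coordinates. The function names are hypothetical.

```python
from collections import Counter


def assign_spots(spots, cell_mask, nucleus_mask):
    """Return (cell_id, compartment) for each spot at pixel (row, col).

    Masks are 2D label images (0 = background). Spots outside any cell
    get cell_id 0 and compartment "outside".
    """
    assignments = []
    for row, col in spots:
        cell_id = cell_mask[row][col]
        if nucleus_mask[row][col] > 0:
            compartment = "nucleus"
        elif cell_id > 0:
            compartment = "cytoplasm"
        else:
            compartment = "outside"
        assignments.append((cell_id, compartment))
    return assignments


def spots_per_cell(assignments):
    """Count spots per cell label, ignoring background (label 0)."""
    return Counter(cell_id for cell_id, _ in assignments if cell_id > 0)


# Toy example with two cells (labels 1 and 2)
cell_mask = [[1, 1, 0], [1, 1, 2], [0, 2, 2]]
nucleus_mask = [[0, 1, 0], [0, 0, 0], [0, 0, 2]]
spots = [(0, 1), (1, 0), (2, 2), (0, 2)]
assignments = assign_spots(spots, cell_mask, nucleus_mask)
counts = spots_per_cell(assignments)
```

The per-cell counts are what a filter for infected cells would operate on. Any threshold for calling a cell infected is defined in the analysis notebooks and is not reproduced here.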
All segmentation masks (along with training data and models) are deposited at Zenodo:
You can either download from there and unpack them into analysis/segmentation
or follow the instructions below to create masks yourself.
For nucleus segmentation, a cellpose model (models/cellpose/nuclei) was trained and applied to the raw DAPI images (in SpaceTx format).
The model was trained on a total of 5 DAPI images with human-provided sparse labels (seq_2nt, rep3, hpi5, fov1-5).
For cell instance segmentation, two different approaches were used:
- Watershed of the DAPI image with nuclei as seeds
- A separate cellpose model with manual correction (details below)
The first strategy was used for most data, as it was deemed sufficient for filtering of infected cells and to calculate summary statistics like spots per cell. However, for single cell analyses the cell borders were not reliable enough.
This code performs nuclei segmentation with the cellpose model and watershed for cell segmentation (strategy 1).
mamba run -n mudRapp-seq-cellpose python code/segmentation/cellpose_nuclei_watershed_cells.py
The separate cellpose model was trained on raw images without ICC (computational clearing by the microscope vendor).
Further, data was preprocessed with intensity scaling (see code).
The model was trained on a total of 7 images with human-provided sparse labels (2nt_rep1_0.3MOI_5hpi_fov4, 2nt_rep1_0.3MOI_7hpi_fov1, 2nt_rep1_0.3MOI_8hpi_fov1, 2nt_rep1_0.3MOI_8hpi_fov4, 2nt_rep1_1.0MOI_7hpi_fov1, 2nt_rep1_1.0MOI_8hpi_fov1, 2nt_rep2_0.3MOI_8hpi_fov2).
In order to maximize the number of correctly detected cells, the following parameters were chosen based on preliminary experiments: cellprob_threshold=-4.0, flow_threshold=0.7.
Resulting masks were post-processed, removing small objects and closing small holes and gaps (see code).
mamba run -n mudRapp-seq-cellpose python code/segmentation/cellpose_nuclei.py
mamba run -n mudRapp-seq-cellpose python code/segmentation/cellpose_cells.py
Masks produced this way were manually corrected using label editing tools in napari.
Manual correction involved extending cells, shrinking cells, moving cell borders and adding new cells (new cells were assigned IDs of 2000 and higher).
The manual correction focused mainly on infected cells; if no visibly infected cells were identified during inspection, the FOV was saved unchanged.
The script used for manual correction can be started for a specific replicate, MOI, hpi, and FOV like this:
mamba run -n mudRapp-seq-starfish python code/segmentation/manual_correction_via_napari.py --rep 1 --moi 0.3MOI --hpi 7 --fov_index 1
The "cells (manually corrected)" layer can be modified using napari tools.
When finished, the layer can be saved in the corresponding folder (e.g. analysis/segmentation/seq_2nt/rep1/0.3MOI/7hpi/fov_1_cpmc_cells.png). The infix _cpmc_ stands for cellpose with manual correction.
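
The naming scheme of the corrected masks can be written down as a small helper. The pattern below is inferred from the single example path above, so it should be read as an assumption; the function name `corrected_mask_path` is hypothetical and the correction script may build the path differently.

```python
from pathlib import Path


def corrected_mask_path(rep, moi, hpi, fov_index,
                        root=Path("analysis/segmentation/seq_2nt")):
    """Build the expected path of a manually corrected cell mask.

    `moi` is passed as the folder name used in the repository, e.g. "0.3MOI".
    """
    return Path(root) / f"rep{rep}" / moi / f"{hpi}hpi" / f"fov_{fov_index}_cpmc_cells.png"


# Same arguments as in the command-line example above
example = corrected_mask_path(rep=1, moi="0.3MOI", hpi=7, fov_index=1)
```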
Spot detection is performed using starfish methods. The following commands create CSV and netCDF files in analysis/spot_detection:
mamba run -n mudRapp-seq-starfish python code/spot_detection/cDNA_vRNA.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/specificity.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/plp_individual.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/plp_cumulative.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/seq_qc.py
mamba run -n mudRapp-seq-starfish python code/spot_detection/seq_2nt.py
This creates a separate spots file for each FOV. These files can be combined into a single tsv.xz file for each experiment using:
Rscript code/spot_detection/combine_csvs.R
The result of this step is included in the repository:
analysis/spot_detection/cDNA_vRNA/all_spots.tsv.xz
analysis/spot_detection/plp_cumulative/all_spots.tsv.xz
analysis/spot_detection/plp_individual/all_spots.tsv.xz
analysis/spot_detection/seq_2nt/all_spots.tsv.xz
analysis/spot_detection/specificity/all_spots.tsv.xz
Summary results of the QC experiment are in:
analysis/spot_detection/seq_qc/rep0/A_PB2/results.csv
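
In the repository, the per-FOV spot files are combined by an R script. For readers working in Python, the same idea can be sketched with the standard library only. This is not a port of combine_csvs.R: the function name, the added `source_file` column and the assumption that all input files share one header are illustrative.

```python
import csv
import lzma
from pathlib import Path


def combine_spot_csvs(csv_paths, out_path):
    """Concatenate per-FOV CSV files into one xz-compressed TSV.

    A `source_file` column records which file each row came from.
    All input files are assumed to share the same header.
    """
    writer = None
    with lzma.open(out_path, "wt", newline="", encoding="utf-8") as out:
        for path in sorted(Path(p) for p in csv_paths):
            with open(path, newline="", encoding="utf-8") as handle:
                for row in csv.DictReader(handle):
                    row["source_file"] = path.name
                    if writer is None:
                        # Take the column order from the first row seen
                        writer = csv.DictWriter(
                            out, fieldnames=list(row),
                            delimiter="\t", lineterminator="\n",
                        )
                        writer.writeheader()
                    writer.writerow(row)
```

A call such as `combine_spot_csvs(Path("some_folder").glob("*.csv"), "all_spots.tsv.xz")` would write one compressed table; the folder name here is a placeholder.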
The main result is the much higher sensitivity for direct vRNA probing compared to cDNA probing. For details, see the analysis notebook. These figures are included in the manuscript as Supp. Fig. 1b and Fig. 1c, respectively.
The main result is that padlock probing has very high specificity. There are almost no false positive results of probes designed for another strain. For details, see the analysis notebook. These figures are included in the manuscript as Supp. Fig. 1b and Fig. 1c, respectively.
To test the sensitivity of individual padlock probes (PLPs), PLPs were designed for ten distinct locations on each of the NA and HA segments. The main result is that individual PLPs have different levels of sensitivity. While most PLPs produce a good number of spots, some PLPs produce almost none. For details, see the analysis notebook. These figures are included in the manuscript as Fig. 2c,d and Supp. Fig. 2c,d, respectively.
To explain the different efficiencies of the individual PLPs, the reactivity of the binding sites was analyzed with Nano-DMS-MaP, both with and without PLPs bound. PLP binding reduces the reactivity of the binding sites (as expected). Further, the binding reactivity (without PLPs) positively correlates with PLP efficiency (both looking at the whole binding site, and only looking at a small window around the junctions). For details, see the analysis notebook. An overview of the reactivities along the NA segment, and the correlations with efficiencies are shown in the manuscript in Fig. 3b,c and Supp. Fig. 3a. The same plots were generated for the HA segment in the same notebook and shown in Supp. Fig. 4a,4b,4c.
To test the sensitivity with an increasing number of padlock probes per segment, an increasing number of the ten distinct locations on NA and HA were used. The main result is that sensitivity increases with the number of PLPs used, but saturates at around 6 PLPs. For details, see the analysis notebook. These figures are included in the manuscript as Fig. 2f,g and Supp. Fig. 2f,g, respectively.
Dedicated experiments were performed to check the quality of the sequencing procedure.
Based on four experiments, in which only one of the channels (A, G, T, and C) is active in the first round, the bleed-through of signal from each channel to each other channel was estimated.
Only a moderate bleed-through (factor 0.656) was detected from channel A to channel T.
For details, see the notebooks (analysis, plot). This figure is included in the manuscript as Supp. Fig. 5b.
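
To illustrate what a bleed-through factor means, the sketch below computes, for each single-channel experiment, the ratio of the summed intensity in every channel to the summed intensity in the active channel. The estimator actually used is defined in the analysis notebook; this ratio-of-sums version, the function name and the toy intensities are assumptions made for illustration only.

```python
CHANNELS = "AGTC"  # channel order used in this repository


def bleed_through_matrix(spots_by_active_channel):
    """Estimate bleed-through factors from single-channel experiments.

    `spots_by_active_channel` maps the active channel to a list of per-spot
    intensities, each given as a dict {channel: intensity}.
    Entry [src][dst] is sum(dst intensity) / sum(src intensity).
    """
    matrix = {}
    for src, spots in spots_by_active_channel.items():
        src_total = sum(spot[src] for spot in spots)
        matrix[src] = {
            dst: sum(spot[dst] for spot in spots) / src_total
            for dst in CHANNELS
        }
    return matrix


# Toy intensities (invented) for an experiment where only channel A is active
toy = {"A": [{"A": 100, "G": 2, "T": 60, "C": 1},
             {"A": 200, "G": 4, "T": 140, "C": 2}]}
factors = bleed_through_matrix(toy)
```

With these invented numbers, the A to T factor is 200/300, while the A to G and A to C factors are close to zero.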
Detailed analysis of the decoding correctness showed that, in an experiment with only a single valid 6nt barcode present, more than 92% of all detected spots were correct after 2 rounds of sequencing and 84% of spots after 6 rounds of sequencing.
For details, see the analysis notebook. These figures are included in the manuscript as Supp. Fig. 6b,c.
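
One way to compute such a per-round correctness is to check, for every spot, whether the first n decoded bases match the first n bases of the expected barcode. The sketch below follows that cumulative definition, which is an assumption about how the notebook counts a spot as correct; the barcode "ACGTAC" and the decoded sequences are made up.

```python
def fraction_correct_by_round(decoded_barcodes, expected):
    """Fraction of spots whose first n bases match `expected`, for n = 1..len(expected)."""
    if not decoded_barcodes:
        raise ValueError("no spots to evaluate")
    total = len(decoded_barcodes)
    return [
        sum(code[:n] == expected[:n] for code in decoded_barcodes) / total
        for n in range(1, len(expected) + 1)
    ]


# Four hypothetical spots, one with an error in round 2 and one in round 5
decoded = ["ACGTAC", "ACGTAC", "ACGTTC", "AGGTAC"]
per_round = fraction_correct_by_round(decoded, expected="ACGTAC")
# per_round == [1.0, 0.75, 0.75, 0.75, 0.5, 0.5]
```

Because an error in any earlier round makes the spot incorrect for all later rounds, the fraction can only stay constant or decrease with the number of rounds.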
- channel order: "AGTC"
- magnification: images were taken with a 63x oil objective and the pixel size is 0.103 µm
Segment | vRNA/mRNA | Code |
---|---|---|
PB2 | vRNA | TT |
PB1 | vRNA | TG |
PA | vRNA | TC |
HA | vRNA | TA |
NP | vRNA | GT |
NA | vRNA | GG |
M | vRNA | GC |
NS | vRNA | GA |
PB2 | mRNA | CT |
PB1 | mRNA | CG |
PA | mRNA | CC |
HA | mRNA | CA |
NP | mRNA | AT |
NA | mRNA | AG |
M | mRNA | AC |
NS | mRNA | AA |
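
The table above can be expressed as a lookup from 2nt barcode to target. The following sketch transcribes the table; the names `CODEBOOK` and `decode` are illustrative and do not refer to the starfish codebook files used by the pipeline.

```python
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]
VRNA_CODES = ["TT", "TG", "TC", "TA", "GT", "GG", "GC", "GA"]
MRNA_CODES = ["CT", "CG", "CC", "CA", "AT", "AG", "AC", "AA"]

# Map each 2 nt barcode to (segment, RNA type), following the table row by row
CODEBOOK = {code: (seg, "vRNA") for seg, code in zip(SEGMENTS, VRNA_CODES)}
CODEBOOK.update({code: (seg, "mRNA") for seg, code in zip(SEGMENTS, MRNA_CODES)})


def decode(code):
    """Return (segment, RNA type) for a 2 nt barcode, or None if it is not in the table."""
    return CODEBOOK.get(code.upper())
```

For example, `decode("TA")` gives `("HA", "vRNA")`. All 16 possible dinucleotides are in use, and in this table the first base alone separates the two RNA types: T or G for vRNA, C or A for mRNA.
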
The molecule and segment counts, relative abundances and ambiguous spots were analysed by MOI and hpi, separately for nucleus and cytoplasm.
For details, see the analysis notebook. These figures are included in the manuscript as Fig. 5c,d and Supp. Fig. 7a,b, and 9a,b respectively.
The temporal correlation analysis of single segment mRNA expression with total vRNA abundance revealed a high correlation of the M segment.
For details, see the analysis notebook. This figure is included in the manuscript as Supp. Fig. 8.
The single-cell analysis reveals extensive cell-to-cell heterogeneity. A substantial proportion of cells fails to replicate all vRNA segments. Cells missing a vRNA segment that encodes a component of the polymerase complex, or missing the NP segment, are associated with very low vRNA replication.
For details, see the analysis notebook. The results of the linear modelling are included in Table 2 in the manuscript. These figures are included in the manuscript as Fig. 6b,c,d,e and Supp. Fig. 10a,b, and 12a,b respectively.
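
To make the notion of a cell "missing" segments concrete, the sketch below flags, per cell, the segments with fewer than a chosen number of vRNA spots. The threshold `min_spots`, the function names and the toy counts are hypothetical; the criteria actually applied are defined in the analysis notebook.

```python
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def missing_segments(vrna_counts, min_spots=1):
    """Return the segments with fewer than `min_spots` vRNA spots in one cell."""
    return [seg for seg in SEGMENTS if vrna_counts.get(seg, 0) < min_spots]


def fraction_incomplete(cells, min_spots=1):
    """Fraction of cells lacking at least one vRNA segment."""
    if not cells:
        raise ValueError("no cells given")
    incomplete = sum(bool(missing_segments(c, min_spots)) for c in cells)
    return incomplete / len(cells)


# Two toy cells: one with all segments, one without PB1 and NS vRNA spots
complete_cell = {seg: 5 for seg in SEGMENTS}
partial_cell = {"PB2": 3, "PA": 1, "HA": 2, "NP": 4, "NA": 1, "M": 2}
missing = missing_segments(partial_cell)
fraction = fraction_incomplete([complete_cell, partial_cell])
```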