CellBender remove-background report¶

This output report from cellbender remove-background contains a summary of the run, including counts remaining, counts removed, further analyses, and any warnings or suggestions if the run seems to be abnormal.

This HTML report is created from a Jupyter notebook at

cellbender/cellbender/remove-background/report.ipynb

within the CellBender codebase. Feel free to run the notebook yourself and make any changes you see fit, or use it as a starting point for further analyses.

The commentary in this report is generated using automated heuristics and best guesses based on hundreds of real datasets. If any of the automated commentary seems incorrect for your dataset, please submit a question or an issue at our GitHub repository: https://github.com/broadinstitute/CellBender

Cellarium Lab .. Methods Group .. Data Sciences Platform .. Broad Institute


Input and output files¶

(Modify this section if you run this notebook yourself.)

Input file: /data/cephfs-1/work/groups/cubi/projects/2023-10-17_Hubertus_SpinalCord/20231018_cellranger/Brain_10X/5k_mouse_brain_CNIK_3pv3_raw_feature_bc_matrix.h5
Output file: /data/cephfs-1/work/groups/cubi/projects/2023-10-17_Hubertus_SpinalCord/20231018_cellranger/Brain_10X/Brain_10X.h5

Report¶

CellBender version 0.3.1¶

2023-11-08 16:47:44

Brain_10X.h5¶

Loaded dataset¶

AnnData object with n_obs × n_vars = 50000 × 32285
    obs: 'background_fraction', 'cell_probability', 'cell_size', 'droplet_efficiency', 'n_raw', 'n_cellbender'
    var: 'ambient_expression', 'feature_type', 'genome', 'gene_id', 'cellbender_analyzed', 'n_raw', 'n_cellbender'
    uns: 'cell_size_lognormal_std', 'empty_droplet_size_lognormal_loc', 'empty_droplet_size_lognormal_scale', 'swapping_fraction_dist_params', 'estimator', 'features_analyzed_inds', 'fraction_data_used_for_testing', 'learning_curve_learning_rate_epoch', 'learning_curve_learning_rate_value', 'learning_curve_test_elbo', 'learning_curve_test_epoch', 'learning_curve_train_elbo', 'learning_curve_train_epoch', 'target_false_positive_rate'
    obsm: 'cellbender_embedding'
    layers: 'raw', 'cellbender'

Examine how many counts were removed in total¶

removed 3613126 counts from non-empty droplets
removed 7.88% of the counts in non-empty droplets
Rough estimate of expectations based on nothing but the plot above:
roughly 4638392 noise counts should be in non-empty droplets
that is approximately 10.12% of the counts in non-empty droplets
with a false positive rate [FPR] of 1.0%, we would expect to remove about 11.12% of the counts in non-empty droplets

The algorithm removed a bit less than naive expectations would indicate, but this is likely okay. If removal seems insufficient, the FPR can be increased.
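The figures above can be cross-checked with a little arithmetic. The sketch below assumes, based only on the numbers printed in this report (10.12% + 1.0% = 11.12%), that the expected removal is the naive noise fraction plus the FPR. The variable names are illustrative and are not part of CellBender.

```python
# Values copied from the report text above.
removed_counts = 3_613_126       # counts removed from non-empty droplets
removed_fraction = 0.0788        # 7.88% of the counts in non-empty droplets
naive_noise_counts = 4_638_392   # rough estimate of noise counts
naive_noise_fraction = 0.1012    # 10.12% of the counts in non-empty droplets
fpr = 0.01                       # false positive rate of 1.0%

# Assumption inferred from the report: expected removal = noise fraction + FPR.
expected_fraction = naive_noise_fraction + fpr

# Total counts in non-empty droplets implied by each (count, fraction) pair.
# The two values should agree up to rounding of the printed percentages.
implied_total_from_removed = removed_counts / removed_fraction
implied_total_from_noise = naive_noise_counts / naive_noise_fraction

# Share of the naive expectation that was actually removed.
removed_vs_expected = removed_fraction / expected_fraction

print(f"expected removal: {expected_fraction:.2%}")
print(f"removed / expected: {removed_vs_expected:.2f}")
```

A ratio somewhat below 1 matches the report's statement that the algorithm removed a bit less than naive expectations would indicate.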

Assessing convergence of the algorithm¶

The learning curve tells us about the progress of the algorithm in inferring all the latent variables in our model. We want to see the ELBO (evidence lower bound) increasing as training epochs increase. Generally it is desirable for the ELBO to converge at some high plateau and remain fairly stable.

What to watch out for:

1. large downward spikes in the ELBO (of value more than a few hundred)
2. the test ELBO can be smaller than the train ELBO, but generally we want to see both curves increasing and reaching a stable plateau. We do not want the test ELBO to dip way back down at the end.
3. lack of convergence, where it looks like the ELBO would change quite a bit if training went on for more epochs.

Automated assessment --------

  • WARNING: The training ELBO deviates quite a bit from the max value during the second half of training.
  • We typically expect to see the training ELBO increase almost monotonically. This curve seems to have a concerted period of motion in the wrong direction near epoch 40. If this is early in training, this is probably okay.

Summary:

This is unusual behavior, and a reduced --learning-rate is indicated. Re-run with half the current learning rate and compare the results.
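The warning signs listed in this section can be approximated with a few simple checks on the per-epoch training ELBO. The following is a minimal sketch, not the heuristics CellBender itself applies: the function name, the returned keys, and both thresholds are assumptions chosen for illustration (300 stands in for "more than a few hundred").

```python
def assess_learning_curve(train_elbo, spike_threshold=300.0, late_gap_threshold=100.0):
    """Flag simple warning signs in a list of per-epoch training ELBO values.

    Thresholds are hypothetical and should be tuned to the scale of the ELBO.
    """
    if len(train_elbo) < 2:
        raise ValueError("need at least two epochs")

    # Epoch-to-epoch changes; a negative step means the ELBO moved the wrong way.
    steps = [b - a for a, b in zip(train_elbo, train_elbo[1:])]
    largest_drop = max(0.0, -min(steps))

    # Gap between the best ELBO overall and the worst ELBO in the second half.
    second_half = train_elbo[len(train_elbo) // 2:]
    late_gap = max(train_elbo) - min(second_half)

    return {
        "largest_drop": largest_drop,
        "large_spike": largest_drop > spike_threshold,
        "late_gap": late_gap,
        "late_deviation": late_gap > late_gap_threshold,
        "backtracking_epochs": sum(1 for s in steps if s < 0),
    }


# Synthetic curve (not taken from this run) with a dip late in training.
curve = [-1000.0, -800.0, -700.0, -650.0, -640.0, -900.0, -645.0, -640.0]
result = assess_learning_curve(curve)
print(result)
```

For a real run, the input values could be taken from the 'learning_curve_train_elbo' entry listed under uns in the loaded dataset.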

Examine count removal per gene¶

Pearson correlation coefficient for the above is 0.9582

This meets expectations.
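For reference, a Pearson correlation coefficient between two per-gene vectors can be computed as sketched below. This text does not reproduce the plot, so which two quantities the report compares is not shown here; the inputs in the example are made-up placeholders, and the function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-gene values, for illustration only.
expected_per_gene = [10.0, 20.0, 30.0, 40.0]
removed_per_gene = [12.0, 19.0, 33.0, 41.0]
r = pearson(expected_per_gene, removed_per_gene)
print(f"Pearson r = {r:.4f}")
```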

Table of top genes removed¶

Ranked by fraction removed, and excluding genes with fewer than 3535 total raw counts (90th percentile)

gene_name  ambient_expression  feature_type     genome  gene_id             cellbender_analyzed  n_raw  n_cellbender  n_removed  fraction_removed  fraction_remaining  n_raw_cells  n_cellbender_cells  n_removed_cells  fraction_removed_cells  fraction_remaining_cells
Tmem131    0.000048            Gene Expression  mm10    ENSMUSG00000026116  True                 4294   -115611       119905     27.923847         -26.923847          3658         -115611             119269           32.604975               -31.604975
Smap1      0.000071            Gene Expression  mm10    ENSMUSG00000026155  True                 4810   -27997        32807      6.820582          -5.820582           3859         -27997              31856            8.254988                -7.254988
Arhgef4    0.000080            Gene Expression  mm10    ENSMUSG00000037509  True                 3923   -18782        22705      5.787662          -4.787662           2812         -18782              21594            7.679232                -6.679232
Prex2      0.000036            Gene Expression  mm10    ENSMUSG00000048960  True                 3868   -16362        20230      5.230093          -4.230093           3344         -16362              19706            5.892943                -4.892943
Atp6v1h    0.000054            Gene Expression  mm10    ENSMUSG00000033793  True                 5252   -21867        27119      5.163557          -4.163557           4495         -21867              26362            5.864739                -4.864739
Ogfrl1     0.000122            Gene Expression  mm10    ENSMUSG00000026158  True                 4604   -18454        23058      5.008254          -4.008254           2906         -18454              21360            7.350310                -6.350310
Gdap1      0.000064            Gene Expression  mm10    ENSMUSG00000025777  True                 3967   -13803        17770      4.479455          -3.479455           3116         -13803              16919            5.429718                -4.429718
Fam135a    0.000035            Gene Expression  mm10    ENSMUSG00000026153  True                 4968   -16780        21748      4.377617          -3.377617           4516         -16780              21296            4.715678                -3.715678
Tcea1      0.000058            Gene Expression  mm10    ENSMUSG00000033813  True                 4251   -12688        16939      3.984709          -2.984709           3441         -12688              16129            4.687300                -3.687300
Inpp4a     0.000072            Gene Expression  mm10    ENSMUSG00000026113  True                 5403   -15519        20922      3.872293          -2.872293           4377         -15519              19896            4.545579                -3.545579
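The derived columns in the table appear to follow from n_raw and n_cellbender by simple arithmetic. The sketch below reproduces them for the Tmem131 row; the relationships are inferred from the printed values rather than taken from the CellBender source, and the helper name and the "suspicious" flag are hypothetical. Because background removal only subtracts counts, one would normally expect 0 <= n_cellbender <= n_raw, so the helper flags rows that fall outside that range, as the rows in this table do.

```python
def removal_stats(n_raw, n_cellbender):
    """Recompute the derived per-gene columns from raw and output counts.

    Assumed relationships (inferred from the table values):
      n_removed          = n_raw - n_cellbender
      fraction_removed   = n_removed / n_raw
      fraction_remaining = n_cellbender / n_raw
    """
    n_removed = n_raw - n_cellbender
    return {
        "n_removed": n_removed,
        "fraction_removed": n_removed / n_raw,
        "fraction_remaining": n_cellbender / n_raw,
        # Output counts below zero or above the raw counts are not plausible.
        "suspicious": n_cellbender < 0 or n_cellbender > n_raw,
    }


# Values copied from the Tmem131 row of the table above.
tmem131 = removal_stats(n_raw=4294, n_cellbender=-115611)
print(tmem131)
```

The same relationships hold for the per-cell columns, for example n_removed_cells / n_raw_cells = 119269 / 3658 for Tmem131.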

WARNING: The expression of the highly-expressed gene Rgs20 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Atp6v1h decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Rb1cc1 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene St18 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Pcmtd1 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Cspp1 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Arfgef1 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene A830018L16Rik decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Ncoa2 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Rpl7 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Stau2 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Ube2w decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Gm28376 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Smap1 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Fam135a decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Col19a1 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Lmbrd1 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Phf3 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Khdrbs2 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Bend6 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Dst decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Fam168b decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Cox5b decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Actr1b decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Tmem131 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Inpp4a decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene 2010300C02Rik decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Rev1 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Aff3 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Npas2 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Rpl31 decreases quite markedly after CellBender. Check to ensure this makes sense!

WARNING: The expression of the highly-expressed gene Map4k4 decreases quite markedly after CellBender. Check to ensure this makes sense!

Cell probabilities¶

The inferred posterior probability that each droplet is non-empty.

We sometimes write "non-empty" instead of "cell" because dead cells and other cellular debris can still lead to a "non-empty" droplet, which will have a high posterior cell probability. But these kinds of low-quality droplets should be removed during cell QC to retain only high-quality cells for downstream analyses.
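To turn posterior probabilities into a list of non-empty droplets, a cutoff has to be chosen. The sketch below assumes a cutoff of 0.5, which this report does not state; the function name and the example probabilities are made up for illustration.

```python
def split_droplets(cell_probability, threshold=0.5):
    """Return (non_empty, empty) index lists for a sequence of probabilities.

    The default threshold of 0.5 is an assumption, not a value from the report.
    """
    non_empty = [i for i, p in enumerate(cell_probability) if p > threshold]
    empty = [i for i, p in enumerate(cell_probability) if p <= threshold]
    return non_empty, empty


# Made-up posterior probabilities for five droplets.
probs = [0.99, 0.02, 0.75, 0.50, 1.0]
non_empty, empty = split_droplets(probs)
print("non-empty droplets:", non_empty)
print("empty droplets:", empty)
```

With the AnnData object shown earlier, the equivalent boolean mask would be adata.obs['cell_probability'] > 0.5. Droplets that pass this cutoff still need the usual cell QC, since dead cells and debris can also score highly.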

Concordance of data before and after remove-background¶

The intent is to change the input data as little as possible while achieving noise removal. These plots show general summary statistics about similarity of the input and output data. We expect to see the data lying close to a straight line (gray). There may be outlier genes/features, which are often those highest-expressed in the ambient RNA.

The plots here show data for inferred cell-containing droplets, and exclude the empty droplets.

PCA of encoded gene expression¶

We are not looking for anything specific in the PCA plot of the gene expression embedding, but often we see clusters that correspond to different cell types. If you see only a single large blob, then the dataset might contain only one cell type, or perhaps there are few counts per droplet.

Summary of warnings:¶

Large deviation in training ELBO from max value late in learning.

Back-tracking in training ELBO.

Expression of gene Rgs20 decreases quite a bit

Expression of gene Atp6v1h decreases quite a bit

Expression of gene Rb1cc1 decreases quite a bit

Expression of gene St18 decreases quite a bit

Expression of gene Pcmtd1 decreases quite a bit

Expression of gene Cspp1 decreases quite a bit

Expression of gene Arfgef1 decreases quite a bit

Expression of gene A830018L16Rik decreases quite a bit

Expression of gene Ncoa2 decreases quite a bit

Expression of gene Rpl7 decreases quite a bit

Expression of gene Stau2 decreases quite a bit

Expression of gene Ube2w decreases quite a bit

Expression of gene Gm28376 decreases quite a bit

Expression of gene Smap1 decreases quite a bit

Expression of gene Fam135a decreases quite a bit

Expression of gene Col19a1 decreases quite a bit

Expression of gene Lmbrd1 decreases quite a bit

Expression of gene Phf3 decreases quite a bit

Expression of gene Khdrbs2 decreases quite a bit

Expression of gene Bend6 decreases quite a bit

Expression of gene Dst decreases quite a bit

Expression of gene Fam168b decreases quite a bit

Expression of gene Cox5b decreases quite a bit

Expression of gene Actr1b decreases quite a bit

Expression of gene Tmem131 decreases quite a bit

Expression of gene Inpp4a decreases quite a bit

Expression of gene 2010300C02Rik decreases quite a bit

Expression of gene Rev1 decreases quite a bit

Expression of gene Aff3 decreases quite a bit

Expression of gene Npas2 decreases quite a bit

Expression of gene Rpl31 decreases quite a bit

Expression of gene Map4k4 decreases quite a bit