Skip to content

Commit

Permalink
Merge pull request #25 from ccb-hms/add_exercises
Browse files Browse the repository at this point in the history
minor formatting changes to cell type annotation
  • Loading branch information
andrewGhazi authored Sep 8, 2024
2 parents cccfa8e + b16b7e1 commit ef7f591
Showing 1 changed file with 42 additions and 12 deletions.
54 changes: 42 additions & 12 deletions episodes/cell_type_annotation.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -25,20 +25,25 @@ editor_options:
```{r chunk-opts, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(BiocStyle)
options(digits = 3)
```

Again we'll start by loading the libraries we'll be using:

```{r setup}
library(AUCell)
library(MouseGastrulationData)
library(SingleR)
library(bluster)
library(scater)
library(scran)
library(pheatmap)
library(GSEABase)
```

## Data retrieval

We'll be using the same set of WT chimeric mouse embryo data:
We'll be using the fifth processed sample from the WT chimeric mouse embryo data:

```{r data, message = FALSE}
sce <- WTChimeraData(samples = 5, type = "processed")
Expand All @@ -49,7 +54,9 @@ To speed up the computations, we take a random subset of 1,000 cells.

```{r}
set.seed(123)
ind <- sample(ncol(sce), 1000)
sce <- sce[,ind]
```

Expand All @@ -59,6 +66,7 @@ The SCE object needs to contain log-normalized expression counts as well as PCA

```{r preproc, warning = FALSE}
sce <- logNormCounts(sce)
sce <- runPCA(sce)
```

Expand Down Expand Up @@ -102,6 +110,7 @@ colLabels(sce) <- clusterCells(sce, use.dimred = "PCA",
table(colLabels(sce))
```
You can see we ended up with `r length(table(colLabels(sce)))` clusters of varying sizes.

We can now overlay the cluster labels as color on a UMAP plot:

Expand Down Expand Up @@ -143,7 +152,9 @@ cluster when compared to another cluster.

```{r marker-detect}
rownames(sce) <- rowData(sce)$SYMBOL
markers <- findMarkers(sce, test.type = "wilcox", direction = "up", lfc = 1)
markers
```

Expand Down Expand Up @@ -249,15 +260,17 @@ most of the examples here are derived.

```{r ref-data, message = FALSE}
ref <- EmbryoAtlasData(samples = 29)
ref
table(ref$stage)
```

To speed up the computations, we subsample the dataset to 1,000 cells.
In order to reduce the computational load, we subsample the dataset to 1,000 cells.

```{r}
set.seed(123)
ind <- sample(ncol(ref), 1000)
ref <- ref[,ind]
```

Expand All @@ -268,15 +281,18 @@ tab <- sort(table(ref$celltype), decreasing = TRUE)
tab
```

We need the normalized log counts, so we add those on:

```{r ref-preproc}
ref <- logNormCounts(ref)
```

Some cleaning - remove cells of the reference dataset for which the cell
type annotation is missing.
type annotation is missing:

```{r na-celltype}
nna <- !is.na(ref$celltype)
ref <- ref[,nna]
```

Expand All @@ -285,24 +301,30 @@ to remove noise prior to subsequent annotation tasks.

```{r low-abu-ct}
abu.ct <- names(tab)[tab >= 10]
ind <- ref$celltype %in% abu.ct
ref <- ref[,ind]
```

Restrict to genes shared between query and reference dataset.

```{r shared-genes}
rownames(ref) <- rowData(ref)$SYMBOL
isect <- intersect(rownames(sce), rownames(ref))
sce <- sce[isect,]
ref <- ref[isect,]
shared_genes <- intersect(rownames(sce), rownames(ref))
sce <- sce[shared_genes,]
ref <- ref[shared_genes,]
```

Convert sparse assay matrices to regular dense matrices for input to
SingleR.
SingleR:

```{r sparse-to-dense-mat}
sce.mat <- as.matrix(assay(sce, "logcounts"))
ref.mat <- as.matrix(assay(ref, "logcounts"))
```

Expand Down Expand Up @@ -337,8 +359,8 @@ type, indicating that the clustering represents finer subdivisions within these
cell types.

```{r, fig.width = 10, fig.height = 10}
library(pheatmap)
tab <- table(anno = res$pruned.labels, cluster = colLabels(sce))
pheatmap(log2(tab + 10), color = colorRampPalette(c("white", "blue"))(101))
```

Expand All @@ -350,6 +372,7 @@ experts.

```{r anno-vs-preanno, fig.width = 10, fig.height = 10}
tab <- table(res$pruned.labels, sce$celltype.mapped)
pheatmap(log2(tab + 10), color = colorRampPalette(c("white", "blue"))(101))
```

Expand Down Expand Up @@ -389,10 +412,11 @@ the markers are available. We demonstrate this approach using cell type
markers derived from the mouse embryo atlas dataset.

```{r atlas-markers}
library(scran)
wilcox.z <- pairwiseWilcox(ref, ref$celltype, lfc = 1, direction = "up")
markers.z <- getTopMarkers(wilcox.z$statistics, wilcox.z$pairs,
pairwise = FALSE, n = 50)
lengths(markers.z)
```

Expand All @@ -414,19 +438,22 @@ set, but involving only the top ranking genes by expression in each
cell.

```{r create-gsets, message = FALSE}
library(GSEABase)
all.sets <- lapply(names(markers.z),
function(x) GeneSet(markers.z[[x]], setName = x))
all.sets <- GeneSetCollection(all.sets)
all.sets
```

```{r aucell}
library(AUCell)
rankings <- AUCell_buildRankings(as.matrix(counts(sce)),
plotStats = FALSE, verbose = FALSE)
cell.aucs <- AUCell_calcAUC(all.sets, rankings)
results <- t(assay(cell.aucs))
head(results)
```

Expand All @@ -443,7 +470,9 @@ they should at least represent closely related cell states).

```{r anno-vs-cluster}
new.labels <- colnames(results)[max.col(results)]
tab <- table(new.labels, sce$celltype.mapped)
tab
```

Expand All @@ -460,6 +489,7 @@ may indicate that its gene set is not sufficiently informative.

```{r auc-dist, results="hide", fig.width = 10, fig.height = 10}
par(mfrow = c(3,3))
AUCell_exploreThresholds(cell.aucs[1:9], plotHist = TRUE, assign = TRUE)
```

Expand Down

0 comments on commit ef7f591

Please sign in to comment.