Commit
easier singleR exercise
andrewGhazi committed Sep 25, 2024
1 parent 88d94c8 commit 02f33c6
Showing 1 changed file with 53 additions and 19 deletions.
72 changes: 53 additions & 19 deletions episodes/cell_type_annotation.Rmd
@@ -167,8 +167,6 @@ markers <- scoreMarkers(sce)
markers
```

<!-- TODO change this ^ to scoreMarkers() -->

The resulting object contains a sorted marker gene list for each
cluster, in which the top genes are those that contribute the most to
the separation of that cluster from all other clusters.
@@ -218,7 +216,7 @@ You can see that at least among the top markers, cluster 6 (pale green) tends to
plotReducedDim(sce, "UMAP", color_by = "label")
```

Looking at the UMAP again, we can see that the marker gene overlap of clusters 1 and 6 makes sense. They're right next to each other on the UMAP. They're probably very closely related cell types, and a less granular clustering would probably lump them together.
Looking at the UMAP again, we can see that the marker gene overlap of clusters 1 and 6 makes sense. They're right next to each other on the UMAP. They're probably closely related cell types, and a less granular clustering would probably lump them together.

:::

@@ -278,6 +276,8 @@ in the [*SingleR*
book](https://bioconductor.org/books/release/SingleRBook) from which
most of the examples here are derived.

Here we take a single sample from `EmbryoAtlasData` as our reference dataset. In practice you would want to take more/all samples, possibly with batch-effect correction (see the next episode).

```{r ref-data, message = FALSE}
ref <- EmbryoAtlasData(samples = 29)
@@ -298,6 +298,7 @@ You can see we have an assortment of different cell types in the reference (with

```{r ref-celltypes}
tab <- sort(table(ref$celltype), decreasing = TRUE)
tab
```

@@ -398,28 +399,15 @@ pheatmap(log2(tab + 10), color = colorRampPalette(c("white", "blue"))(101))

:::: challenge

SingleR can be computationally expensive. How do you set it to run in parallel?
Assign the SingleR annotations as a column in the colData for the query object `sce`.

::: solution

Use `BiocParallel` and the `BPPARAM` argument! This example will set it to use four cores on your laptop, but you can also configure BiocParallel to use cluster jobs.

```{r eval=FALSE, echo = TRUE}
library(BiocParallel)
my_bpparam = MulticoreParam(workers = 4)
res <- SingleR(test = sce.mat,
ref = ref.mat,
labels = ref$celltype,
BPPARAM = my_bpparam)
```{r}
sce$SingleR_label = res$pruned.labels
```

`BiocParallel` is the most common way to enable parallel computation in Bioconductor packages, so you can expect to see it elsewhere outside of SingleR.

:::

::::
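
The `pruned.labels` column used in this challenge can contain `NA` for cells whose assignment SingleR considers low quality, so it is worth checking how many cells end up without a label. The following is a minimal base-R sketch on a made-up stand-in for `res`; the name `toy_res` and the cell type values in it are assumptions for illustration, not output from this episode.

```r
# Hypothetical stand-in for a SingleR result: one row per cell, with the
# raw label and the pruned label (NA where the assignment was pruned).
toy_res <- data.frame(
    labels = c("Erythroid1", "Mesenchyme", "Erythroid1", "Allantois"),
    pruned.labels = c("Erythroid1", NA, "Erythroid1", "Allantois")
)

# Count the cells whose label was pruned away.
n_pruned <- sum(is.na(toy_res$pruned.labels))

# Tabulate the labels, keeping the NA entries visible.
label_counts <- table(toy_res$pruned.labels, useNA = "ifany")
label_counts
```

With the real objects, the same two calls could be applied to `res$pruned.labels`, or to `sce$SingleR_label` after the assignment.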

### Assigning cell labels from gene sets
@@ -440,6 +428,25 @@ markers.z <- getTopMarkers(wilcox.z$statistics, wilcox.z$pairs,
lengths(markers.z)
```

<!---
This version with scoreMarkers() produces worse looking diagnostics, so let's leave it with the pairwise Wilcox version.
```{r atlas-markers}
ref_markers <- scoreMarkers(ref, groups = ref$celltype, lfc = 1)
get_top_markers <- function(marker_df, n = 100) {
ord <- order(marker_df$mean.AUC, decreasing = TRUE)
rownames(marker_df[ord,])[1:n]
}
markers.z <- lapply(ref_markers, get_top_markers)
```
-->

As before, our test dataset will be the wild-type chimera dataset.

```{r wt-sce}
@@ -622,6 +629,33 @@ Generally, it's good to keep in mind that the concept of "everything else" is no
:::
::::

:::: challenge

#### Extension Challenge 2z
SingleR can be computationally expensive. How do you set it to run in parallel?

::: solution

Use `BiocParallel` and the `BPPARAM` argument! This example will set it to use four cores on your laptop, but you can also configure BiocParallel to use cluster jobs.

```{r eval=FALSE, echo = TRUE}
library(BiocParallel)
my_bpparam = MulticoreParam(workers = 4)
res2 <- SingleR(test = sce.mat,
ref = ref.mat,
labels = ref$celltype,
BPPARAM = my_bpparam)
```

`BiocParallel` is the most common way to enable parallel computation in Bioconductor packages, so you can expect to see it elsewhere outside of SingleR.
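
`MulticoreParam` relies on forked processes, which are not available on Windows. Below is a hedged sketch of picking a backend by platform, using `bplapply()` on a toy input rather than SingleR itself; the worker count of 2 and the name `portable_bpparam` are assumptions for illustration.

```r
library(BiocParallel)

# Forked workers are unavailable on Windows, so use socket-based
# workers (SnowParam) there and forked workers elsewhere.
portable_bpparam <- if (.Platform$OS.type == "windows") {
    SnowParam(workers = 2)
} else {
    MulticoreParam(workers = 2)
}

# Toy workload standing in for an expensive per-element computation.
squares <- bplapply(1:4, function(i) i^2, BPPARAM = portable_bpparam)
unlist(squares)
```

The resulting object could then be passed as `BPPARAM = portable_bpparam` in the `SingleR()` call above.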

:::

::::

::: checklist
## Further Reading
