carpentries-incubator · andrewGhazi · Sep 30, 2024 · Sep 30, 2024
diff --git a/episodes/cell_type_annotation.Rmd b/episodes/cell_type_annotation.Rmd
@@ -572,47 +572,52 @@ of 0.5.
 :::
 
 ::: solution
-TODO
-:::
-:::
-
-::: challenge
-#### Exercise 2: Cluster annotation
+```{r}
+arg_list <- list(objective_function = "modularity",
+                 resolution_parameter = 0.5)
 
-Another strategy for annotating the clusters is to perform a gene set
-enrichment analysis on the marker genes defining each cluster. This
-identifies the pathways and processes that are (relatively) active in
-each cluster based on upregulation of the associated genes compared to
-other clusters. Focus on the top 100 up-regulated genes in a cluster of
-your choice and perform a gene set enrichment analysis of biological
-process (BP) gene sets from the Gene Ontology (GO).
+sce$leiden_clust <- clusterCells(sce, use.dimred = "PCA",
+                                 BLUSPARAM = NNGraphParam(cluster.fun = "leiden",
+                                                          cluster.args = arg_list))
 
-::: hint
-Use the `goana()` function from the `r Biocpkg("limma")` package to
-identify GO BP terms that are overrepresented in the list of marker
-genes.
-:::
+plotReducedDim(sce, "UMAP", color_by = "leiden_clust")
+```
 
-::: solution
-TODO
 :::
 :::
 
 ::: challenge
-#### Exercise 3: Workflow
-
-The [scRNAseq](https://bioconductor.org/packages/scRNAseq) package
-provides gene-level counts for a collection of public scRNA-seq
-datasets, stored as `SingleCellExperiment` objects with annotated cell-
-and gene-level metadata. Consult the vignette of the
-[scRNAseq](https://bioconductor.org/packages/scRNAseq) package to
-inspect all available datasets and select a dataset of your choice.
-Perform a typical scRNA-seq analysis on this dataset including QC,
-normalization, feature selection, dimensionality reduction, clustering,
-and marker gene detection.
+#### Exercise 2: Reference marker genes
+
+Identify the marker genes in the reference single-cell experiment, using the `celltype` labels that come with the dataset as the groups. Compare the top 100 marker genes of two cell types that lie close together in UMAP space. Do they share similar marker sets?
 
 ::: solution
-TODO
+
+```{r}
+markers <- scoreMarkers(ref, groups = ref$celltype)
+
+markers
+
+# The reference comes with a precomputed UMAP too
+plotReducedDim(ref, dimred = "umap", color_by = "celltype")
+
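+# Repetitive work -> write a function
+order_marker_df <- function(m_df, n = 100) {
+  # Rank genes by mean AUC, highest first
+  ord <- order(m_df$mean.AUC, decreasing = TRUE)
+  # Return the names of the top n genes (fewer if m_df has fewer rows)
+  head(rownames(m_df)[ord], n)
+}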
+
+x <- order_marker_df(markers[["Erythroid2"]])
+
+y <- order_marker_df(markers[["Erythroid3"]])
+
+length(intersect(x, y)) / length(x)  # fraction of shared markers
+```
+
+The top 100 marker sets of `Erythroid2` and `Erythroid3` overlap substantially. It would also be interesting to plot the expression of the genes in the set difference to confirm that these are the genes that distinguish the two types from each other.
+
 :::
 :::