carpentries-incubator · csmagnano · May 3, 2024 · May 3, 2024
diff --git a/episodes/cell_type_annotation.Rmd b/episodes/cell_type_annotation.Rmd
@@ -23,7 +23,7 @@ exercises: 15 # Minutes of exercises in the lesson
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
 
-# Setup
+## Setup
 
 ```{r setup, message = FALSE}
 library(BiocStyle)
@@ -35,7 +35,7 @@ library(scater)
 library(scran)
 ```
 
-# Data retrieval
+## Data retrieval
 
 ```{r data, message = FALSE}
 sce <- WTChimeraData(samples = 5, type = "processed")
@@ -50,14 +50,14 @@ ind <- sample(ncol(sce), 1000)
 sce <- sce[,ind]
 ```
 
-# Preprocessing
+## Preprocessing
 
 ```{r preproc, warning = FALSE}
 sce <- logNormCounts(sce)
 sce <- runPCA(sce)
 ```
 
-# Clustering
+## Clustering
 
 Clustering is an unsupervised learning procedure that is used to empirically 
 define groups of cells with similar expression profiles. 
@@ -104,7 +104,7 @@ sce <- runUMAP(sce, dimred = "PCA")
 plotReducedDim(sce, "UMAP", color_by = "label")
 ```
 
-# Marker gene detection
+## Marker gene detection
 
 To interpret clustering results as obtained in the previous section, we identify
 the genes that drive separation between clusters. These marker genes allow us to 
@@ -156,7 +156,7 @@ top.markers <- head(rownames(markers[[1]]))
 plotExpression(sce, features = top.markers, x = "label", color_by = "label")
 ```
 
-# Cell type annotation
+## Cell type annotation
 
 The most challenging task in scRNA-seq data analysis is arguably the
 interpretation of the results.
@@ -182,7 +182,7 @@ reference datasets where each sample or cell has already been annotated with its
 putative biological state by domain experts.
 Here, we will demonstrate both approaches on the wild-type chimera dataset.
 
-## Assigning cell labels from reference data
+### Assigning cell labels from reference data
 
 A conceptually straightforward annotation approach is to compare the single-cell
 expression profiles with previously annotated reference datasets.
@@ -303,7 +303,7 @@ tab <- table(res$pruned.labels, sce$celltype.mapped)
 pheatmap(log2(tab + 10), color = colorRampPalette(c("white", "blue"))(101))
 ```
 
-## Assigning cell labels from gene sets
+### Assigning cell labels from gene sets
 
 A related strategy is to explicitly identify sets of marker genes that are highly
 expressed in each individual cell.
@@ -397,19 +397,15 @@ a fitted three-component mixture, and the grey curve represents a fitted normal
 distribution. Vertical lines represent threshold estimates corresponding to each
 estimate of the distribution. 
 
-# Session Info
+## Session Info
 
 ```{r sessionInfo}
 sessionInfo()
 ```
 
-# Further Reading
 
-* OSCA book, [Chapters 5-7](https://bioconductor.org/books/release/OSCA.basic/clustering.html)
-* Assigning cell types with SingleR ([the book](https://bioconductor.org/books/release/SingleRBook/)).
-* The [AUCell](https://bioconductor.org/packages/AUCell) package vignette.
 
-# Exercises
+## Exercises
 
 :::::::::::::::::::::::::::::::::: challenge
 
@@ -484,6 +480,15 @@ TODO
 
 :::::::::::::::::::::::::::::::::::::::::::::
 
+:::::::::::::: checklist
+## Further Reading
+
+* OSCA book, [Chapters 5-7](https://bioconductor.org/books/release/OSCA.basic/clustering.html)
+* Assigning cell types with SingleR ([the book](https://bioconductor.org/books/release/SingleRBook/)).
+* The [AUCell](https://bioconductor.org/packages/AUCell) package vignette.
+
+::::::::::::::
+
 ::::::::::::::::::::::::::::::::::::: keypoints 
 
 - TODO

diff --git a/episodes/hca.Rmd b/episodes/hca.Rmd
@@ -18,7 +18,7 @@
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
-# HCA Project
+## HCA Project
 
 The Human Cell Atlas (HCA) is a large project that aims to learn from and map
 every cell type in the human body. The project extracts spatial and molecular
@@ -27,15 +27,15 @@
 ages. There are about 37.2 trillion cells in the human body. To read more about
 the project, head over to their website at https://www.humancellatlas.org.
 
-# CELLxGENE
+## CELLxGENE
 
 CELLxGENE is a database and a suite of tools that help scientists to find,
 download, explore, analyze, annotate, and publish single cell data. It includes
 several analytic and visualization tools to help you to discover single cell
 data patterns. To see the list of tools, browse to
 https://cellxgene.cziscience.com/.
 
-# CELLxGENE | Census
+## CELLxGENE | Census
 
 The Census provides efficient computational tooling to access, query, and
 analyze all single-cell RNA data from CZ CELLxGENE Discover. Using a new access
@@ -44,7 +44,7 @@
 accelerating your research by significantly minimizing data harmonization at
 https://chanzuckerberg.github.io/cellxgene-census/.
 
-# The CuratedAtlasQueryR Project
+## The CuratedAtlasQueryR Project
 
 To systematically characterize the immune system across tissues, demographics
 and multiple studies, single cell transcriptomics data was harmonized from the
@@ -69,9 +69,9 @@
 transcriptional profiles. This platform offers a standardized workflow for
 accessing atlas-level datasets programmatically and reproducibly.
 
 ![](figures/curatedAtlasQuery.png)
 
-# Data Sources in R / Bioconductor
+## Data Sources in R / Bioconductor
 
 There are a few options to access single cell data with R / Bioconductor.
 
@@ -81,7 +81,7 @@
 | [cellxgenedp](https://bioconductor.org/packages/cellxgenedp) | [CellxGene](https://cellxgene.cziscience.com/) | Human and mouse SC data including HCA |
 | [CuratedAtlasQueryR](https://stemangiola.github.io/CuratedAtlasQueryR/) | [CellxGene](https://cellxgene.cziscience.com/) | fine-grained query capable CELLxGENE data including HCA |
 
-# Installation
+## Installation
 
 ```{r, eval=FALSE}
 if (!requireNamespace("BiocManager", quietly = TRUE))
@@ -90,14 +90,14 @@
 BiocManager::install("CuratedAtlasQueryR")
 ```
 
-# Package load 
+## Package load 
 
 ```{r, include = TRUE, results = "hide", message = FALSE, warning = FALSE}
 library(CuratedAtlasQueryR)
 library(dplyr)
 ```
 
-# HCA Metadata
+## HCA Metadata
 
 The metadata allows the user to get a lay of the land of what is available
 via the package. In this example, we are using the sample database URL which
@@ -115,7 +115,7 @@
   glimpse()
 ```
 
-# A note on the piping operator
+## A note on the piping operator
 
 The vignette materials provided by `CuratedAtlasQueryR` show the use of the
 'native' R pipe (implemented after R version `4.1.0`). For those not familiar
@@ -136,7 +136,7 @@
   aggregate(. ~ Species, data = _, mean)
 ```
 
-# Summarizing the metadata
+## Summarizing the metadata
 
 For each distinct tissue and dataset combination, count the number of datasets
 by tissue type. 
@@ -147,36 +147,36 @@
   count(tissue)
 ```
 
-# Columns available in the metadata
+## Columns available in the metadata
 
 ```{r, message = FALSE}
 head(names(metadata), 10)
 ```
 
-# Available assays
+## Available assays
 
 ```{r}
 metadata |>
     distinct(assay, dataset_id) |>
     count(assay)
 ```
 
-# Available organisms
+## Available organisms
 
 ```{r}
 metadata |>
     distinct(organism, dataset_id) |>
     count(organism)
 ```
 
-## Download single-cell RNA sequencing counts 
+### Download single-cell RNA sequencing counts 
 
 The data can be provided as either "counts" or counts per million "cpm" as given
 by the `assays` argument in the `get_single_cell_experiment()` function. By
 default, the `SingleCellExperiment` provided will contain only the 'counts'
 data.
 
-### Query raw counts
+#### Query raw counts
 
 ```{r, message = FALSE}
 single_cell_counts <- 
@@ -192,7 +192,7 @@
 single_cell_counts
 ```
 
-### Query counts scaled per million
+#### Query counts scaled per million
 
 This is helpful if just few genes are of interest, as they can be compared
 across samples.
@@ -208,7 +208,7 @@
   get_single_cell_experiment(assays = "cpm")
 ```
 
-### Extract only a subset of genes
+#### Extract only a subset of genes
 
 ```{r, message = FALSE}
 single_cell_counts <-
@@ -224,7 +224,7 @@
 single_cell_counts
 ```
 
-### Extracting counts as a Seurat object
+#### Extracting counts as a Seurat object
 
 If needed, the H5 `SingleCellExperiment` can be converted into a Seurat object.
 Note that it may take a long time and use a lot of memory depending on how many
@@ -244,9 +244,9 @@
 single_cell_counts
 ```
 
-## Save your `SingleCellExperiment`
+### Save your `SingleCellExperiment`
 
-### Saving as HDF5 
+#### Saving as HDF5 
 
 The recommended way of saving these `SingleCellExperiment` objects, if
 necessary, is to use `saveHDF5SummarizedExperiment` from the `HDF5Array`
@@ -256,7 +256,7 @@
 single_cell_counts |> saveHDF5SummarizedExperiment("single_cell_counts")
 ```
 
-# Exercises
+## Exercises
 
 :::::::::::::::::::::::::::::::::: challenge
 

diff --git a/episodes/intro-sce.Rmd b/episodes/intro-sce.Rmd
@@ -20,7 +20,7 @@
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
-# Setup
+## Setup
 
 ```{r setup, message = FALSE, warning=FALSE}
 library(SummarizedExperiment)
@@ -29,17 +29,17 @@
 library(BiocStyle)
 ```
 
-# Bioconductor
+## Bioconductor
 
-## Overview 
+### Overview 
 
 Within the R ecosystem, the Bioconductor project provides tools for the analysis and comprehension of high-throughput genomics data.
 The scope of the project covers microarray data, various forms of sequencing (RNA-seq, ChIP-seq, bisulfite, genotyping, etc.), proteomics, flow cytometry and more.
 One of Bioconductor's main selling points is the use of common data structures to promote interoperability between packages,
 allowing code written by different people (from different organizations, in different countries) to work together seamlessly in complex analyses. 
 By extending R to genomics, Bioconductor serves as a powerful addition to the computational biologist's toolkit.
 
-## Installing Bioconductor Packages
+### Installing Bioconductor Packages
 
 The default repository for R packages is the [Comprehensive R Archive Network](https://cran.r-project.org/mirrors.html) (CRAN), which is home to over 13,000 different R packages. 
 We can easily install packages from CRAN - say, the popular `r CRANpkg("ggplot2")` package for data visualization - by opening up R and typing in:
@@ -78,7 +78,7 @@
 Packages only need to be installed once, and then they are available for all subsequent uses of a particular R installation.
 There is no need to repeat the installation every time we start R.
 
-## Finding relevant packages
+### Finding relevant packages
 
 To find relevant Bioconductor packages, one useful resource is the [BiocViews](https://bioconductor.org/packages/release/BiocViews.html) page.
 This provides a hierarchically organized view of annotations associated with each Bioconductor package.
@@ -87,7 +87,7 @@
 CRAN uses the similar concept of ["Task views"](https://cran.r-project.org/web/views/), though this is understandably more general than genomics.
 For example, the [Cluster task view page](https://cran.r-project.org/web/views/Cluster.html) lists an assortment of packages that are relevant to cluster analyses.
 
-## Staying up to date
+### Staying up to date
 
 Updating all R/Bioconductor packages is as simple as running `BiocManager::install()` without any arguments.
 This will check for more recent versions of each package (within a Bioconductor release) and prompt the user to update if any are available.
@@ -96,7 +96,7 @@
 BiocManager::install()
 ```
 
-# The `SingleCellExperiment` class
+## The `SingleCellExperiment` class
 
 One of the main strengths of the Bioconductor project lies in the use of a common data infrastructure that powers interoperability across packages. 
 
@@ -110,7 +110,7 @@
 
 Let's start with an example dataset.
 
-```{r, message = FALSE}
+```{r, message = FALSE, warning=FALSE}
 sce <- WTChimeraData(samples=5)
 sce
 ```
@@ -121,7 +121,7 @@
 
 Depending on the object, slots can contain different types of data (e.g., numeric matrices, lists, etc.). We will here review the main slots of the SingleCellExperiment class as well as their getter/setter methods.
 
-## The `assays`
+### The `assays`
 
 This is arguably the most fundamental part of the object that contains the count matrix, and potentially other matrices with transformed data. We can access the _list_ of matrices with the `assays` function and individual matrices with the `assay` function. If one of these matrices is called "counts", we can use the special `counts` getter (and the analogous `logcounts`).
 
@@ -132,7 +132,7 @@
 
 You will notice that in this case we have a sparse matrix of class "dgTMatrix" inside the object. More generally, any "matrix-like" object can be used, e.g., dense matrices or HDF5-backed matrices (see "Working with large data").
 
-## The `colData` and `rowData`
+### The `colData` and `rowData`
 
 Conceptually, these are two data frames that annotate the columns and the rows of your assay, respectively.
 
@@ -151,9 +151,9 @@
 colData(sce)
 ```
 
-## The `reducedDims`
+### The `reducedDims`
 
 Everything that we have described so far (except for the `counts` getter) is part of the `SummarizedExperiment` class that SingleCellExperiment extends. You can find a complete lesson on the `SummarizedExperiment` class [here](https://carpentries-incubator.github.io/bioc-intro/60-next-steps.html).
 
 One of the peculiarity of SingleCellExperiment is its ability to store reduced dimension matrices within the object. These may include PCA, t-SNE, UMAP, etc.
 
@@ -196,7 +196,7 @@
 
 :::::::::::::: checklist
 
-# Further Reading
+## Further Reading
 
 * OSCA book, [Introduction](https://bioconductor.org/books/release/OSCA.intro)