From 978ada2ca9520c91b95c20e1163c820551235509 Mon Sep 17 00:00:00 2001
From: lgeistlinger
Date: Fri, 19 Apr 2024 14:28:03 -0400
Subject: [PATCH] key points for large data and HCA sessions

---
 episodes/hca.Rmd        |  9 +++++----
 episodes/intro-sce.Rmd  |  6 +++---
 episodes/large_data.Rmd | 25 ++++++++++++++++---------
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/episodes/hca.Rmd b/episodes/hca.Rmd
index 5f7fcef..6be8def 100644
--- a/episodes/hca.Rmd
+++ b/episodes/hca.Rmd
@@ -6,15 +6,15 @@ exercises: 10 # Minutes of exercises in the lesson
 
 :::::::::::::::::::::::::::::::::::::: questions
 
-- How to obtain comprehensive single-cell reference maps from the Human Cell Atlas?
+- How to obtain single-cell reference maps from the Human Cell Atlas?
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::: objectives
 
 - Learn about different resources for public single-cell RNA-seq data.
-- Learn how to access data from the Human Cell Atlas using the CuratedAtlasQueryR package.
-- Learn how to query for cells of interest and how to download them into a SingleCellExperiment object.
+- Learn how to access data from the Human Cell Atlas using the `CuratedAtlasQueryR` package.
+- Learn how to query for cells of interest and how to download them into a `SingleCellExperiment` object.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -350,7 +350,8 @@ metadata |>
 
 ::::::::::::::::::::::::::::::::::::: keypoints
 
-- TODO
+- The `CuratedAtlasQueryR` package provides programmatic access to single-cell reference maps from the Human Cell Atlas.
+- The package provides functionality to query for cells of interest and to download them into a `SingleCellExperiment` object.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
diff --git a/episodes/intro-sce.Rmd b/episodes/intro-sce.Rmd
index 5f8b760..baa3995 100644
--- a/episodes/intro-sce.Rmd
+++ b/episodes/intro-sce.Rmd
@@ -8,7 +8,7 @@ exercises: 10 # Minutes of exercises in the lesson
 
 - What is Bioconductor?
 - How is single-cell data stored in the Bioconductor ecosystem?
-- What is a `SingleCellObject`?
+- What is a `SingleCellExperiment` object?
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -207,9 +207,9 @@ TODO
 
 ::::::::::::::::::::::::::::::::::::: keypoints
 
-- Bioconductor is a project provide support and packages for the comprehension of high high-throughput biology data.
+- Bioconductor is a project that provides open-source software packages for the comprehension of high-throughput biology data.
 - A `SingleCellExperiment` object is an extension of the `SummarizedExperiment` object.
-- `SingleCellExperiment` objects contain specialized data fields for storing data unique to single cell analyses, such as the `reducedDims` field.
+- `SingleCellExperiment` objects contain specialized data fields for storing data unique to single-cell analyses, such as the `reducedDims` field.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
diff --git a/episodes/large_data.Rmd b/episodes/large_data.Rmd
index 77c45aa..92b42a1 100644
--- a/episodes/large_data.Rmd
+++ b/episodes/large_data.Rmd
@@ -6,13 +6,18 @@ exercises: 2 # Minutes of exercises in the lesson
 
 :::::::::::::::::::::::::::::::::::::: questions
 
-- TODO
+- How to work with single-cell datasets that are too large to fit in memory?
+- How to speed up single-cell analysis workflows for large datasets?
+- How to convert between popular single-cell data formats?
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::: objectives
 
-- TODO
+- Learn how to work with out-of-memory data representations such as HDF5.
+- Learn how to speed up single-cell analysis with parallel computation.
+- Learn how to invoke fast approximations for essential analysis steps.
+- Learn how to convert SingleCellExperiment objects to SeuratObjects and AnnData objects.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::
 
@@ -60,7 +65,7 @@ We demonstrate with a subset of 20,000 cells from the 1.3 million brain cell
 data set, as provided by the
 [TENxBrainData](https://bioconductor.org/packages/TENxBrainData) package.
 
-```{r tenx-brain, message = FALSE}
+```{r tenx-brain, message = FALSE, warning = FALSE}
 library(TENxBrainData)
 sce.brain <- TENxBrainData20k()
 sce.brain
@@ -104,7 +109,7 @@ using [beachmat](https://bioconductor.org/packages/beachmat).
 For example, we compute QC metrics below with the same `calculateQCMetrics()`
 function that we used in the other workflows.
 
-```{r}
+```{r, message = FALSE}
 library(scater)
 is.mito <- grepl("^mt-", rowData(sce.brain)$Symbol)
 qcstats <- perCellQCMetrics(sce.brain, subsets = list(Mt = is.mito))
@@ -344,8 +349,7 @@ We therefore need to first install the
 from GitHub only.
 
 ```{r, eval = FALSE}
-#BiocManager::install("satijalab/seurat-data")
-BiocManager::install("lgeistlinger/SeuratData")
+BiocManager::install("satijalab/seurat-data")
 ```
 
 We then proceed by loading all required packages and installing the PBMC dataset:
@@ -417,7 +421,7 @@ The `readH5AD()` function can be used to read a `SingleCellExperiment` from an
 H5AD file. Here, we use an example H5AD file contained in the
 `r Biocpkg("zellkonverter")` package.
 
-```{r read-h5ad}
+```{r read-h5ad, message = FALSE, warning = FALSE}
 example_h5ad <- system.file("extdata", "krumsiek11.h5ad",
                             package = "zellkonverter")
 readH5AD(example_h5ad)
@@ -427,7 +431,7 @@ We can also write a `SingleCellExperiment` to an H5AD file with the
 `writeH5AD()` function. This is demonstrated below on the wild-type chimera
 mouse gastrulation dataset.
 
-```{r write-h5ad}
+```{r write-h5ad, message = FALSE}
 out.file <- tempfile(pattern = ".h5ad")
 writeH5AD(sce, file = out.file)
 ```
@@ -522,7 +526,10 @@ Use Seurat's `DimPlot` function.
 
 ::::::::::::::::::::::::::::::::::::: keypoints
 
-- TODO
+- Out-of-memory representations can be used to work with single-cell datasets that are too large to fit in memory.
+- Parallelization of calculations across genes or cells is an effective strategy for speeding up analysis of large single-cell datasets.
+- Fast approximations for nearest neighbor search and singular value decomposition can speed up essential steps of single-cell analysis with minimal loss of accuracy.
+- Converter functions between existing single-cell data formats enable analysis workflows that leverage complementary functionality from popular single-cell analysis ecosystems.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::