From 5f25569b351dfd5d027429dadc84901241d15a33 Mon Sep 17 00:00:00 2001 From: Andrew Ghazi <6763470+andrewGhazi@users.noreply.github.com> Date: Fri, 11 Oct 2024 14:36:41 -0400 Subject: [PATCH 1/6] intstructor note on extension challenges --- instructors/instructor-notes.md | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/instructors/instructor-notes.md b/instructors/instructor-notes.md index 7c75435..6045b80 100644 --- a/instructors/instructor-notes.md +++ b/instructors/instructor-notes.md @@ -3,6 +3,30 @@ title: 'Instructor Notes' --- -This workshop is based on the [OSCA tutorial](https://github.com/Bioconductor/ISMB.OSCA) that Davide Risso, Dario Righelli, Marcel Ramos, and myself gave at the ISMB conference last year. The tutorial is a light version of the [OSCA book](https://bioconductor.org/books/release/OSCA/) that concentrates on essential aspects for getting started with the book ("The OSCA book in a day"). The tutorial is in large parts a faithful copy of the OSCA book, but also adds contents that are not (yet) covered in the OSCA book such as interoperability with other popular single-cell analysis ecosystems and accessing data from the Human Cell Atlas. +This workshop is based on the [OSCA +tutorial](https://github.com/Bioconductor/ISMB.OSCA) that Davide Risso, Dario +Righelli, Marcel Ramos, and myself gave at the ISMB conference last year. The +tutorial is a light version of the [OSCA +book](https://bioconductor.org/books/release/OSCA/) that concentrates on +essential aspects for getting started with the book ("The OSCA book in a day"). +The tutorial is in large parts a faithful copy of the OSCA book, but also adds +contents that are not (yet) covered in the OSCA book such as interoperability +with other popular single-cell analysis ecosystems and accessing data from the +Human Cell Atlas. -The learner profiles page covers the pre-requisites including the basics of statistics, R, and molecular biology. 
Point this out to students in order to help direct those missing large pieces of the foundation toward where they need to go. +The learner profiles page covers the pre-requisites including the basics of +statistics, R, and molecular biology. Point this out to students in order to +help direct those missing large pieces of the foundation toward where they need +to go. + +The exercises in the middle of episodes should be worked through as students +progress through the lesson unless you're really trying to fly through +everything at lightning speed. + +Some episodes feature additional exercises labelled "Extension challenges" at +the end. These are generally more difficult exercises that are less about +repetition of the concepts presented in the lesson and more about critically +examining issues related to the lesson's topic. They can safely be skipped if +pressed for time, but may be useful for driving post-lesson discussion or to +provide additional learning material for more advanced students that reach the +end of the lesson faster than others. 
From 955a8801808318cbfe5ac4806a2253905d0d606d Mon Sep 17 00:00:00 2001 From: Andrew Ghazi <6763470+andrewGhazi@users.noreply.github.com> Date: Fri, 11 Oct 2024 14:39:26 -0400 Subject: [PATCH 2/6] suggest other Bioc carpentries --- profiles/learner-profiles.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/profiles/learner-profiles.md b/profiles/learner-profiles.md index 196b791..c3cbdb2 100644 --- a/profiles/learner-profiles.md +++ b/profiles/learner-profiles.md @@ -8,10 +8,11 @@ This course is aimed at graduate students, postdocs, and faculty interested in l * Statistics basics such as hypothesis tests, summary statistics, principal component analysis * R basics such as variable assignment, accessing object components, and looking up help documentation -The following online textbooks provide excellent coverage of these and related topics if you would like a refresher: +The following online textbooks/resources provide excellent coverage of these and related topics if you would like a refresher: * *Molecular Biology of the Cell*, Alberts et al. 
* [Modern Statistics for Modern Biology](https://www.huber.embl.de/msmb/) +* Bioconductor Carpentries on [Introduction to data analysis with R](https://carpentries-incubator.github.io/bioc-intro/) and [RNA-seq analysis](https://carpentries-incubator.github.io/bioc-rnaseq/) * [R for Data Science (2e)](https://r4ds.hadley.nz/) The following are *not* required: From f38de791655fce2e6eff2e62a9baf68a4d1a3295 Mon Sep 17 00:00:00 2001 From: Andrew Ghazi <6763470+andrewGhazi@users.noreply.github.com> Date: Fri, 11 Oct 2024 14:51:28 -0400 Subject: [PATCH 3/6] pbmc ex hint --- episodes/eda_qc.Rmd | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/episodes/eda_qc.Rmd b/episodes/eda_qc.Rmd index 32df3f5..33d251f 100644 --- a/episodes/eda_qc.Rmd +++ b/episodes/eda_qc.Rmd @@ -651,18 +651,18 @@ You can see that we get largely similar results, though for clusters 3 and 9 the #### Exercise 2: PBMC Data -The package `DropletTestFiles` includes the raw output from Cell Ranger of the peripheral blood mononuclear cell (PBMC) dataset from 10X Genomics, publicly available from the 10X Genomics website. Repeat the analysis of this vignette using those data. +The [`DropletTestFiles` package](https://www.bioconductor.org/packages/release/data/experiment/html/DropletTestFiles.html) includes the raw output from Cell Ranger of the peripheral blood mononuclear cell (PBMC) dataset from 10X Genomics, publicly available from the 10X Genomics website. Repeat the analysis of this vignette using those data. -::: solution +The hint demonstrates how to identify, download, extract, and read the data starting from the help documentation of `?DropletTestFiles::listTestFiles`, but try working through those steps on your own for extra challenge (they're useful skills to develop in practice). -The first few lines here read the data from ExperimentHub and the mitochondrial genes are identified by gene symbols in the row data. 
Otherwise the steps are the same: +::: hint -```{r eval = FALSE} +```{r eval=FALSE} library(DropletTestFiles) set.seed(100) -listTestFiles(dataset="tenx-3.1.0-5k_pbmc_protein_v3") # look up the remote data path of the raw data +listTestFiles(dataset = "tenx-3.1.0-5k_pbmc_protein_v3") # look up the remote data path of the raw data raw_rdatapath <- "DropletTestFiles/tenx-3.1.0-5k_pbmc_protein_v3/1.0.0/raw.tar.gz" @@ -676,6 +676,15 @@ untar(paste0(local_path, ".tar.gz"), sce <- read10xCounts(file.path(dirname(local_path), "raw_feature_bc_matrix/")) +``` + +::: + +::: solution + +The first few lines here read the data from ExperimentHub and the mitochondrial genes are identified by gene symbols in the row data. Otherwise the steps are the same: + +```{r eval = FALSE} e.out <- emptyDrops(counts(sce)) sce <- sce[,which(e.out$FDR <= 0.001)] From d75eac3a7fd6d10008f09a9310f81237800cba96 Mon Sep 17 00:00:00 2001 From: Andrew Ghazi <6763470+andrewGhazi@users.noreply.github.com> Date: Fri, 11 Oct 2024 14:59:14 -0400 Subject: [PATCH 4/6] reference annotation callout --- episodes/cell_type_annotation.Rmd | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/episodes/cell_type_annotation.Rmd b/episodes/cell_type_annotation.Rmd index b7060f0..c2a8feb 100644 --- a/episodes/cell_type_annotation.Rmd +++ b/episodes/cell_type_annotation.Rmd @@ -276,6 +276,12 @@ in the [*SingleR* book](https://bioconductor.org/books/release/SingleRBook) from which most of the examples here are derived. +::: callout + +Remember, the quality of reference-based cell type annotation can only be as good as the cell type assignments in the reference. Garbage in, garbage out. In practice, it's worthwhile to spend time carefully assessing your reference to make sure the original assignments make sense and that it's compatible with the query dataset you're trying to annotate. + +::: + Here we take a single sample from `EmbryoAtlasData` as our reference dataset. 
In practice you would want to take more/all samples, possibly with batch-effect correction (see the next episode). ```{r ref-data, message = FALSE} From eee14c8e3c9546a5c58efa1a471167c6036f7574 Mon Sep 17 00:00:00 2001 From: Andrew Ghazi <6763470+andrewGhazi@users.noreply.github.com> Date: Fri, 11 Oct 2024 14:59:38 -0400 Subject: [PATCH 5/6] exercise solution note --- episodes/eda_qc.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/episodes/eda_qc.Rmd b/episodes/eda_qc.Rmd index 33d251f..34cc2be 100644 --- a/episodes/eda_qc.Rmd +++ b/episodes/eda_qc.Rmd @@ -682,7 +682,7 @@ sce <- read10xCounts(file.path(dirname(local_path), "raw_feature_bc_matrix/")) ::: solution -The first few lines here read the data from ExperimentHub and the mitochondrial genes are identified by gene symbols in the row data. Otherwise the steps are the same: +After getting the data, the steps are largely copy-pasted from above. For the sake of simplicity the mitochondrial genes are identified by gene symbols in the row data. Otherwise the steps are the same: ```{r eval = FALSE} e.out <- emptyDrops(counts(sce)) From 912bc140e884cb6861f79089ec895499326de518 Mon Sep 17 00:00:00 2001 From: Andrew Ghazi <6763470+andrewGhazi@users.noreply.github.com> Date: Fri, 11 Oct 2024 15:01:57 -0400 Subject: [PATCH 6/6] note on names --- episodes/eda_qc.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/episodes/eda_qc.Rmd b/episodes/eda_qc.Rmd index 34cc2be..c251066 100644 --- a/episodes/eda_qc.Rmd +++ b/episodes/eda_qc.Rmd @@ -425,7 +425,7 @@ Imagine you have data that were prepared by three people with varying level of e ::: ::: solution -Use the `block` argument in the call to `modelGeneVar()` like so: +Use the `block` argument in the call to `modelGeneVar()` like so. We don't have experimenter information in this dataset, so in order to have some names to work with we assign them randomly from a set of names. ```{r eval=FALSE}