Merge pull request #61 from ccb-hms/post_sprint

Post sprint
carpentries-incubator · Oct 11, 2024 · 6253ed6
2 parents ea710ab + 912bc14

Showing 4 changed files with 49 additions and 9 deletions.
diff --git a/episodes/cell_type_annotation.Rmd b/episodes/cell_type_annotation.Rmd
@@ -276,6 +276,12 @@ in the [*SingleR*
 book](https://bioconductor.org/books/release/SingleRBook) from which
 most of the examples here are derived.
 
+::: callout
+
+Remember, the quality of reference-based cell type annotation can only be as good as the cell type assignments in the reference. Garbage in, garbage out. In practice, it's worthwhile to spend time carefully assessing your reference to make sure the original assignments make sense and that it's compatible with the query dataset you're trying to annotate.
+
+:::
+
 Here we take a single sample from `EmbryoAtlasData` as our reference dataset. In practice you would want to take more/all samples, possibly with batch-effect correction (see the next episode).
 
 ```{r ref-data, message = FALSE}

diff --git a/episodes/eda_qc.Rmd b/episodes/eda_qc.Rmd
@@ -425,7 +425,7 @@ Imagine you have data that were prepared by three people with varying level of e
 :::
 
 ::: solution
-Use the `block` argument in the call to `modelGeneVar()` like so:
+Use the `block` argument in the call to `modelGeneVar()` like so. We don't have experimenter information in this dataset, so in order to have some names to work with we assign them randomly from a set of names.
 
 ```{r eval=FALSE}
 
@@ -651,18 +651,18 @@ You can see that we get largely similar results, though for clusters 3 and 9 the
 
 #### Exercise 2: PBMC Data
 
-The package `DropletTestFiles` includes the raw output from Cell Ranger of the peripheral blood mononuclear cell (PBMC) dataset from 10X Genomics, publicly available from the 10X Genomics website. Repeat the analysis of this vignette using those data.
+The [`DropletTestFiles` package](https://www.bioconductor.org/packages/release/data/experiment/html/DropletTestFiles.html) includes the raw output from Cell Ranger of the peripheral blood mononuclear cell (PBMC) dataset from 10X Genomics, publicly available from the 10X Genomics website. Repeat the analysis of this vignette using those data. 
 
-::: solution
+The hint demonstrates how to identify, download, extract, and read the data starting from the help documentation of `?DropletTestFiles::listTestFiles`, but try working through those steps on your own for an extra challenge (they're useful skills to develop in practice).
 
-The first few lines here read the data from ExperimentHub and the mitochondrial genes are identified by gene symbols in the row data. Otherwise the steps are the same:
+::: hint
 
-```{r eval = FALSE}
+```{r eval=FALSE}
 library(DropletTestFiles)
 
 set.seed(100)
 
-listTestFiles(dataset="tenx-3.1.0-5k_pbmc_protein_v3") # look up the remote data path of the raw data
+listTestFiles(dataset = "tenx-3.1.0-5k_pbmc_protein_v3") # look up the remote data path of the raw data
 
 raw_rdatapath <- "DropletTestFiles/tenx-3.1.0-5k_pbmc_protein_v3/1.0.0/raw.tar.gz"
 
@@ -676,6 +676,15 @@ untar(paste0(local_path, ".tar.gz"),
 
 sce <- read10xCounts(file.path(dirname(local_path), "raw_feature_bc_matrix/"))
 
+```
+
+:::
+
+::: solution
+
+After getting the data, the steps are largely copy-pasted from above. For the sake of simplicity the mitochondrial genes are identified by gene symbols in the row data. Otherwise the steps are the same:
+
+```{r eval = FALSE}
 e.out <- emptyDrops(counts(sce))
 
 sce <- sce[,which(e.out$FDR <= 0.001)]

diff --git a/instructors/instructor-notes.md b/instructors/instructor-notes.md
@@ -3,6 +3,30 @@ title: 'Instructor Notes'
 ---
 
 
-This workshop is based on the [OSCA tutorial](https://github.com/Bioconductor/ISMB.OSCA) that Davide Risso, Dario Righelli, Marcel Ramos, and myself gave at the ISMB conference last year. The tutorial is a light version of the [OSCA book](https://bioconductor.org/books/release/OSCA/) that concentrates on essential aspects for getting started with the book ("The OSCA book in a day"). The tutorial is in large parts a faithful copy of the OSCA book, but also adds contents that are not (yet) covered in the OSCA book such as interoperability with other popular single-cell analysis ecosystems and accessing data from the Human Cell Atlas.
+This workshop is based on the [OSCA
+tutorial](https://github.com/Bioconductor/ISMB.OSCA) that Davide Risso, Dario
+Righelli, Marcel Ramos, and myself gave at the ISMB conference last year. The
+tutorial is a light version of the [OSCA
+book](https://bioconductor.org/books/release/OSCA/) that concentrates on
+essential aspects for getting started with the book ("The OSCA book in a day").
+The tutorial is in large parts a faithful copy of the OSCA book, but also adds
+contents that are not (yet) covered in the OSCA book such as interoperability
+with other popular single-cell analysis ecosystems and accessing data from the
+Human Cell Atlas.
 
-The learner profiles page covers the pre-requisites including the basics of statistics, R, and molecular biology. Point this out to students in order to help direct those missing large pieces of the foundation toward where they need to go.
+The learner profiles page covers the pre-requisites including the basics of
+statistics, R, and molecular biology. Point this out to students in order to
+help direct those missing large pieces of the foundation toward where they need
+to go.
+
+The exercises in the middle of episodes should be worked through as students
+progress through the lesson unless you're really trying to fly through
+everything at lightning speed.
+
+Some episodes feature additional exercises labelled "Extension challenges" at
+the end. These are generally more difficult exercises that are less about
+repetition of the concepts presented in the lesson and more about critically
+examining issues related to the lesson's topic. They can safely be skipped if
+you are pressed for time, but may be useful for driving post-lesson discussion
+or for providing additional learning material for more advanced students who
+reach the end of the lesson faster than others.
diff --git a/profiles/learner-profiles.md b/profiles/learner-profiles.md
@@ -8,10 +8,11 @@ This course is aimed at graduate students, postdocs, and faculty interested in l
 * Statistics basics such as hypothesis tests, summary statistics, principal component analysis
 * R basics such as variable assignment, accessing object components, and looking up help documentation
 
-The following online textbooks provide excellent coverage of these and related topics if you would like a refresher:
+The following online textbooks/resources provide excellent coverage of these and related topics if you would like a refresher:
 
 * *Molecular Biology of the Cell*, Alberts et al.
 * [Modern Statistics for Modern Biology](https://www.huber.embl.de/msmb/)
+* Bioconductor Carpentries on [Introduction to data analysis with R](https://carpentries-incubator.github.io/bioc-intro/) and [RNA-seq analysis](https://carpentries-incubator.github.io/bioc-rnaseq/)
 * [R for Data Science (2e)](https://r4ds.hadley.nz/)
 
 The following are *not* required: