Merge pull request #61 from ccb-hms/post_sprint
Post sprint
andrewGhazi authored Oct 11, 2024
2 parents ea710ab + 912bc14 commit 6253ed6
Showing 4 changed files with 49 additions and 9 deletions.
6 changes: 6 additions & 0 deletions episodes/cell_type_annotation.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -276,6 +276,12 @@ in the [*SingleR*
book](https://bioconductor.org/books/release/SingleRBook) from which
most of the examples here are derived.

::: callout

Remember, the quality of reference-based cell type annotation can only be as good as the cell type assignments in the reference. Garbage in, garbage out. In practice, it's worthwhile to spend time carefully assessing your reference to make sure the original assignments make sense and that it's compatible with the query dataset you're trying to annotate.

:::
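
One simple way to start assessing a reference is to tabulate its labels and look for missing or thinly supported cell types. The following is a minimal base R sketch, not part of the lesson itself: the `ref_labels` vector and the cutoff of 10 cells are made-up stand-ins for whatever label column and threshold suit your reference.

```r
# Toy stand-in for a reference's cell type labels (hypothetical values).
ref_labels <- c(rep("Erythroid", 120), rep("Endothelium", 45),
                rep("Mesenchyme", 30), rep("Doublet", 4), NA, NA)

# Count cells per label, keeping missing labels visible.
label_counts <- table(ref_labels, useNA = "ifany")
print(label_counts)

# Flag labels supported by very few cells; the cutoff is arbitrary.
min_cells <- 10
rare_labels <- names(label_counts)[as.vector(label_counts) < min_cells]
print(rare_labels)
```

Labels flagged this way (here the doublets and the unlabelled cells) are candidates for closer inspection or removal before the reference is used for annotation.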

Here we take a single sample from `EmbryoAtlasData` as our reference dataset. In practice you would want to take more/all samples, possibly with batch-effect correction (see the next episode).

```{r ref-data, message = FALSE}
Expand Down
21 changes: 15 additions & 6 deletions episodes/eda_qc.Rmd
Expand Up @@ -425,7 +425,7 @@ Imagine you have data that were prepared by three people with varying level of e
:::

::: solution
Use the `block` argument in the call to `modelGeneVar()` like so:
Use the `block` argument in the call to `modelGeneVar()` like so. We don't have experimenter information in this dataset, so in order to have some names to work with we assign them randomly from a set of names.
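
The random assignment of experimenter names can be sketched in base R as follows. This is only an illustration under stated assumptions: the three names and the cell count `n_cells` are hypothetical, and in the lesson's setting the number of cells would come from the dataset itself.

```r
set.seed(100)

# Hypothetical experimenter names and a made-up number of cells.
experimenters <- c("Perry", "Merry", "Gary")
n_cells <- 20

# Draw one name per cell, with replacement, so every cell gets a block label.
experimenter <- sample(experimenters, n_cells, replace = TRUE)
print(table(experimenter))

# A vector like this, with one entry per cell, is what would be supplied
# to the `block` argument of modelGeneVar().
```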

```{r eval=FALSE}
Expand Down Expand Up @@ -651,18 +651,18 @@ You can see that we get largely similar results, though for clusters 3 and 9 the

#### Exercise 2: PBMC Data

The package `DropletTestFiles` includes the raw output from Cell Ranger of the peripheral blood mononuclear cell (PBMC) dataset from 10X Genomics, publicly available from the 10X Genomics website. Repeat the analysis of this vignette using those data.
The [`DropletTestFiles` package](https://www.bioconductor.org/packages/release/data/experiment/html/DropletTestFiles.html) includes the raw output from Cell Ranger of the peripheral blood mononuclear cell (PBMC) dataset from 10X Genomics, publicly available from the 10X Genomics website. Repeat the analysis of this vignette using those data.

::: solution
The hint demonstrates how to identify, download, extract, and read the data starting from the help documentation of `?DropletTestFiles::listTestFiles`, but try working through those steps on your own for an extra challenge (they're useful skills to develop in practice).

The first few lines here read the data from ExperimentHub and the mitochondrial genes are identified by gene symbols in the row data. Otherwise the steps are the same:
::: hint

```{r eval = FALSE}
```{r eval=FALSE}
library(DropletTestFiles)
set.seed(100)
listTestFiles(dataset="tenx-3.1.0-5k_pbmc_protein_v3") # look up the remote data path of the raw data
listTestFiles(dataset = "tenx-3.1.0-5k_pbmc_protein_v3") # look up the remote data path of the raw data
raw_rdatapath <- "DropletTestFiles/tenx-3.1.0-5k_pbmc_protein_v3/1.0.0/raw.tar.gz"
Expand All @@ -676,6 +676,15 @@ untar(paste0(local_path, ".tar.gz"),
sce <- read10xCounts(file.path(dirname(local_path), "raw_feature_bc_matrix/"))
```

:::

::: solution

After getting the data, the steps are largely copy-pasted from above. For the sake of simplicity the mitochondrial genes are identified by gene symbols in the row data. Otherwise the steps are the same:
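
Identifying mitochondrial genes by symbol usually comes down to a prefix match. Here is a minimal base R sketch with a toy `symbols` vector (made up for illustration); it assumes human gene symbols, where mitochondrial genes carry the `MT-` prefix, and that the real symbols would be taken from the row data of `sce`.

```r
# Toy gene symbols (hypothetical); real ones would come from the row data.
symbols <- c("MT-CO1", "MT-ND1", "ACTB", "GAPDH", "MTOR", "MT-ATP6")

# Anchor the pattern at the start and include the hyphen so that
# nuclear genes such as MTOR are not matched by accident.
is_mito <- grep("^MT-", symbols)
print(symbols[is_mito])
```

The resulting indices can then be used as the mitochondrial subset when computing per-cell QC metrics.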

```{r eval = FALSE}
e.out <- emptyDrops(counts(sce))
sce <- sce[,which(e.out$FDR <= 0.001)]
Expand Down
28 changes: 26 additions & 2 deletions instructors/instructor-notes.md
Expand Up @@ -3,6 +3,30 @@ title: 'Instructor Notes'
---


This workshop is based on the [OSCA tutorial](https://github.com/Bioconductor/ISMB.OSCA) that Davide Risso, Dario Righelli, Marcel Ramos, and myself gave at the ISMB conference last year. The tutorial is a light version of the [OSCA book](https://bioconductor.org/books/release/OSCA/) that concentrates on essential aspects for getting started with the book ("The OSCA book in a day"). The tutorial is in large parts a faithful copy of the OSCA book, but also adds contents that are not (yet) covered in the OSCA book such as interoperability with other popular single-cell analysis ecosystems and accessing data from the Human Cell Atlas.
This workshop is based on the [OSCA
tutorial](https://github.com/Bioconductor/ISMB.OSCA) that Davide Risso, Dario
Righelli, Marcel Ramos, and I gave at the ISMB conference last year. The
tutorial is a light version of the [OSCA
book](https://bioconductor.org/books/release/OSCA/) that concentrates on
essential aspects for getting started with the book ("The OSCA book in a day").
The tutorial is in large parts a faithful copy of the OSCA book, but also adds
contents that are not (yet) covered in the OSCA book such as interoperability
with other popular single-cell analysis ecosystems and accessing data from the
Human Cell Atlas.

The learner profiles page covers the pre-requisites including the basics of statistics, R, and molecular biology. Point this out to students in order to help direct those missing large pieces of the foundation toward where they need to go.
The learner profiles page covers the pre-requisites including the basics of
statistics, R, and molecular biology. Point this out to students in order to
help direct those missing large pieces of the foundation toward where they need
to go.

The exercises in the middle of episodes should be worked through as students
progress through the lesson unless you're really trying to fly through
everything at lightning speed.

Some episodes feature additional exercises labelled "Extension challenges" at
the end. These are generally more difficult exercises that are less about
repetition of the concepts presented in the lesson and more about critically
examining issues related to the lesson's topic. They can safely be skipped if
pressed for time, but may be useful for driving post-lesson discussion or to
provide additional learning material for more advanced students that reach the
end of the lesson faster than others.
3 changes: 2 additions & 1 deletion profiles/learner-profiles.md
Expand Up @@ -8,10 +8,11 @@ This course is aimed at graduate students, postdocs, and faculty interested in l
* Statistics basics such as hypothesis tests, summary statistics, principal component analysis
* R basics such as variable assignment, accessing object components, and looking up help documentation

The following online textbooks provide excellent coverage of these and related topics if you would like a refresher:
The following online textbooks/resources provide excellent coverage of these and related topics if you would like a refresher:

* *Molecular Biology of the Cell*, Alberts et al.
* [Modern Statistics for Modern Biology](https://www.huber.embl.de/msmb/)
* Bioconductor Carpentries on [Introduction to data analysis with R](https://carpentries-incubator.github.io/bioc-intro/) and [RNA-seq analysis](https://carpentries-incubator.github.io/bioc-rnaseq/)
* [R for Data Science (2e)](https://r4ds.hadley.nz/)

The following are *not* required:
Expand Down
