markdown source builds

Auto-generated via {sandpaper}
Source : adc26ed
Branch : main
Author : Andrew Ghazi <[email protected]>
Time : 2024-10-02 14:08:17 +0000
Message : Merge pull request #48 from ccb-hms/ex_updates

Ex updates
carpentries-incubator · Oct 2, 2024 · e9b09f2
1 parent 7dc4aa9
commit e9b09f2

Showing 7 changed files with 20 additions and 914 deletions.
diff --git a/cell_type_annotation.md b/cell_type_annotation.md
diff --git a/eda_qc.md b/eda_qc.md
@@ -106,9 +106,9 @@ The distribution of total counts (called the unique molecular identifier or UMI
 A simple approach would be to apply a threshold on the total count to only retain those barcodes with large totals. However, this may unnecessarily discard libraries derived from cell types with low RNA content.
 
 ::: callout
-Depending on your data source, identifying and discarding empty droplets may not be necessary. Some academic institutions have research cores dedicated to single cell work that perform the sample preparation and sequencing. Many of these cores will also perform empty droplet filtering and other initial QC steps. If the sequencing outputs were provided to you by someone else, make sure to communicate with them about what pre-processing steps have been performed, if any.
+Depending on your data source, identifying and discarding empty droplets may not be necessary. Some academic institutions have research cores dedicated to single cell work that perform the sample preparation and sequencing. Many of these cores will also perform empty droplet filtering and other initial QC steps. Specific details on the steps in common pipelines like [10x Genomics' CellRanger](https://www.10xgenomics.com/support/software/cell-ranger/latest/tutorials) can usually be found in the documentation that came with the sequencing material. 
 
-<!-- TODO: cite official 10x CellRanger docs -->
+The main point is: if the sequencing outputs were provided to you by someone else, make sure to communicate with them about what pre-processing steps have been performed, if any. 
 :::
 
 :::: challenge 

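The caveat in the `eda_qc.md` context above (a fixed threshold on total counts can discard real cells with low RNA content) can be illustrated with a minimal base R sketch. Everything here is hypothetical: the simulated counts, the three barcode groups, and the cutoff of 100 are assumptions chosen for illustration, not values from the lesson.

```r
set.seed(1)

# Hypothetical toy data: 5 genes x 300 barcodes, simulated in three groups.
n_genes  <- 5
empty    <- matrix(rpois(n_genes * 100, lambda = 0.5), nrow = n_genes)  # ambient RNA only
low_rna  <- matrix(rpois(n_genes * 100, lambda = 8),   nrow = n_genes)  # real cells, little RNA
high_rna <- matrix(rpois(n_genes * 100, lambda = 100), nrow = n_genes)  # real cells, lots of RNA

counts <- cbind(empty, low_rna, high_rna)
group  <- rep(c("empty", "low_rna", "high_rna"), each = 100)

# Total UMI count per barcode, then a simple (arbitrary) cutoff.
total_umi <- colSums(counts)
threshold <- 100
keep      <- total_umi > threshold

# Number of barcodes retained in each simulated group.
kept_by_group <- tapply(keep, group, sum)
print(kept_by_group)
```

In this toy setup the cutoff removes the empty droplets, but it also removes every barcode in the simulated low-RNA group, which is the failure mode the text warns about.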
diff --git a/fig/cell_type_annotation-rendered-auc-dist2-1.png b/fig/cell_type_annotation-rendered-auc-dist2-1.png
diff --git a/fig/cell_type_annotation-rendered-unnamed-chunk-7-1.png b/fig/cell_type_annotation-rendered-unnamed-chunk-7-1.png
diff --git a/fig/cell_type_annotation-rendered-unnamed-chunk-8-1.png b/fig/cell_type_annotation-rendered-unnamed-chunk-8-1.png
diff --git a/intro-sce.md b/intro-sce.md
@@ -116,6 +116,17 @@ This class implements a data structure that stores all aspects of our single-cel
 
 <img src="http://bioconductor.org/books/release/OSCA.intro/images/SingleCellExperiment.png" style="display: block; margin: auto;" />
 
+:::: spoiler
+
+### Before `SingleCellExperiment`
+
+Before `SingleCellExperiment`, coders working with single cell data would sometimes keep all of these components in separate objects e.g. a matrix of counts, a data.frame of sample metadata, a data.frame of gene annotations and so on. There were two main disadvantages to this sort of "from scratch" approach:
+
+1. Tons of book-keeping. If you performed a QC step that removed dead cells, you also had to remember to remove that same set of cells from the cell-wise metadata. Un-expressed genes were dropped? Don't forget to filter the gene metadata table too. 
+2. All the downstream steps had to be "from scratch" as well. All the data munging, analysis, and visualization code had to be customized to the idiosyncrasies of a given input set.
+
+::::
+
 Let's look at an example dataset. `WTChimeraData` comes from a study on mouse development. We can assign one sample to a `SingleCellExperiment` object named `sce` like so:
 
 
@@ -261,7 +272,7 @@ Here we add a column called "conservation" that is just an integer sequence from
 
 
 ``` r
-rowData(sce)$conservation = 1:nrow(sce)
+rowData(sce)$conservation = rnorm(nrow(sce))
 ```
 
 This is just a made-up example with a simple sequence of numbers, but in practice it's convenient to store any sort of gene-wise information in the columns of the rowData.
@@ -384,21 +395,6 @@ altExpNames(0):
 
 :::::::::::::::::::::::::::::::::::::::::::::
 
-:::: challenge
-
-#### Extension Challenge 1
-
-Before `SingleCellExperiment`, coders working with single cell data would sometimes keep all of these components in separate objects e.g. a matrix of counts, a data.frame of sample metadata, a data.frame of gene annotations and so on. What are the main disadvantages of this sort of "from scratch" approach?
-
-::: solution
-
-1. You have to do tons of book-keeping! If you perform a QC step that removes dead cells, now you also have to remember to remove that same set of cells from the cell-wise metadata. Dropped un-expressed genes? Don't forget to filter the gene metadata table too. 
-
-2. All the downstream steps have to be "from scratch" as well! All the data munging, analysis, and visualization code has to be customized to the idiosyncrasies of your input. Agh!
-
-:::
-
-::::
 
 :::::::::::::: checklist
 

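The book-keeping burden described in the "Before `SingleCellExperiment`" spoiler of `intro-sce.md` can be sketched in base R as follows. The object names (`counts`, `cell_meta`, `gene_meta`) and the toy values are hypothetical and exist only to show how every filtering step has to be repeated by hand on each separate object.

```r
# Hypothetical toy data: 4 genes x 6 cells, stored in three separate objects.
counts <- matrix(
  c(3, 0, 0, 1,
    5, 2, 0, 0,
    1, 1, 0, 4,
    0, 6, 0, 2,
    2, 2, 0, 0,
    7, 0, 0, 3),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)
cell_meta <- data.frame(
  cell   = colnames(counts),
  viable = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
gene_meta <- data.frame(
  gene   = rownames(counts),
  symbol = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# QC step 1: drop dead cells from the counts ...
keep_cells <- cell_meta$viable
counts_qc  <- counts[, keep_cells]
# ... and remember to apply the same filter to the cell metadata.
cell_meta_qc <- cell_meta[keep_cells, ]

# QC step 2: drop un-expressed genes from the counts ...
keep_genes <- rowSums(counts_qc) > 0
counts_qc  <- counts_qc[keep_genes, ]
# ... and remember to apply the same filter to the gene metadata.
gene_meta_qc <- gene_meta[keep_genes, ]

# Manual consistency check that the three objects still line up.
stopifnot(identical(colnames(counts_qc), cell_meta_qc$cell))
stopifnot(identical(rownames(counts_qc), gene_meta_qc$gene))
```

Forgetting either of the two metadata filters leaves the objects silently out of sync. A container such as `SingleCellExperiment` avoids this because subsetting the object subsets the assays and their row and column metadata together.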
diff --git a/md5sum.txt b/md5sum.txt
@@ -4,9 +4,9 @@
 "config.yaml" "b0d664d3d6abdd0e98b16282e1c03107" "site/built/config.yaml" "2024-09-24"
 "index.md" "495939ddd3f110be3bbcd49b60f4a7ce" "site/built/index.md" "2024-09-24"
 "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-09-24"
-"episodes/intro-sce.Rmd" "709fc538c9872b9494fa37f1059ea4a0" "site/built/intro-sce.md" "2024-10-02"
-"episodes/eda_qc.Rmd" "b4800ddfe2d5deb5047311658f254e6d" "site/built/eda_qc.md" "2024-10-02"
-"episodes/cell_type_annotation.Rmd" "5bd585c6e4c6fc09a7443ce4da35899f" "site/built/cell_type_annotation.md" "2024-10-02"
+"episodes/intro-sce.Rmd" "b704934867b22d804de1e0fa0a9600eb" "site/built/intro-sce.md" "2024-10-02"
+"episodes/eda_qc.Rmd" "b21851ec0c9912dd9cb8a0cf230bcf9d" "site/built/eda_qc.md" "2024-10-02"
+"episodes/cell_type_annotation.Rmd" "8fab6e0cbb60d6fe6a67a0004c5ce5ab" "site/built/cell_type_annotation.md" "2024-10-02"
 "episodes/multi-sample.Rmd" "4711a38fd8b29961424215dd17fb7528" "site/built/multi-sample.md" "2024-09-30"
 "episodes/large_data.Rmd" "f19fa53e9e63d4cb8fe0f6ab61c8fc3a" "site/built/large_data.md" "2024-10-02"
 "episodes/hca.Rmd" "3f2af9dc9e53fd617512a37db87f20a7" "site/built/hca.md" "2024-10-02"