Commit

markdown source builds
Auto-generated via {sandpaper}
Source  : adc26ed
Branch  : main
Author  : Andrew Ghazi <[email protected]>
Time    : 2024-10-02 14:08:17 +0000
Message : Merge pull request #48 from ccb-hms/ex_updates

Ex updates
actions-user committed Oct 2, 2024
1 parent 7dc4aa9 commit e9b09f2
Showing 7 changed files with 20 additions and 914 deletions.
896 changes: 3 additions & 893 deletions cell_type_annotation.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions eda_qc.md
@@ -106,9 +106,9 @@ The distribution of total counts (called the unique molecular identifier or UMI
A simple approach would be to apply a threshold on the total count to only retain those barcodes with large totals. However, this may unnecessarily discard libraries derived from cell types with low RNA content.
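
As a rough illustration of the thresholding idea, here is a minimal base R sketch. The simulated matrix, the object names, and the cutoff of 100 are all assumptions made for this example; none of them come from the lesson's dataset.

```r
# Hypothetical toy data: 200 genes x 500 barcodes.
# Assumption: the first 400 barcodes are empty droplets (ambient RNA only)
# and the last 100 contain cells.
set.seed(1)
n_genes <- 200
n_barcodes <- 500
mean_expr <- rep(c(0.05, 5), times = c(400, 100))

# matrix() fills column by column, so each barcode gets its own Poisson mean.
counts <- matrix(
  rpois(n_genes * n_barcodes, lambda = rep(mean_expr, each = n_genes)),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("barcode", seq_len(n_barcodes))
  )
)

# Total count per barcode.
total_counts <- colSums(counts)

# Arbitrary illustrative cutoff; a real analysis would need to justify it.
min_total <- 100
keep <- total_counts >= min_total
filtered <- counts[, keep, drop = FALSE]

# How many barcodes pass and fail the cutoff.
table(keep)
```

In this toy setup the two groups are well separated, so a fixed cutoff works. A genuine cell type with low RNA content could fall below `min_total` and be discarded along with the empty droplets, which is the weakness described above.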

::: callout
Depending on your data source, identifying and discarding empty droplets may not be necessary. Some academic institutions have research cores dedicated to single cell work that perform the sample preparation and sequencing. Many of these cores will also perform empty droplet filtering and other initial QC steps. If the sequencing outputs were provided to you by someone else, make sure to communicate with them about what pre-processing steps have been performed, if any.
Depending on your data source, identifying and discarding empty droplets may not be necessary. Some academic institutions have research cores dedicated to single cell work that perform the sample preparation and sequencing. Many of these cores will also perform empty droplet filtering and other initial QC steps. Specific details on the steps in common pipelines like [10x Genomics' CellRanger](https://www.10xgenomics.com/support/software/cell-ranger/latest/tutorials) can usually be found in the documentation that came with the sequencing material.

<!-- TODO: cite official 10x CellRanger docs -->
The main point is: if the sequencing outputs were provided to you by someone else, make sure to communicate with them about what pre-processing steps have been performed, if any.
:::

:::: challenge
Binary file modified fig/cell_type_annotation-rendered-unnamed-chunk-7-1.png
Binary file modified fig/cell_type_annotation-rendered-unnamed-chunk-8-1.png
28 changes: 12 additions & 16 deletions intro-sce.md
@@ -116,6 +116,17 @@ This class implements a data structure that stores all aspects of our single-cel

<img src="http://bioconductor.org/books/release/OSCA.intro/images/SingleCellExperiment.png" style="display: block; margin: auto;" />

:::: spoiler

### Before `SingleCellExperiment`

Before `SingleCellExperiment`, coders working with single cell data would sometimes keep all of these components in separate objects e.g. a matrix of counts, a data.frame of sample metadata, a data.frame of gene annotations and so on. There were two main disadvantages to this sort of "from scratch" approach:

1. Tons of book-keeping. If you performed a QC step that removed dead cells, you also had to remember to remove that same set of cells from the cell-wise metadata. Un-expressed genes were dropped? Don't forget to filter the gene metadata table too.
2. All the downstream steps had to be "from scratch" as well. All the data munging, analysis, and visualization code had to be customized to the idiosyncrasies of a given input set.

::::
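
To make the book-keeping burden concrete, here is a small base R sketch of the "separate objects" approach. The toy data and every object name (`counts`, `cell_meta`, `gene_meta`, `is_dead`) are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data kept in three separate objects.
set.seed(42)
counts <- matrix(
  rpois(20, lambda = 3),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)
cell_meta <- data.frame(
  cell = colnames(counts),
  is_dead = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
gene_meta <- data.frame(
  gene = rownames(counts),
  biotype = c("protein_coding", "lncRNA", "protein_coding", "protein_coding"),
  stringsAsFactors = FALSE
)

# QC step: drop dead cells from the count matrix ...
live <- !cell_meta$is_dead
counts_qc <- counts[, live, drop = FALSE]

# ... and remember to apply the same filter to the cell metadata by hand.
cell_meta_qc <- cell_meta[live, , drop = FALSE]

# Manual guards that the separate objects still line up.
stopifnot(identical(colnames(counts_qc), cell_meta_qc$cell))
stopifnot(identical(rownames(counts_qc), gene_meta$gene))
```

Forgetting the `cell_meta_qc` line leaves the metadata silently out of step with the counts. With a `SingleCellExperiment`, a single subset such as `sce[, live]` filters the assays and the column metadata together.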

Let's look at an example dataset. `WTChimeraData` comes from a study on mouse development. We can assign one sample to a `SingleCellExperiment` object named `sce` like so:


@@ -261,7 +272,7 @@ Here we add a column called "conservation" that is just an integer sequence from


``` r
rowData(sce)$conservation = 1:nrow(sce)
rowData(sce)$conservation = rnorm(nrow(sce))
```

This is just a made-up example with a simple sequence of numbers, but in practice it's convenient to store any sort of gene-wise information in the columns of the rowData.
@@ -384,21 +395,6 @@ altExpNames(0):

:::::::::::::::::::::::::::::::::::::::::::::

:::: challenge

#### Extension Challenge 1

Before `SingleCellExperiment`, coders working with single cell data would sometimes keep all of these components in separate objects e.g. a matrix of counts, a data.frame of sample metadata, a data.frame of gene annotations and so on. What are the main disadvantages of this sort of "from scratch" approach?

::: solution

1. You have to do tons of book-keeping! If you perform a QC step that removes dead cells, now you also have to remember to remove that same set of cells from the cell-wise metadata. Dropped un-expressed genes? Don't forget to filter the gene metadata table too.

2. All the downstream steps have to be "from scratch" as well! All the data munging, analysis, and visualization code has to be customized to the idiosyncrasies of your input. Agh!

:::

::::

:::::::::::::: checklist

6 changes: 3 additions & 3 deletions md5sum.txt
@@ -4,9 +4,9 @@
"config.yaml" "b0d664d3d6abdd0e98b16282e1c03107" "site/built/config.yaml" "2024-09-24"
"index.md" "495939ddd3f110be3bbcd49b60f4a7ce" "site/built/index.md" "2024-09-24"
"links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-09-24"
"episodes/intro-sce.Rmd" "709fc538c9872b9494fa37f1059ea4a0" "site/built/intro-sce.md" "2024-10-02"
"episodes/eda_qc.Rmd" "b4800ddfe2d5deb5047311658f254e6d" "site/built/eda_qc.md" "2024-10-02"
"episodes/cell_type_annotation.Rmd" "5bd585c6e4c6fc09a7443ce4da35899f" "site/built/cell_type_annotation.md" "2024-10-02"
"episodes/intro-sce.Rmd" "b704934867b22d804de1e0fa0a9600eb" "site/built/intro-sce.md" "2024-10-02"
"episodes/eda_qc.Rmd" "b21851ec0c9912dd9cb8a0cf230bcf9d" "site/built/eda_qc.md" "2024-10-02"
"episodes/cell_type_annotation.Rmd" "8fab6e0cbb60d6fe6a67a0004c5ce5ab" "site/built/cell_type_annotation.md" "2024-10-02"
"episodes/multi-sample.Rmd" "4711a38fd8b29961424215dd17fb7528" "site/built/multi-sample.md" "2024-09-30"
"episodes/large_data.Rmd" "f19fa53e9e63d4cb8fe0f6ab61c8fc3a" "site/built/large_data.md" "2024-10-02"
"episodes/hca.Rmd" "3f2af9dc9e53fd617512a37db87f20a7" "site/built/hca.md" "2024-10-02"