Commit

markdown source builds
Auto-generated via `{sandpaper}`
Source  : 76665d4
Branch  : main
Author  : Andrew Ghazi <[email protected]>
Time    : 2024-11-12 17:18:34 +0000
Message : Merge pull request #62 from ccb-hms/post_sprint_2

post sprint 2
actions-user committed Nov 12, 2024
1 parent 9dad280 commit 1a30848
Showing 10 changed files with 47 additions and 13 deletions.
Binary file modified fig/multi-sample-rendered-unnamed-chunk-10-1.png
Binary file added fig/multi-sample-rendered-unnamed-chunk-11-1.png
Binary file added fig/multi-sample-rendered-unnamed-chunk-14-1.png
Binary file added fig/multi-sample-rendered-unnamed-chunk-16-1.png
Binary file modified fig/multi-sample-rendered-unnamed-chunk-2-2.png
Binary file modified fig/multi-sample-rendered-unnamed-chunk-24-1.png
Binary file modified fig/multi-sample-rendered-unnamed-chunk-24-2.png
Binary file added fig/multi-sample-rendered-unnamed-chunk-4-1.png
2 changes: 1 addition & 1 deletion md5sum.txt
@@ -7,7 +7,7 @@
"episodes/intro-sce.Rmd" "88de9550fb00214022d4d0ada77a964b" "site/built/intro-sce.md" "2024-11-11"
"episodes/eda_qc.Rmd" "1ea032eab862e75d5f3adefa23197a3d" "site/built/eda_qc.md" "2024-11-11"
"episodes/cell_type_annotation.Rmd" "68a299eb32d85b6a30af9215e7a941ca" "site/built/cell_type_annotation.md" "2024-11-11"
"episodes/multi-sample.Rmd" "4422d860318f365ac88f7d7f0253b47a" "site/built/multi-sample.md" "2024-11-11"
"episodes/multi-sample.Rmd" "bba478a623adc5dcfd260ae0529cc49c" "site/built/multi-sample.md" "2024-11-12"
"episodes/large_data.Rmd" "f64ef24d0547fa7a29c3a57f100f77ab" "site/built/large_data.md" "2024-11-11"
"episodes/hca.Rmd" "873df251787c01aff0a1e7671f463880" "site/built/hca.md" "2024-11-11"
"instructors/instructor-notes.md" "79f31f78e0c09e7771975b1a14d6cd08" "site/built/instructor-notes.md" "2024-10-11"
58 changes: 46 additions & 12 deletions multi-sample.md
@@ -34,7 +34,7 @@ As before, we will use the wild-type data from the Tal1 chimera experiment:

Note that this is a paired design in which for each biological replicate (pool 3, 4, and 5), we have both host and injected cells.

We start by loading the data and doing a quick exploratory analysis, essentially applying the normalization and visualization techniques that we have seen in the previous lectures to all samples. Note that this time we're selecting samples 5 to 10, not just 5 by itself.
We start by loading the data and doing a quick exploratory analysis, essentially applying the normalization and visualization techniques that we have seen in the previous lectures to all samples. Note that this time we're selecting samples 5 to 10, not just 5 by itself. Also note the `type = "processed"` argument: we are explicitly selecting the version of the data that has already been through quality control (QC).



@@ -116,7 +116,7 @@ cell_30702 0.00108837 0.550807
cell_30703 0.82369305 1.184919
```

To speed up computations, after removing doublets, we randomly select 50% of the cells in each sample.
To make these examples run faster, we drop some problematic cell types (stripped nuclei and doublets) and randomly select 50% of the cells in each sample.


``` r
@@ -134,7 +134,7 @@ idx <- unlist(tapply(colnames(sce), sce$sample, function(x) {
sce <- sce[,idx]
```
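
The per-sample subsampling pattern used in this chunk can be tried on toy data. The following is a minimal base R sketch, not the lesson's code: the cell names, sample labels, and seed are made up for illustration, and only the `tapply()` plus `sample()` pattern mirrors the chunk above.

``` r
# Toy stand-ins for colnames(sce) and sce$sample (hypothetical values).
set.seed(42)
cell_ids  <- paste0("cell_", seq_len(60))
sample_id <- rep(c(5, 6, 7), times = c(10, 20, 30))

# Within each sample, draw half of the cells without replacement.
keep <- unlist(tapply(cell_ids, sample_id, function(x) {
    sample(x, size = round(length(x) * 0.5))
}))

# Count the retained cells per sample.
kept_per_sample <- table(sample_id[match(keep, cell_ids)])
print(kept_per_sample)
```

Because the draw happens inside each sample, every sample keeps half of its own cells, which a single `sample()` call over all cells would not guarantee.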

We now normalize the data, run some dimensionality reduction steps, and visualize them in a tSNE plot.
We now normalize the data, run some dimensionality reduction steps, and visualize the result in a tSNE plot. In this case we have a large number of cell types to visualize, so we define a custom palette with many visually distinct colors (adapted from the `polychrome` palette in the [`pals` package](https://cran.r-project.org/web/packages/pals/vignettes/pals_examples.html)).


``` r
@@ -156,8 +156,15 @@ plotTSNE(sce, colour_by = "sample")
<img src="fig/multi-sample-rendered-unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

``` r
color_vec <- c("#5A5156", "#E4E1E3", "#F6222E", "#FE00FA", "#16FF32", "#3283FE",
               "#FEAF16", "#B00068", "#1CFFCE", "#90AD1C", "#2ED9FF", "#DEA0FD",
               "#AA0DFE", "#F8A19F", "#325A9B", "#C4451C", "#1C8356", "#85660D",
               "#B10DA1", "#3B00FB", "#1CBE4F", "#FA0087", "#333333", "#F7E1A0",
               "#C075A6", "#782AB6", "#AAF400", "#BDCDFF", "#822E1C", "#B5EFB5",
               "#7ED7D1", "#1C7F93", "#D85FF7", "#683B79", "#66B0FF", "#FBE426")

plotTSNE(sce, colour_by = "celltype.mapped") +
scale_color_discrete() +
scale_color_manual(values = color_vec) +
theme(legend.position = "bottom")
```
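
One way to keep colors stable across plots is to name the palette entries by cell type. The sketch below uses base R only and is an illustration rather than part of the lesson: the cell type labels are hypothetical, and the colors are the first few entries of `color_vec`. A named vector like this can be passed to `scale_color_manual(values = ...)`, which matches names against the factor levels.

``` r
# First entries of the custom palette defined above.
palette_subset <- c("#5A5156", "#E4E1E3", "#F6222E", "#FE00FA", "#16FF32")

# Hypothetical cell type labels standing in for sce$celltype.mapped.
celltypes <- factor(c("Endothelium", "Erythroid1", "Mesenchyme",
                      "Erythroid1", "Endothelium"))

# Fail early if there are more cell types than colors.
stopifnot(nlevels(celltypes) <= length(palette_subset))

# Name one color per level so each type keeps its color in every plot.
celltype_colors <- setNames(palette_subset[seq_len(nlevels(celltypes))],
                            levels(celltypes))
print(celltype_colors)
```

With an unnamed vector, colors are assigned in the order of the levels present in each plot, so the same cell type can change color between plots if the set of levels differs.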

@@ -208,6 +215,18 @@ plotTSNE(merged, colour_by = "batch")

<img src="fig/multi-sample-rendered-unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

Coloring by cell type also shows that, for the most part, each cell type is now confined to its own cluster:


``` r
plotTSNE(merged, colour_by = "celltype.mapped") +
    scale_color_manual(values = color_vec) +
    theme(legend.position = "bottom")
```

<img src="fig/multi-sample-rendered-unnamed-chunk-4-1.png" style="display: block; margin: auto;" />


Once we have removed the sample batch effect, we can proceed with the
differential expression analysis.

@@ -424,7 +443,7 @@ for (i in seq_len(ncol(y))) {
}
```

<img src="fig/multi-sample-rendered-unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
<img src="fig/multi-sample-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

``` r
par(mfrow = c(1,1))
@@ -441,7 +460,7 @@ limma::plotMDS(cpm(y, log = TRUE),
               col = ifelse(y$samples$tomato, "red", "blue"))
```

<img src="fig/multi-sample-rendered-unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
<img src="fig/multi-sample-rendered-unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

We then construct a design matrix by including both the pool and the tomato as factors.
This design indicates which samples belong to which pool and condition, so we can
@@ -496,7 +515,7 @@ Additionally, the Common and Trend BCV are shown in `red` and `blue`.
plotBCV(y)
```

<img src="fig/multi-sample-rendered-unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
<img src="fig/multi-sample-rendered-unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

We then fit a Quasi-Likelihood (QL) negative binomial generalized linear model for each gene.
The `robust = TRUE` parameter avoids distortions from highly variable clusters.
@@ -530,7 +549,7 @@ QL dispersion estimates for each gene as a function of abundance. Raw estimates
plotQLDisp(fit)
```

<img src="fig/multi-sample-rendered-unnamed-chunk-15-1.png" style="display: block; margin: auto;" />
<img src="fig/multi-sample-rendered-unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

We then use an empirical Bayes quasi-likelihood F-test to test each gene for differential expression (due to tomato injection) at a False Discovery Rate (FDR) of 5%.
The small number of DGEs highlights that the tomato injection effect has a low
@@ -814,7 +833,7 @@ Use the `pheatmap` package to create a heatmap of the abundances table. Does it

:::::::::::::: hint

You can simply hand `pheatmap()` a matrix as its only argument. `pheatmap()` has a million options you can tweak, but the defaults are usually pretty good.
You can simply hand `pheatmap()` a matrix as its only argument. `pheatmap()` has a million options you can adjust, but the defaults are usually pretty good. For an extra challenge, try overlaying sample-level information with the `annotation_col` argument.

:::::::::::::::::::::::

@@ -825,7 +844,22 @@ You can simply hand `pheatmap()` a matrix as its only argument. `pheatmap()` has
pheatmap(y.ab$counts)
```

<img src="fig/multi-sample-rendered-unnamed-chunk-23-1.png" style="display: block; margin: auto;" />
<img src="fig/multi-sample-rendered-unnamed-chunk-24-1.png" style="display: block; margin: auto;" />

``` r
anno_df <- y.ab$samples[, c("tomato", "pool")]

anno_df$pool <- as.character(anno_df$pool)

anno_df$tomato <- ifelse(anno_df$tomato,
                         "tomato+",
                         "tomato-")

pheatmap(y.ab$counts,
         annotation_col = anno_df)
```

<img src="fig/multi-sample-rendered-unnamed-chunk-24-2.png" style="display: block; margin: auto;" />

The top DA result was a decrease in ExE ectoderm in the tomato condition, which you can sort of see, especially if you `log1p()` the counts or discard rows that show much higher values. ExE ectoderm counts were much higher in samples 8 and 10 compared to 5, 7, and 9.
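
The two suggestions in the paragraph above, applying `log1p()` or discarding rows with much higher values, can be sketched on a toy table. The counts, row names, and the `max_allowed` cutoff below are made up for illustration and are not taken from `y.ab$counts`; either resulting matrix could then be handed to `pheatmap()`.

``` r
# Toy abundance table: rows are cell types, columns are samples (made-up counts).
counts <- matrix(c(900, 850, 920, 880,
                    40,  10,  35,   5,
                     0,   3,   1,   2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("TypeA", "TypeB", "TypeC"),
                                 paste0("sample", 5:8)))

# Option 1: compress the dynamic range; log1p(x) = log(1 + x) keeps zeros at 0.
log_counts <- log1p(counts)

# Option 2: drop rows whose largest count exceeds a (hypothetical) cutoff.
max_allowed <- 500
filtered <- counts[apply(counts, 1, max) <= max_allowed, , drop = FALSE]

print(round(log_counts, 2))
print(filtered)
```

Both options stop a few very abundant cell types from dominating the color scale, so differences among rarer types become easier to see.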

@@ -875,7 +909,7 @@ ggplot(comp_df, aes(logFC.x, logFC.y)) +
    geom_point()
```

<img src="fig/multi-sample-rendered-unnamed-chunk-24-1.png" style="display: block; margin: auto;" />
<img src="fig/multi-sample-rendered-unnamed-chunk-25-1.png" style="display: block; margin: auto;" />

``` r
# Reshape to long format for ggplot facets. This is 1000x easier to do
@@ -893,7 +927,7 @@ ggplot(pval_df, aes(Pvalue)) +
    facet_wrap("pool_factor")
```

<img src="fig/multi-sample-rendered-unnamed-chunk-24-2.png" style="display: block; margin: auto;" />
<img src="fig/multi-sample-rendered-unnamed-chunk-25-2.png" style="display: block; margin: auto;" />

We can see that in this case the logFC estimates are strongly consistent between the two models, which tells us that including the `pool` factor in the model doesn't strongly influence the estimates of the `tomato` coefficients.
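
The two models being compared differ only in their design matrices. As a rough sketch with made-up sample metadata (the `meta` data frame and its values are assumptions, not the lesson's objects), base R's `model.matrix()` shows the difference:

``` r
# Illustrative sample metadata for a paired design (values are assumptions).
meta <- data.frame(
    pool   = factor(c(3, 3, 4, 4, 5, 5)),
    tomato = factor(c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))
)

# Model with the pool blocking factor and the tomato effect.
design_full <- model.matrix(~ pool + tomato, data = meta)

# Model with the tomato effect only.
design_reduced <- model.matrix(~ tomato, data = meta)

print(colnames(design_full))
print(colnames(design_reduced))
```

In a balanced layout like this toy one, every pool contributes one sample to each condition, so the pool and tomato terms are not confounded.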
