generated from kevinrue/MyBioconductorPackage
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathmethods.Rmd
362 lines (274 loc) · 12 KB
/
methods.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
---
title: "Supported differential expression methods"
author:
- name: Kevin Rue-Albrecht
affiliation:
- University of Oxford
email: [email protected]
output:
BiocStyle::html_document:
self_contained: yes
toc: true
toc_float: true
toc_depth: 2
code_folding: show
date: "`r doc_date()`"
package: "`r pkg_ver('iSEEde')`"
vignette: >
%\VignetteIndexEntry{Supported differential expression methods}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
options(width = 100)
```
```{r, eval=!exists("SCREENSHOT"), include=FALSE}
SCREENSHOT <- function(x, ...) knitr::include_graphics(x)
```
```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE}
## Track time spent on making the vignette
startTime <- Sys.time()
## Bib setup
library("RefManageR")
## Write bibliography information
bib <- c(
R = citation(),
BiocStyle = citation("BiocStyle")[1],
knitr = citation("knitr")[1],
RefManageR = citation("RefManageR")[1],
rmarkdown = citation("rmarkdown")[1],
sessioninfo = citation("sessioninfo")[1],
testthat = citation("testthat")[1],
iSEEde = citation("iSEEde")[1]
)
```
# Implementation
## User-facing storage and access
Differential expression results are generally reported as tables of statistics,
including (log~2~) fold-change, p-value, average expression, etc.
Those statistics being reported for individual features (e.g., genes), the
`rowData()` component of `SummarizedExperiment()` objects provides a natural
home for that information.
Specifically, `r BiocStyle::Biocpkg("iSEEde")` stores and retrieves differential expression results in `rowData(se)[["iSEEde"]]`.
However, the `rowData()` function should not be used to access or edit that information.
Instead, the functions `embedContrastResults()` and `contrastResults()`, should be used to store and retrieve contrast results, respectively, as those functions ensure that feature names are kept synchronised with the enclosing `SummarizedExperiment` object.
Moreover, the function `contrastResultsNames()` can be used to retrieve the names of contrast available in a given `SummarizedExperiment` object.
## Additional considerations
The first challenge arises when differential expression statistics are computed only for a subset of features.
In that case, `embedContrastResults()` populates missing information with `NA` values.
The second challenge arises from the different names of columns used by individual differential expression methods to store differential expression common statistics.
To address this, `r BiocStyle::Biocpkg("iSEEde")` provides S4 classes creating a common interface to supported differential expression methods.
Each class of differential expression results implements the following methods:
- `pValue()` returns the vector of raw p-values.
- `log2FoldChange()` returns the vector of log2-fold-change values.
- `averageLog2()` returns the vector of average log2-expression values.
# Example data
In this example, we use the `?airway` data set.
We briefly adjust the reference level of the treatment factor to the untreated condition.
```{r, message=FALSE, warning=FALSE}
library("airway")
data("airway")
airway$dex <- relevel(airway$dex, "untrt")
```
# Supported methods
## Limma
We first use `edgeR::filterByExpr()` to remove genes whose counts are too low to
support a rigorous differential expression analysis. Then we run a standard
Limma-Voom analysis using `edgeR::voomLmFit()`, `limma::makeContrasts()`, and
`limma::eBayes()`. (Alternatively, we could have used `limma::treat()` instead
of `limma::eBayes()`.)
The linear model includes the `dex` and `cell` covariates, indicating the
treatment conditions and cell types, respectively. Here, we are interested in
differences between treatments, adjusted by cell type, and define this
comparison as the `dextrt - dexuntrt` contrast.
The final differential expression results are fetched using `limma::topTable()`.
```{r, message=FALSE, warning=FALSE}
library("edgeR")
design <- model.matrix(~ 0 + dex + cell, data = colData(airway))
keep <- filterByExpr(airway, design)
fit <- voomLmFit(airway[keep, ], design, plot = FALSE)
contr <- makeContrasts("dextrt - dexuntrt", levels = design)
fit <- contrasts.fit(fit, contr)
fit <- eBayes(fit)
res_limma <- topTable(fit, sort.by = "P", n = Inf)
head(res_limma)
```
Then, we embed this set of differential expression results in the `airway`
object using the `embedContrastResults()` method.
Because the output of `limma::topTable()` is a standard `data.frame`, the
`class=` argument must be used to manually identify the method that produced
those results.
Supported classes are listed in the object `iSEEde::embedContrastResultsMethods`, i.e.
```{r, message=FALSE}
library(iSEEde)
embedContrastResultsMethods
```
This manual method is preferable to any automated heuristic (e.g. using the
column names of the `data.frame` for identifying it as a `limma::topTable()`
output).
The results embedded in the `airway` object can be accessed using the `contrastResults()` function.
```{r}
airway <- embedContrastResults(res_limma, airway, name = "Limma-Voom", class = "limma")
contrastResults(airway)
contrastResults(airway, "Limma-Voom")
```
## DESeq2
First, we use `DESeqDataSet()` to construct a `DESeqDataSet` object
for the analysis. Then we run a standard `r BiocStyle::Biocpkg("DESeq2")`
analysis using `DESeq()`.
The differential expression results are fetched using `results()`.
```{r, message=FALSE, warning=FALSE}
library("DESeq2")
dds <- DESeqDataSet(airway, ~ 0 + dex + cell)
dds <- DESeq(dds)
res_deseq2 <- results(dds, contrast = list("dextrt", "dexuntrt"))
head(res_deseq2)
```
Then, we embed this set of differential expression results in the `airway`
object using the `embedContrastResults()` method.
In this instance, the `r BiocStyle::Biocpkg("DESeq2")` results are stored in a
recognisable `?DESeqResults` object, which can be given *as is* directly to the
`embedContrastResults()` method.
The function will automatically pass the results object to the
`iSEEDESeq2Results()` constructor, to identify it as such.
The results embedded in the airway object can be accessed using the `contrastResults()` function.
```{r}
airway <- embedContrastResults(res_deseq2, airway, name = "DESeq2")
contrastResults(airway)
contrastResults(airway, "DESeq2")
```
## edgeR
We run a standard `r BiocStyle::Biocpkg("edgeR")` analysis using `glmFit()` and
`glmLRT()`.
The differential expression results are fetched using `topTags()`.
```{r, message=FALSE, warning=FALSE}
library("edgeR")
design <- model.matrix(~ 0 + dex + cell, data = colData(airway))
fit <- glmFit(airway, design, dispersion = 0.1)
lrt <- glmLRT(fit, contrast = c(-1, 1, 0, 0, 0))
res_edger <- topTags(lrt, n = Inf)
head(res_edger)
```
Then, we embed this set of differential expression results in the `airway`
object using the `embedContrastResults()` method.
In this instance, the `r BiocStyle::Biocpkg("edgeR")` results are stored in a
recognisable `?TopTags` object. As such, the results object can be given *as is*
directly to the `embedContrastResults()` method. The function will automatically pass
the results object to the `iSEEedgeRResults()` constructor, to identify it as
such.
The results embedded in the airway object can be accessed using the `contrastResults()` function.
```{r}
airway <- embedContrastResults(res_edger, airway, name = "edgeR")
contrastResults(airway)
contrastResults(airway, "edgeR")
```
# Live app
In this example, we used the `r BiocStyle::Biocpkg("iSEEde")` functions `DETable()`, `VolcanoPlot()`, and `MAPlot()` to add panels that facilitate the visualisation of differential expression results in an `r BiocStyle::Biocpkg("iSEE")` app.
Specifically, we add one set of panels for each differential expression method used in this vignette (i.e., Limma-Voom, DESeq2, edgeR).
```{r, message=FALSE}
library(iSEE)
app <- iSEE(airway, initial = list(
DETable(ContrastName="Limma-Voom", HiddenColumns = c("AveExpr",
"t", "B"), PanelWidth = 4L),
VolcanoPlot(ContrastName = "Limma-Voom", PanelWidth = 4L),
MAPlot(ContrastName = "Limma-Voom", PanelWidth = 4L),
DETable(ContrastName="DESeq2", HiddenColumns = c("baseMean",
"lfcSE", "stat"), PanelWidth = 4L),
VolcanoPlot(ContrastName = "DESeq2", PanelWidth = 4L),
MAPlot(ContrastName = "DESeq2", PanelWidth = 4L),
DETable(ContrastName="edgeR", HiddenColumns = c("logCPM",
"LR"), PanelWidth = 4L),
VolcanoPlot(ContrastName = "edgeR", PanelWidth = 4L),
MAPlot(ContrastName = "edgeR", PanelWidth = 4L)
))
if (interactive()) {
shiny::runApp(app)
}
```
```{r, echo=FALSE, out.width="100%"}
SCREENSHOT("screenshots/methods_side_by_side.png", delay = 20)
```
# Comparing two contrasts
The `?LogFCLogFCPlot` class allows users to compare the log~2~ fold-change value of features between two differential expression contrasts.
In this example, we add one `?LogFCLogFCPlot` panel comparing the same contrast using the Limma-Voom and DESeq2 methods, alongside one `?VolcanoPlot` panel for each of those two contrasts.
Moreover, we pre-select an area of the `?LogFCLogFCPlot` and highlight the selected features in the two `?VolcanoPlot` panels.
```{r, message=FALSE, warning=FALSE}
library(iSEE)
app <- iSEE(airway, initial = list(
VolcanoPlot(ContrastName="Limma-Voom",
RowSelectionSource = "LogFCLogFCPlot1", ColorBy = "Row selection",
PanelWidth = 4L),
LogFCLogFCPlot(ContrastNameX="Limma-Voom", ContrastNameY="DESeq2",
BrushData = list(
xmin = 3.6, xmax = 8.2, ymin = 3.8, ymax = 9.8,
mapping = list(x = "X", y = "Y"),
direction = "xy", brushId = "LogFCLogFCPlot1_Brush",
outputId = "LogFCLogFCPlot1"),
PanelWidth = 4L),
VolcanoPlot(ContrastName="DESeq2",
RowSelectionSource = "LogFCLogFCPlot1", ColorBy = "Row selection",
PanelWidth = 4L)
))
if (interactive()) {
shiny::runApp(app)
}
```
```{r, echo=FALSE, out.width="100%"}
SCREENSHOT("screenshots/logfc_logfc_plot.png", delay = 20)
```
# Reproducibility
The `r Biocpkg("iSEEde")` package `r Citep(bib[["iSEEde"]])` was made possible
thanks to:
* R `r Citep(bib[["R"]])`
* `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])`
* `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
* `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`
* `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])`
* `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])`
* `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])`
This package was developed using `r BiocStyle::Biocpkg("biocthis")`.
Code for creating the vignette
```{r createVignette, eval=FALSE}
## Create the vignette
library("rmarkdown")
system.time(render("methods.Rmd", "BiocStyle::html_document"))
## Extract the R code
library("knitr")
knit("methods.Rmd", tangle = TRUE)
```
Date the vignette was generated.
```{r reproduce1, echo=FALSE}
## Date the vignette was generated
Sys.time()
```
Wallclock time spent generating the vignette.
```{r reproduce2, echo=FALSE}
## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits = 3)
```
`R` session information.
```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```
# Bibliography
This vignette was generated using `r Biocpkg("BiocStyle")` `r
Citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
and `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` running behind the
scenes.
Citations made with `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`.
```{r vignetteBiblio, results = "asis", echo = FALSE, warning = FALSE, message = FALSE}
## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))
```
<!-- Links -->
[scheme-wikipedia]: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax
[iana-uri]: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml