-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathREADME.Rmd
238 lines (181 loc) · 9.19 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
echo = TRUE,
message = FALSE,
error = FALSE,
comment = "#>",
fig.path = "man/figures/"
)
pkgdownsite <- !is.null(getOption("PKGDOWN_BUILD"))
```
# FacileAnalysis
<!-- badges: start -->
[](https://github.com/facilebio/FacileAnalysis/actions)

[](https://www.repostatus.org/#active)
[](https://www.tidyverse.org/lifecycle/#maturing)
[](https://codecov.io/gh/facilebio/FacileAnalysis?branch=master)
<!-- badges: end -->
The FacileAnalysis package defines a set of analysis tasks over genomic data
in a modular fashion, which requires these data to be stored in a container that
implements the FacileData API (aka a `FacileDataStore`).
The over-arching goal of developing analyses within this framework is to enable
quick and effortless interrogation of genomic data by enabling a **hybrid**
interactive and code-driven approach to data analysis. Analyses can be either
completely code-driven, GUI driven, or some mix of the two.
To achieve this goal, analysis modules break down a general analysis task into
smaller constituent steps, the results of which can:
1. be piped together (`|>`) to perform a complete analysis;
2. be explored at **different levels of interactivity**, via the `shine()`,
`viz()`, and `report()` methods; and
3. act as starting (or reference) points for **the next** analysis task, either
directly or by via the `ranks()`, `signature()`, or `compare()` methods.
Please refer to the [RNA-seq Analysis: The Facile Way][rnafway] to get an idea
of how to use these tools when analyzing RNA-seq data.
<!--
You can find a more in-depth overview and philosophy of this package in the
[FacileAnalysis vignette][pkgvign] (this is incomplete).
-->
The analyses implemented within this package are listed below, with links to
vignettes that describe their functionality in more detail:
* `fpca`: [Interactive Principal Components Analyses][fpcavign]
* `fdge`: [Interactive Differential Gene Expression Analysis][fdgevign]
* `fsea`: [Interactive (Gene) Set Enrichment Analysis][fseavign]
**A note on the experimental lifecycle**: This package is tagged as
"experimental" due to its limited use by a broader audience, and not as
a sign of the commitment to its development or how long it has been in
(internal) use. As we find the edge cases and pain points in the APIs through
broader adoption, we expect to soon move to "maturing" (and eventually "stable")
[lifecycle][lifecycle].
[lifecycle]: https://www.tidyverse.org/lifecycle/
## Example Analysis: Differential Expression
We'll include an exemplar differential expression analysis here in order to
touch on some of the guiding principles of an analysis modules mentioned above.
We have first defined a complete differential expression analysis (`fdge`),
by breaking it down into the following steps:
1. identifying the subset of samples from a `FacileDataStore` to perform the
analysis over;
2. defining the model (design) over the samples, ie. the covariate to test and
which covariates to use to control batch effects
3. defining the parameters to run the statistical test, ie.
i) the assay from the samples to run the analysis on
ii) the differential expression pipeline to use (`"voom"`, `"edgeR-qlf"`,
`"limma-trend"`, `"limma"`)
iii) advanced options, like the threshold to test against (limma/treat), or
whether to incorporate sample level weights
The example below identifies genes differentially expression between tumor and
normal samples (and controlling for sex) in the "BLCA" indication of the example
TCGA dataset included in the [FacileData][faciledata] package.
```{r voom-analysis, message=FALSE, warning=FALSE}
library(FacileData)
library(FacileAnalysis)
efds <- exampleFacileDataSet()
# Step 1: define the samples implicated in our test
samples <- filter_samples(efds, indication == "BLCA")
# Step 2: define the model (design) for the test
model <- flm_def(samples, covariate = "sample_type",
numer = "tumor", denom = "normal",
batch = "sex")
# Step 3: configure the options to run the test, which include the assay that
# holds the data used to test and the statistical method/framework we
# should use to perform the test
vdge <- fdge(model, assay_name = "rnaseq", method = "voom")
```
Perhaps you prefer the [edgeR/QLF][qlf] analysis framework, instead? No problem,
we only need to tweak one of the parameters in the last step of the pipeline:
```{r qlf-analysis, eval = FALSE}
qdge <- fdge(model, assay_name = "rnaseq", method = "edgeR-qlf")
```
... or DESeq2, perhaps? No problem, [we accept pull requests][PR]!
## Interrogation of Analyses at Different Levels of Interactivity
There are a number of S3 methods a `FacileAnalysisResult` needs to define in
order to be complete. The `shine()`, `viz()`, and `report()` methods allow
the end-user to interrogate (and report) the results of an analysis at
different levels of interactivity.
Let's take a look at how these work over a `FacileDgeAnalysisResult`
### Deep Interaction via shine()
The `shine(aresult, ...)` method provides the richest interactive view over a
`FacileAnalysisResult` by launching a [shiny gadget][sgadget] that enables
the end-user to fully interrogate the results.
```{r shine, eval=FALSE, message=FALSE, warning=FALSE}
shine(vdge)
```
<img src="man/figures/fdge-shine.gif" width="75%" />
### Interactive Graphics via viz()
The `viz(aresult, ...)` methods leverage [htmlwidgets][htmlwidgets] to create a
JavaScript-powered interactive view of the analysis result, which is detached
from a running R-process.
```{r viz, eval=FALSE, eval=pkgdownsite}
viz(vdge)
```
### Reporting Results via report()
While the output of the `viz()` functions can be used directly in Rmarkdown
reports, `report(aresult, ...)` is meant to create a "more complete"
(perhaps multi-htmlwidget) view over the result that can be more suitable
for inclusion into an Rmarkdown report.
```{r report, message=FALSE, warning=FALSE, eval=pkgdownsite}
cap <- paste(
"Differential expression of tumor vs normal samples, in the TCGA bladder",
"cancer indication (BLCA)")
report(vdge, caption = cap)
```
<!--
<img src="man/figures/fdge-report.gif" width = "75%" />
-->
## Hybrid Analyses
The same differential expression analysis that created the `vdge` object above
can be performed entirely interactively, with the same results.
We can either start the analysis from the same predefined set of samples, but
define the linear model and testing framework to use interactively, by launching
a the **f**acile **d**ifferential **g**ene **e**xpression gadget:
```{r vdge2, eval=FALSE}
vdge2 <- fdgeGadget(samples)
```
Or we can perform the whole thing via a GUI which lets us select the subset
of samples and run the differential expression analysis without using any
code at all:
```{r vdge3, eval=FALSE}
vdge3 <- fdgeGadget(efds)
```
<img src="man/figures/vignette-fdgeGadget-trim.gif" width="75%" />
Assuming the same filtering and testing strategies were selected in the GUI
using the `fdgeGadget` calls above, the objects they return will all be
equivalent to the `vdge` result, which was entirely generated programmatically.
**Note**: the screen capture of the `fdgeGadget()` above actually corresponds
to the analyses in the [RNA-seq the facile way][rnafway] vignette.
## Installation
Because the packages in the facileverse have not officially been released yet
and they also depend heavily on [bioconductor][bioc] packages, we recommend
using the [BiocManager][biocmanager] package for installation.
```{r, eval = FALSE}
# install.packages("BiocManager")
BiocManager::install("facilebio/FacileAnalysis")
```
## Resources
* [facilebio.github.io/FacileAnalysis][facileanalysis]
(online documentation and vignettes)
* [Open an issue][issues] (GitHub issues for questions, bug reports, and feature
requests)
## Acknowledgements
* Thanks to Morten Just, who developed the
[droptogif](https://github.com/mortenjust/droptogif), which we used to
convert screen-captured `*.mov` files to animated gifs.
[bioc]: https://bioconductor.org
[biocmanager]: https://cran.r-project.org/package=BiocManager
[faciledata]: https://facilebio.github.io/FacileData
[facileanalysis]: https://facilebio.github.io/FacileAnalysis
[fpcavign]: FacileAnalysis-fpca.html
[fdgevign]: FacileAnalysis-fdge.html
[fseavign]: FacileAnalysis-fsea.html
[htmlwidgets]: https://www.htmlwidgets.org
[issues]: https://github.com/facilebio/FacileAnalysis/issues
[pkgvign]: FacileAnalysis.html
[rnafway]: https://facilebio.github.io/FacileAnalysis/articles/FacileAnalysis-RNAseq.html
[PR]: https://github.com/facilebio/FacileAnalysis/pulls
[qlf]: https://f1000research.com/articles/5-1438
[sgadget]: https://shiny.rstudio.com/articles/gadgets.html