---
title: "Using cRegulome"
author: "Mahmoud Ahmed"
date: "August 22, 2017"
vignette: >
%\VignetteIndexEntry{Using cRegulome}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, echo=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE, fig.align = 'center')
```
# Overview
Transcription factors and microRNAs are important regulators of gene
expression in normal physiology and pathological conditions. Many
bioinformatic tools have been built to predict and identify transcription
factor and microRNA targets and their roles in the development of diseases,
including cancers. The availability of public-access high-throughput data
has allowed for data-driven discoveries and validation of these predictions.
Here, we build on these tools and integrative analyses to provide a
tool to access, manage and visualize data from open-source databases.
cRegulome provides programmatic access to the regulome (microRNA and
transcription factor) correlations with target genes in cancer. The package
obtains a local instance of the Cistrome Cancer and miRCancerdb databases and
provides objects and methods to interact with and visualize the correlation
data.
# Getting started
To get started with cRegulome, we show a quick example. We first download
a small test database file, then make a simple query and convert the output
to a cRegulome object to print and visualize.
```{r load_libraries}
# load required libraries
library(cRegulome)
library(RSQLite)
library(ggplot2)
```
```{r prepare database file, eval=FALSE}
# download the db file when using it for the first time
destfile <- file.path(tempdir(), 'cRegulome.db.gz')
if (!file.exists(destfile)) {
    get_db(test = TRUE)
}

# connect to the db file
db_file <- file.path(tempdir(), 'cRegulome.db')
conn <- dbConnect(SQLite(), db_file)
```
```{bash eval=FALSE}
# alternative to downloading the database file
wget https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/9537385/cRegulome.db.gz
gunzip cRegulome.db.gz
```
```{r connect_db, include=FALSE}
# locate the testset file and connect
fl <- system.file('extdata', 'cRegulome.db', package = 'cRegulome')
conn <- dbConnect(SQLite(), fl)
```
```{r simple_query}
# enter a custom query with different arguments
dat <- get_mir(conn,
               mir = 'hsa-let-7g',
               study = 'STES',
               min_abs_cor = .3,
               max_num = 5)

# make a cmicroRNA object
ob <- cmicroRNA(dat)
```
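The `min_abs_cor` and `max_num` arguments used above restrict the query output. As a rough illustration only, the following base R sketch applies a comparable filter to a toy data frame. The gene names and correlation values are made up, and the exact rules `get_mir` applies internally (for example, whether the cutoff is inclusive or how ties are ordered) are assumptions here, not taken from the package.

```r
# Toy tidy data frame with the four columns returned by get_mir().
# All gene names and correlation values are invented for illustration.
toy_dat <- data.frame(
    mirna_base = 'hsa-let-7g',
    feature    = c('GENE_A', 'GENE_B', 'GENE_C', 'GENE_D'),
    cor        = c(-0.45, 0.10, 0.35, -0.20),
    study      = 'STES',
    stringsAsFactors = FALSE
)

# keep rows whose absolute correlation reaches the cutoff (cf. min_abs_cor)
toy_min_abs_cor <- 0.3
toy_kept <- toy_dat[abs(toy_dat$cor) >= toy_min_abs_cor, ]

# order by absolute correlation and keep at most n rows (cf. max_num)
toy_max_num <- 5
toy_kept <- toy_kept[order(abs(toy_kept$cor), decreasing = TRUE), ]
toy_kept <- head(toy_kept, toy_max_num)

print(toy_kept)
```

Filtering on the absolute value keeps both strongly positive and strongly negative correlations, which matters for microRNAs since their targets are often negatively correlated.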
```{r print_object}
# print object
ob
```
```{r plot_object}
# plot object
cor_plot(ob)
```
# Package Description
## Data sources
The two main sources of data used by this package are the Cistrome Cancer and
miRCancerdb databases. Cistrome Cancer is based on an integrative analysis of
The Cancer Genome Atlas (TCGA) and public ChIP-seq data. It provides
calculated correlations between transcription factors (n = 320) and their
target genes in cancer studies (n = 29). In addition, Cistrome Cancer provides
the regulatory potential of transcription factors for target and non-target
genes. miRCancerdb uses TCGA data and TargetScan annotations to correlate known
microRNAs (n = 750) with target and non-target genes in cancer studies (n = 25).
## Database file
cRegulome obtains a pre-built SQLite database file of the Cistrome Cancer
and miRCancerdb databases. The details of this build are provided at
[cRegulomedb](https://github.com/MahShaaban/cRegulomedb), in addition to the
scripts used to pull, format and deposit the data in an online repository.
Briefly, the SQLite database consists of 4 tables: `cor_mir` and `cor_tf` for
correlation values, and `targets_mir` and `targets_tf` for mapping microRNA
miRBase IDs and transcription factor symbols to genes. Two indices were
created to facilitate searching the database by miRBase IDs and
transcription factor symbols. The database file can be downloaded using the
function `get_db`.
To show the details of the database file, the following code connects to
the database and shows the names of the tables and the fields in each of them.
```{r database_file}
# table names
tabs <- dbListTables(conn)
print(tabs)
# fields/columns in the tables
for (i in seq_along(tabs)) {
    print(dbListFields(conn, tabs[i]))
}
```
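To illustrate how an index on the ID column supports look-ups, here is a minimal sketch using an in-memory SQLite database that is separate from the cRegulome file. The toy table borrows the name `cor_mir`, but its column layout, the index name `idx_mir` and all values are assumptions made for this example. Use `dbListFields()` as above to inspect the real tables.

```r
library(DBI)
library(RSQLite)

# in-memory database, independent of the cRegulome database file
toy_conn <- dbConnect(SQLite(), ':memory:')

# hypothetical, simplified correlation table (made-up values and layout)
dbWriteTable(toy_conn, 'cor_mir', data.frame(
    mirna_base = c('hsa-let-7g', 'hsa-let-7g', 'hsa-let-7i'),
    feature    = c('GENE_A', 'GENE_B', 'GENE_A'),
    STES       = c(-0.45, 0.35, 0.20),
    stringsAsFactors = FALSE
))

# create an index on the microRNA ID column (index name is hypothetical)
dbExecute(toy_conn, 'CREATE INDEX idx_mir ON cor_mir (mirna_base);')

# list the tables and fields, as done for the real database above
toy_tabs   <- dbListTables(toy_conn)
toy_fields <- dbListFields(toy_conn, 'cor_mir')
print(toy_tabs)
print(toy_fields)

# ask SQLite how it plans to run a look-up by microRNA ID
toy_plan <- dbGetQuery(
    toy_conn,
    "EXPLAIN QUERY PLAN SELECT * FROM cor_mir WHERE mirna_base = 'hsa-let-7g';"
)
print(toy_plan$detail)

dbDisconnect(toy_conn)
```

The query plan reports whether SQLite searches through the index or scans the whole table, which is a quick way to check that an index is actually being used.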
## Database query
To query the database using cRegulome, we provide two main functions:
`get_mir` and `get_tf`, for querying microRNA and transcription factor
correlations, respectively. Users need to provide the proper microRNA IDs,
transcription factor symbols and/or TCGA study identifiers.
microRNAs are referred to by their official miRBase IDs, transcription
factors by the official symbols of the genes that encode them
and TCGA studies by their common identifiers. In either case, the output of
calling these functions is a tidy data frame of 4 columns: `mirna_base`/
`tf`, `feature`, `cor` and `study`. These correspond to the miRBase ID or
transcription factor symbol, the gene symbol, the correlation value and the
TCGA study identifier.
Here we show an example of such a query. The `get_*` functions run this kind
of query on the database using basic `RSQLite` and `dbplyr`.
```{r database_query}
# query the db for two microRNAs
dat_mir <- get_mir(conn,
                   mir = c('hsa-let-7g', 'hsa-let-7i'),
                   study = 'STES')

# query the db for two transcription factors
dat_tf <- get_tf(conn,
                 tf = c('LEF1', 'MYB'),
                 study = 'STES')

# show the first 6 lines of each of the data.frames
head(dat_mir); head(dat_tf)
```
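For readers who want to see what a comparable query looks like in plain SQL, the sketch below runs a parameterized `SELECT` through `DBI` on a toy in-memory table. It is not the code used by `get_mir`: the table layout (one column per study), the gene names and the values are assumptions for illustration, and only `RSQLite` is used here, not `dbplyr`.

```r
library(DBI)
library(RSQLite)

# toy in-memory table; the layout and values are made up for this sketch
toy_conn <- dbConnect(SQLite(), ':memory:')
dbWriteTable(toy_conn, 'cor_mir', data.frame(
    mirna_base = c('hsa-let-7g', 'hsa-let-7g', 'hsa-let-7i', 'hsa-mir-21'),
    feature    = c('GENE_A', 'GENE_B', 'GENE_A', 'GENE_C'),
    STES       = c(-0.45, 0.35, 0.20, 0.60),
    stringsAsFactors = FALSE
))

# microRNAs of interest, passed as bound parameters rather than pasted text
toy_mirs <- c('hsa-let-7g', 'hsa-let-7i')
toy_placeholders <- paste(rep('?', length(toy_mirs)), collapse = ', ')

# select the ID, the gene and the study column, renamed to 'cor'
toy_sql <- paste0(
    'SELECT mirna_base, feature, STES AS cor FROM cor_mir ',
    'WHERE mirna_base IN (', toy_placeholders, ')'
)
toy_res <- dbGetQuery(toy_conn, toy_sql, params = as.list(toy_mirs))

# add the study identifier to obtain the four tidy columns
toy_res$study <- 'STES'
print(toy_res)

dbDisconnect(toy_conn)
```

Binding the IDs through `params` avoids building the `WHERE` clause by string concatenation of user input.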
## Objects
Two S3 objects are provided by cRegulome to store and dispatch methods on
the correlation data: cmicroRNA and cTF, for microRNAs and transcription
factors, respectively. The structure of these objects is very similar.
Each object is a list of 4 items: microRNA or TF for
the regulome element, features for the gene hits, studies for the TCGA
studies and finally corr, which is either a `data.frame` when the object has
data from a single TCGA study or a named list of data.frames when it has data
from multiple studies. Each of these data.frames has the regulome elements
(microRNAs or transcription factors) in columns and features/genes in rows.
To construct these objects, users need to call the constructor function with
the corresponding name on the data.frame output from `get_*`. The reverse
is possible by calling the function `cor_tidy` on the object to get back the
tidy data.frame.
```{r cmicroRNA_object}
# explore the cmicroRNA object
ob_mir <- cmicroRNA(dat_mir)
class(ob_mir)
str(ob_mir)
```
```{r cTF_object}
# explore the cTF object
ob_tf <- cTF(dat_tf)
class(ob_tf)
str(ob_tf)
```
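The relationship between the tidy data frame and the wide `corr` item can be sketched in base R. This is an illustration of the data layout described above, not the package's constructor. The class name `toy_cmicroRNA`, the gene names and the values are hypothetical, and the wide table is kept as a matrix for simplicity.

```r
# toy tidy data frame; gene names and values are invented for illustration
toy_tidy <- data.frame(
    mirna_base = c('hsa-let-7g', 'hsa-let-7g', 'hsa-let-7i', 'hsa-let-7i'),
    feature    = c('GENE_A', 'GENE_B', 'GENE_A', 'GENE_C'),
    cor        = c(-0.45, 0.35, 0.20, 0.50),
    study      = 'STES',
    stringsAsFactors = FALSE
)

# wide layout: genes in rows, microRNAs in columns, NA for missing pairs
toy_wide <- tapply(toy_tidy$cor,
                   list(toy_tidy$feature, toy_tidy$mirna_base),
                   FUN = function(x) x[1])

# a list with the four items described above (hypothetical class name)
toy_ob <- structure(
    list(microRNA = unique(toy_tidy$mirna_base),
         features = rownames(toy_wide),
         studies  = unique(toy_tidy$study),
         corr     = toy_wide),
    class = 'toy_cmicroRNA'
)
str(unclass(toy_ob))

# back to a tidy layout, dropping the pairs that had no value
toy_back <- as.data.frame(as.table(toy_ob$corr), stringsAsFactors = FALSE)
names(toy_back) <- c('feature', 'mirna_base', 'cor')
toy_back <- toy_back[!is.na(toy_back$cor), ]
toy_back$study <- toy_ob$studies
print(toy_back)
```

The round trip returns the same feature and microRNA pairs as the input, which is the idea behind converting between the object and the tidy data.frame.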
## Methods
cRegulome provides S3 methods to interact with and visualize the correlation
data in the cmicroRNA and cTF objects. Table 1 provides an overview of these
functions. These methods dispatch directly on the objects and can be
customized and manipulated in the same way as their generics.
```{r methods_cmicroRNA}
# cmicroRNA object methods
methods(class = 'cmicroRNA')
```
```{r methods_cTF}
# cTF object methods
methods(class = 'cTF')
```
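For readers less familiar with S3, the following minimal sketch shows how one generic can dispatch to different methods depending on the class of its argument. The generic `toy_summary` and the classes `toy_cmicroRNA` and `toy_cTF` are hypothetical names used only for this example and are not part of cRegulome.

```r
# a generic function: picks a method based on the class of 'x'
toy_summary <- function(x, ...) UseMethod('toy_summary')

# method for the hypothetical microRNA class
toy_summary.toy_cmicroRNA <- function(x, ...) {
    paste('microRNA object with', length(x$features), 'features')
}

# method for the hypothetical transcription factor class
toy_summary.toy_cTF <- function(x, ...) {
    paste('TF object with', length(x$features), 'features')
}

# two toy objects: plain lists with a class attribute
toy_mir_ob <- structure(list(features = c('GENE_A', 'GENE_B')),
                        class = 'toy_cmicroRNA')
toy_tf_ob  <- structure(list(features = 'GENE_A'),
                        class = 'toy_cTF')

# the same call is routed to a different method for each object
toy_mir_msg <- toy_summary(toy_mir_ob)
toy_tf_msg  <- toy_summary(toy_tf_ob)
print(toy_mir_msg)
print(toy_tf_msg)
```

This is why the same function name can be called on either kind of object: the class attribute decides which implementation runs.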
```{r tidy_method}
# tidy method
head(cor_tidy(ob_mir))
```
```{r cor_hist_method}
# cor_hist method
cor_hist(ob_mir,
         breaks = 100,
         main = '', xlab = 'Correlation')
dev.off()
```
```{r cor_joy_method}
# cor_joy method
cor_joy(ob_mir) +
    labs(x = 'Correlation', y = '')
dev.off()
```
```{r cor_venn_diagram_method}
# cor_venn_diagram method
cor_venn_diagram(ob_mir, cat.default.pos = 'text')
dev.off()
```
```{r cor_upset_method}
# cor_upset method
cor_upset(ob_mir)
dev.off()
```
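Venn diagrams and UpSet plots both summarize overlaps between sets. As a rough base R sketch of the kind of quantities such plots display, the code below computes the genes shared by, and unique to, two microRNAs in a toy data frame. The assumption that the sets are the gene hits of each microRNA, along with the gene names themselves, is made for illustration and is not taken from the package code.

```r
# toy tidy data; gene names are invented for illustration
toy_hits <- data.frame(
    mirna_base = c('hsa-let-7g', 'hsa-let-7g', 'hsa-let-7i', 'hsa-let-7i'),
    feature    = c('GENE_A', 'GENE_B', 'GENE_A', 'GENE_C'),
    stringsAsFactors = FALSE
)

# one set of gene hits per microRNA
toy_sets <- split(toy_hits$feature, toy_hits$mirna_base)

# genes shared by all microRNAs (the overlapping region)
toy_shared <- Reduce(intersect, toy_sets)

# genes found only for hsa-let-7g
toy_only_7g <- setdiff(toy_sets[['hsa-let-7g']], toy_sets[['hsa-let-7i']])

print(lengths(toy_sets))
print(toy_shared)
print(toy_only_7g)
```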
# Contributions
Comments, issues and contributions are welcome at:
[https://github.com/MahShaaban/cRegulome](https://github.com/MahShaaban/cRegulome)
# Citations
Please cite:
```{r citation, eval=FALSE}
citation('cRegulome')
```
```{r clean, echo=FALSE}
dbDisconnect(conn)
unlink('./Venn*')
```