-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathAnnotation.Rnw
565 lines (387 loc) · 23.4 KB
/
Annotation.Rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
%\VignetteIndexEntry{Genomic Annotation Practical}
%\VignettePackage{GeneticsHTSCourse}
%\VignetteEngine{knitr::knitr}
% To compile this document
% library('knitr'); rm(list=ls()); knit('DESeq2.Rnw')
\documentclass[12pt]{article}
\newcommand{\usecase}{\textit{\textbf{Use Case: }}}
<<knitr, echo=FALSE, results="hide">>=
library("knitr")
opts_chunk$set(tidy=FALSE,dev="png",fig.show="hide",
fig.width=4,fig.height=4.5,
message=FALSE,eval=FALSE)
@
<<style, eval=TRUE, echo=FALSE, results="asis">>=
BiocStyle::latex()
@
\title{Annotation and Visualisation of sequencing data in Bioconductor}
\author{Mark Dunning}
\date{Last modified: \today}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\maketitle
\tableofcontents
\section{Introduction}
This practical uses data from a Drosophilla dataset where the \emph{pasilla} gene has been knocked-out. The dataset has been used many times to demonstrate RNA-seq workflows and the data are available and discussed in various packages in Bioconductor (\Biocexptpkg{pasilla}, \Biocexptpkg{pasillaBamSubset}, \Biocpkg{DEseq}). Here we show the analysis of this dataset using edgeR and we will start this practical from the table of top tags. \textbf{You do not need to type this code, it is for your reference only, or if you want to run the practical outside of the course.}
<<eval=FALSE>>=
library(edgeR)
datafile = system.file( "extdata/pasilla_gene_counts.tsv", package="pasilla" )
datafile
pasillaCountTable = read.table( datafile, header=TRUE, row.names=1 )
head(pasillaCountTable)
pasillaDesign = data.frame(
row.names = colnames( pasillaCountTable ),
condition = c( "untreated", "untreated", "untreated",
"untreated", "treated", "treated", "treated" ),
libType = c( "single-end", "single-end", "paired-end",
"paired-end", "single-end", "paired-end", "paired-end" ) )
pasillaDesign
pairedSamples = pasillaDesign$libType == "paired-end"
countTable = pasillaCountTable[ , pairedSamples ]
condition = pasillaDesign$condition[ pairedSamples ]
y <- DGEList(counts=countTable,group=condition)
y <- calcNormFactors(y)
y <- estimateCommonDisp(y)
y <- estimateTagwiseDisp(y)
et <- exactTest(y)
topTags(et)
pasillaRes <- topTags(et,n=nrow(countTable))$table
@
The \Robject{pasillaRes} object has been saved in the {\tt exampleData} folder.
\usecase
Load the pasilla analysis and familiarise yourself with the contents. Create a data frame of the 100 most significant results.
<<>>=
load("exampleData//pasillaRes.rda")
topHits <- pasillaRes[order(pasillaRes$PValue, decreasing = F)[1:100],]
head(topHits)
@
The \Robject{topHits} data frame provides us with a list of genes that are found to be \textit{statistically significant}. However, it is a separate investigation to determine if the genes are \textit{Biologically significant}. We will \textit{annotate} our list of differentially expressed genes using the online resource biomart.
\section{Annotation using an online resource}
The \Biocpkg{biomaRt} package provides an interface to the \href{www.biomart.org}{{\color{blue}biomart}} website; allowing the data to be accessed in a consistent manner and without having write complex SQL queries or understand the underlying schema. The databases available through BioMart can be obtained with the \Rfunction{listMarts} function
\subsection{General BiomaRt Usage}
\usecase
Get a list of all databases available through the \Biocpkg{biomaRt} package and connect to the Drosophilla one.
<<>>=
library(biomaRt)
listMarts()
ensembl <- useMart("ensembl", dataset ="dmelanogaster_gene_ensembl")
attr <- listAttributes(ensembl)
head(attr, 10)
@
\warning{biomaRt will automatically connect to the latest annotation version. If you are unsure what this version is, you can use the following code to access this information}
<<>>=
datasets <- listDatasets(ensembl)
datasets[which(datasets[,1] == "dmelanogaster_gene_ensembl"),]
@
The function \Rfunction{getBM} is used to build-up queries in biomart. We must specify one or more attributes that we want to retreive. If no values or filters are specified, then all values will be returned.
\usecase
Retrieve the names of all genes for Drosophilla. How many genes are there?
<<>>=
allGenes <- getBM(attributes = c("ensembl_gene_id"), mart = ensembl)
dim(allGenes)
@
When restricting the search to particular query items (e.g. genes) we need to specify both filters and values.
\usecase
Retrieve the ensembl, transcript and peptide ID for the first 10 genes
<<>>=
filt <- listFilters(ensembl)
head(filt, 10)
getBM(attributes = attr[1:3, 1], filters = "ensembl_gene_id",
values = allGenes[1:10,1], mart = ensembl)
@
For situations when we want to specify multiple filters, we have to supply the values in list form.
\usecase
Retrieve all the genes between 1100000 and 1250000 on chromsome X.
<<>>=
getBM(attributes = "ensembl_gene_id", filters = c("chromosome_name",
"start", "end"), values = list("X", 1100000, 1250000), mart = ensembl)
@
\subsection{Annotating the RNA-seq results}
We can use \Biocpkg{biomaRt} to annotate the results of our RNA-seq analysis. Firstly, we have to recognise that the row names of the \Robject{topHits} data frame are Ensembl IDs. Then it is a matter of deciding what attributes we want to retrieve from the database.
\usecase
Annotate the top hits from the DEseq analysis with their genomic locations and external names and create a merged table containing both DESeq results and the annotations.
<<>>=
myInfo <- getBM(attributes = c("flybase_gene_id", "chromosome_name",
"band", "start_position", "end_position", "external_gene_name"),
filters = "ensembl_gene_id", values = rownames(topHits), mart = ensembl)
myInfo
annotatedHits <- merge(topHits, myInfo, by.x = 0, by.y = 1)
head(annotatedHits)
@
The resulting table, \Robject{annotatedHits}, can now be used to ease the interpretation of the RNA-seq analysis and can be shared with collaborators.
\section{Annotation using pre-built packages}
As an alternative to \Biocpkg{biomaRt}, we can use the variety of pre-built databases that are available as Bioconductor packages. The main advantage of this approach is that we do not require an internet connection to perform the annotation, once the package is installed. A full list of annotation packages is available on the Bioconductor \href{http://tinyurl.com/l7hsus5}{{\color{blue}website}}.
\subsection{Genome sequences}
For instance, we can obtain pre-built and easily accessible packages to interrogate genome sequences. Such packages take advantage of the \Biocpkg{Biostrings} infrastructure that you will have already seen, and employ a database scheme under-the-hood. Thus, making queries is extremely efficient.
\usecase
Load the package that provides the latest genome representation for Drosophila ('dm6')
<<>>=
library(BSgenome)
available.genomes()
library(BSgenome.Dmelanogaster.UCSC.dm6)
Dmelanogaster
@
We can retrieve the sequence for a particular chromosome by subsetting using the {\tt [[} operator and providing the name of a chromosome.
\usecase
Retrieve the sequence for chromosome X by subsetting. Then create a plot to show the lengths of all chromosomes. Restrict the plot to chromosomes over 2,000,000 bases only.
\bioccomment{You can use the length once you have retrieved each chromosome sequence}.
<<>>=
names(Dmelanogaster)
Dmelanogaster[["chrX"]]
chrLen <- sapply(names(Dmelanogaster), function(x) length(Dmelanogaster[[x]]))
barplot(chrLen[which(chrLen > 2000000)], horiz=TRUE,las=2)
@
Biostrings genomes have also been designed to inter-operate with the GenomicRanges functions and classes. A useful consequence is that you can access the genomic sequence for a particular gene by translating the coordinates of the gene into a \Robject{GRanges} object.
\usecase
Use \Biocpkg{biomaRt} to get the coordinates of the fly gene FBgn0040357 on Chromosome X. Translate this into a \Robject{GRanges} object and retrieve the genomic sequence for this gene.
<<>>=
geneInfo <- getBM(attributes = c("chromosome_name", "start_position", "end_position"),
filters = "ensembl_gene_id", values = "FBgn0040357", ensembl)
gr <- GRanges("chrX", IRanges(geneInfo$start_position, geneInfo$end_position))
seq <- getSeq(Dmelanogaster , gr)
seq
@
We can, of course, specify ranges for mutliple genes in the same \Robject{GRanges} object. They can also be manipulated, expanded, collapsed, etc using any of the operations we saw yesterday.
\usecase
Create a \Robject{GRanges} object to represent the top genes from the RNA-seq analysis. Find the genomic sequences of each of these genes, and also the sequences 50 bases upstream.
<<>>=
gr2 <- GRanges(paste0("chr",annotatedHits$chromosome_name),
IRanges(annotatedHits$start_position, annotatedHits$end_position))
flankGr <- flank(gr2, 50)
myseqs <- getSeq(Dmelanogaster,gr2)
myseqs
upSeqs <- getSeq(Dmelanogaster,flankGr)
upSeqs
@
\bioccomment{In the next section, we will see how to retrieve \Robject{GRanges} for genes so that we do not have to create these object manually.}
Various Biostrings operations could now be performed on these sequences. Some suggestions are given below.
<<>>=
substr(myseqs , 1, 10)
translate(myseqs)
source("scripts/gcFunction.R")
gcFunction(myseqs)
@
\Biocpkg{Biostrings} also allows us to export sequence data so that we can process them in another tool.
\usecase
Export the upstream sequences a \textit{fasta} file.
<<>>=
names(upSeqs) <- annotatedHits[, 1]
writeXStringSet(upSeqs , file = "topGenes.fa")
@
\subsection{Organism-level Annotation}
So-called organism level packages provide an alternative to \Biocpkg{biomaRt} and permit annotation queries offline. Each package is built around a central identifier (e.g. Entrez ID) and meta data surrounding the annotation sources used can be displayed once such an organism package has been installed and loaded.
The terminology of {\tt columns} and {\tt keys} is equivalent to {\tt attributes} and {\tt filters} used by \Biocpkg{biomaRt}. The valid columns and keys can be interrogated using \Rfunction{columns} and \Rfunction{keytypes}.
\usecase Load the Drosophilla annotation package and display the metadata describing how the package was created. Find out what information can be retrieved from the package and what types of key can be used.
\bioccomment{For covenience, we often assign a shorter name to the annotation package.}
<<>>=
library(org.Dm.eg.db)
dm <- org.Dm.eg.db
columns(dm)
keytypes(dm)
@
The \Rfunction{select} function is used to make the query. We have to supply a set of valid keys and columns.
\usecase
Annotate your RNA-seq results using the organism package. Choose any columns that you feel might be interesting.
<<>>=
myInfo2 <- select(dm, keytype = "FLYBASE",
columns =c("ENSEMBL", "MAP","CHR","GENENAME","SYMBOL") ,keys=rownames(topHits))
myInfo2
@
In the analysis of the pasilla dataset, we allocated reads to genes. However, in this section we will explore the ways of counting reads that align to other genomic regions of interest. In this case we will use \textit{exons} and have to re-visit the orginal aligned reads. An example bam file is provided in the \Biocexptpkg{pasillaBamSubset} package, but only reads on chromosome 4 are provided.
In order to use the example bam file we will have to get the IDs of genes that occur on chromosome 4. We illustrate this using the \Biocannopkg{org.Dm.eg.db} organism package.
\usecase
Use the appropriate organism package to retrieve the ensembl IDs of all genes located on chromosome 4 of Drosophilla.
<<>>=
chr4Genes <- select(dm, columns = "ENSEMBL", keytype = "CHR", keys = "4")
head(chr4Genes)
chr4ID <- chr4Genes[, 2]
@
\warning{If you get the error message:\\ \Rcode{unused arguments (columns = "ENSEMBL", keytype = "CHR", keys = "4")} \\ when trying to run the \Rfunction{select}, it is because another R package has defined another function called \Rfunction{select} and over-written the function from \Rpackage{AnnotationDBI} that we intended to use. If this occurs the statement \\ \Rcode{select <- AnnotationDbi::select} \\ should allow you to run the line of code.}
\subsection{Transcript packages}
Bioconductor provide a number of pre-built packages that have a database of all transcript information for a particular organism. The package we are going to use is \Biocannopkg{TxDb.Dmelanogaster.UCSC.dm3.ensGene}. A full list of available transcript packages is available on the Bioconductor \href{http://tinyurl.com/l7hsus5}{{\color{blue}website}}. Look for the packages that start with {\tt TxDb.}.
\bioccomment{At present, there is no transcript database package relating to the dm6 genome, so we will use dm3 instead.}
As its name suggests, \Biocannopkg{TxDb.Dmelanogaster.UCSC.dm3.ensGene} was built from the UCSC tables for the dm3 genome and uses Ensembl genes as an identifier. Convenient functions exist that can return the gene structure as a \Rclass{GRanges} object. One such function is \Rfunction{exonsBy} which returns a list of \Robject{GRanges} objects, each item in the list pertaining to a particular gene. Thus, the output of \Rfunction{exonsBy} can be subset in the usual manner ({\tt '[[}').
\bioccomment{transcripts and cds regions can be selected in a similar fashion. See the help pages of \Rfunction{intronsBy} and \Rfunction{cdsBy}}
\usecase
Get all the exons for all genes on chromosome 4
<<>>=
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
tx <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
exo <- exonsBy(tx, "gene")
exo
length(exo)
exo4 <- exo[names(exo) %in% chr4ID]
length(exo4)
@
\bioccomment{If you cannot find a TranscriptDb package that provides the annotation you need, you can use the \Rfunction{makeTxDbPackage} function to create one from the UCSC browser, biomaRt or another source. See the help page for more details.}
\usecase
Read the example {\tt untreated3\_chr4} bam file from the \Biocexptpkg{pasillaBamSubset} package. Verify that only chromosome 4 reads are included in the object
\bioccomment{untreated3\_chr4() is a convenient function to give the absolute location of the example file in the \Biocexptpkg{pasillaBamSubset} package.}
<<>>=
library(pasillaBamSubset)
library(GenomicAlignments)
bam <- readGAlignments(untreated3_chr4(), use.name = TRUE)
table(seqnames(bam))
@
We are now in a position to begin overlapping reads with exons. You should have already seen the \Rfunction{findOverlaps} function that will report the amount of ovelap between two sets of ranges. We will start simple by just doing an overlap of a single gene to make sure that we understand the output
\usecase
Use the \Rfunction{findOverlaps} function to count the number of reads covering each exon of FBgn0002521.
<<>>=
olaps <- findOverlaps(exo4[["FBgn0002521"]], bam)
olaps
bam[subjectHits(olaps)]
@
\Rfunction{summarizeOverlaps} extends \Rfunction{findOverlaps} by providing options to resolve reads that overlap multiple features. Each read is counted a maximum of once. Different modes of counting are available. See the help page for \Rfunction{summarizeOverlaps} for more details.
\usecase
Repeat the counting for all genes on chromosome 4
<<>>=
olapGenes <- summarizeOverlaps(exo4, bam)
countGenes <- assays(olapGenes)$counts
countGenes
@
The count vector has entries for each gene on chromosome 4. If we wanted counts for all exons, then we would have to supply a slightly different argument to the function, by collapsing the list structure of the exons object.
\usecase
Obtain per-exon counts for each gene
<<>>=
exonRanges <- unlist(exo4)
olapExon <- summarizeOverlaps(exonRanges , bam)
countExons <- assays(olapExon)$counts
countExons
@
The \Rfunction{summarizeOverlaps} function can also take the location of one or more bam files as an argument. This means that the processing of reads and counting is handled in one step. However, we have to use the special \Rfunction{BamFileList} constructor when dealing with multiple files.
\usecase
Obtain counts of chromosome 4 genes for the bam files included with the \Biocexptpkg{pasillaSubset} package.
<<>>=
fls <- c(untreated3_chr4(), untreated1_chr4())
names(fls) <- basename(fls)
multiOlap <- summarizeOverlaps(exo4, BamFileList(fls))
head(assays(multiOlap)$counts)
@
\subsection{Importing and annotating using custom tracks}
Genome browser or other tracks are often distributed as files in the formats gff, bed, wig etc. The package \Biocpkg{rtracklayer} provides infrastrucutre to import files in these formats and create a \Rclass{GRanges} representation.
\usecase
Read the gff file supplied with the pasilla package and use these intervals to produce a table of overlaps.
<<>>=
library(rtracklayer)
gffFile <- "Dmel.BDGP5.25.62.DEXSeq.chr.gff"
gff <- paste(system.file("extdata", package = "pasilla"), gffFile,sep="/")
read.table(gff, nrows = 10, sep = "\t")
gffRange <- import(gff)
gffRange
gffOverlap <- summarizeOverlaps(gffRange , bam)
@
%5\usecase
%5Find out what tracks are available for the dm3 genome. Download the RepeatMasker track.
%%\warning{The \Rfunction{getTable} function below seems to take a long time to load. Please don't wait too long for it to finish}
%%%<<>>=
%%mySession <- browserSession()
%%track.names <- trackNames(ucscTableQuery(mySession))
%%repeats <- getTable(ucscTableQuery(mySession, track = "rmsk",
%%table = "rmsk"))
%%head(repeats)
%%@
\section{Visualisation}
\subsection{Exporting tracks}
It is also possible to save the results of a Bioconductor analysis in a browser to enable interactive analysis and integration with other data types, or sharing with collaborators. We shall use the bed format for illustration.
\usecase
Select all significant genes from the pasilla analysis (lets use a p-value of 0.1) and annotate their genome location. Create a \Rclass{GRanges} representation of the locations.
<<>>=
sigResults <- pasillaRes[which(pasillaRes$PValue< 0.1), ]
myInfo <- getBM(attributes = c("flybase_gene_id", "chromosome_name",
"band", "start_position", "end_position", "external_gene_name"),
filters = "flybase_gene_id", values = rownames(sigResults), mart = ensembl)
myInfo <- merge(sigResults, myInfo, by.x = 0, by.y = 1)
sigRanges <- GRanges(paste("chr", myInfo$chromosome_name, sep = ""),
IRanges(start = myInfo$start_position , end = myInfo$end_position ,
names = myInfo[, 1]))
@
Rather than just representing the genomic locations, the .bed format is also able to colour each range according to some property of the analysis (e.g. direction and magnitude of change) to help highlight particular regions of interest. A score can also be displayed when a particular region is clicked-on. A useful propery of \Robject{GenomicRanges} is that we can attach metadata to each range using the \Rfunction{mcols} function.
\usecase
Store the adjusted p-values and log fold-change in the ranges object
<<>>=
mcols(sigRanges)$padj = myInfo$PValue
mcols(sigRanges)$logfc = myInfo$logFC
sigRanges
@
\usecase
Create a score from the p-values and colour scheme based on the fold-change. For convenience, restrict the fold changes to be within the region -3 to 3.
<<>>=
Score <- -log10(sigRanges$padj)
rbPal <-colorRampPalette(c("red", "blue"))
logfc <- pmax(sigRanges$logfc, -3)
logfc <- pmin(logfc , 3)
Col <- rbPal(10)[as.numeric(cut(logfc, breaks = 10))]
@
The colours and score can be saved in the \Rclass{GRanges} object and \Rcode{score} and \Rcode{itemRgb} columns respectively and will be used to construct the track when the export is called.
\usecase
Export the signifcant results from the DE analysis as a \textit{.bed} track. You can load the resulting file in IGV, if you wish.
<<>>=
mcols(sigRanges)$score <- Score
mcols(sigRanges)$itemRgb <- Col
export(sigRanges , con = "topHits.bed")
@
The export of other formats such as \textit{gff}, \textit{bedGraph}, \textit{wig} and \textit{bigwig} is also possible with \Biocpkg{rtracklayer}. We can also call the UCSC browser directly to display the tracks. See the \Biocpkg{rtracklayer} vignette for details.
\subsection{ggbio}
We will now take a brief look at one of the visualisation packages in Bioconductor that takes advantages of the GenomicRanges and GenomicFeatures object-types.
The documentation for \Biocpkg{ggbio} is very extensive and contains lots of examples.\newline
{\tt http://www.tengfei.name/ggbio/docs/}
Extra background reading can be found on the ggplot2 website; the package on which the principles of ggbio is based.\newline
{\tt http://ggplot2.org/}
The \Biocpkg{Gviz} package is another Bioconductor package that specialising in genomic visualisations, but we will not explore this package in the course.
A useful function within \Biocpkg{ggbio} is \Rfunction{autoplot}, which will construct an appropriate plot based on the object-type of the input. For example, if we pass a GRanges object to the function it will plot the genomic locations on a linear scale and label chromosome and positional information on the plot. The style of the plot can be changed by the layout argument.
\usecase
Plot the locations of the significant hits from the DE analysis. Try the default, karyogram and circle layouts
<<>>=
library(ggbio)
autoplot(sigRanges)
autoplot(sigRanges[seqnames(sigRanges) == "chrX"])
autoplot(sigRanges , layout = "karyogram")
@
The \textit{Manhattan} plot is a common way of visualising genome-wide results, and this is implemented as the \Rfunction{plotGrandLinear} function. We have to supply a value to display on the y-axis. This is done by using the \Rfunction{aes} function, which is inherited from \Rpackage{ggplot2}.
\usecase
Plot the genomic locations of the significant genes on a linear scale. Modify the colour scheme to show whether each gene is up- or down-regulated
<<>>=
plotGrandLinear(sigRanges , aes(y = score))
mcols(sigRanges)$Up <- logfc > 0
plotGrandLinear(sigRanges, aes(y = score, col = Up))
@
\Biocpkg{ggbio} is also able to plot the structure of genes according to a particular model represented by a \Robject{GenomicFeatures} object.
\usecase
Find the gene which has the most exon counts in the pasilla bam. Plot the struture of this gene.
<<>>=
myGene <- names(exonRanges)[which.max(countExons)]
autoplot(tx, which = exo4[[myGene]])
@
We can even plot the location of sequencing reads if they have been imported using \Rfunction{readGAlignments} function (or similar).
\usecase
Plot the number of reads in this region surrounding the gene.
<<>>=
myreg <- flank(reduce(exo4[[myGene]]), 500, both = T)
bamSubset <- bam[bam %over% myreg]
autoplot(bamSubset)
@
\usecase
Repeat the plot, but with a smoothed coverage
<<>>=
autoplot(bamSubset , stat = "coverage")
@
The plots produced by \Biocpkg{ggbio} (and indeed \Rpackage{ggplot2} on which the package is based) are not in standard R graphics format, so techniques like \Rcode{par(mfrow)} for arranging plots on the same page are not applicable. However, \Biocpkg{ggbio} allows the plots to be saved as objects, which can later be arranged by specialist functions such as \Rfunction{tracks}. This function automatically aligns the x-axis of each plot to be on the same scale.
\usecase
Make a combined plot of the aligned reads and transcripts for the selected gene.
<<>>=
p1 <- autoplot(tx, which = myreg)
p2 <- autoplot(bamSubset, stat = "coverage")
tracks(p1, p2)
@
%%The combined plot can also be embellished by adding an Ideogram to show the location of the region being plotted in relation to the chromosome.
%%\usecase
%%Get the ideogram data for the drosophilla genome and add to the plot
%%<<eval=FALSE>>=
%%plotIdeogram(genome = "dm3")
%%plotIdeogram(genome = "dm3", subchr = "chr4")
%%start <- min(start(bamSubset))
%%end <- max(end(bamSubset))
%%id <- plotIdeogram(genome = "dm3", subchr = "chr4", zoom.region = c(start,end))
%%tracks(id, p1, p2)
%%@
\end{document}