-
Notifications
You must be signed in to change notification settings - Fork 441
/
volcanoplot.xml
325 lines (271 loc) · 15.6 KB
/
volcanoplot.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
<tool id="volcanoplot" name="Volcano Plot" version="0.0.5">
<description>create a volcano plot</description>
<edam_topics>
<edam_topic>topic_0092</edam_topic>
</edam_topics>
<edam_operations>
<edam_operation>operation_0337</edam_operation>
</edam_operations>
<requirements>
<requirement type="package" version="3.3.3">r-ggplot2</requirement>
<requirement type="package" version="0.9.1">r-ggrepel</requirement>
<requirement type="package" version="1.0.6">r-dplyr</requirement>
</requirements>
<version_command><![CDATA[
echo $(R --version | grep version | grep -v GNU)", ggplot2 version" $(R --vanilla --slave -e "library(ggplot2); cat(sessionInfo()\$otherPkgs\$ggplot2\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", ggrepel version" $(R --vanilla --slave -e "library(ggrepel); cat(sessionInfo()\$otherPkgs\$ggrepel\$Version)" 2> /dev/null | grep -v -i "WARNING: ")", dplyr version" $(R --vanilla --slave -e "library(dplyr); cat(sessionInfo()\$otherPkgs\$dplyr\$Version)" 2> /dev/null | grep -v -i "WARNING: ")
]]></version_command>
<command detect_errors="exit_code"><![CDATA[
#if $out_options.rscript_out:
cp '${volcanoplot_script}' rscript.txt &&
#end if
Rscript '${volcanoplot_script}'
]]></command>
<configfiles>
<configfile name="volcanoplot_script"><![CDATA[
# Galaxy settings start ---------------------------------------------------
# setup R error handling to go to stderr
options(show.error.messages = F, error = function() {cat(geterrmessage(), file = stderr()); q("no", 1, F)})
# we need that to not crash galaxy with an UTF8 error on German LC settings.
loc <- Sys.setlocale("LC_MESSAGES", "en_US.UTF-8")
# Galaxy settings end -----------------------------------------------------
# Load packages -----------------------------------------------------------
suppressPackageStartupMessages({
library(dplyr)
library(ggplot2)
library(ggrepel)
})
# Import data ------------------------------------------------------------
#if $header == "yes"
results <- read.delim('$input', header = TRUE)
#elif $header == "no"
results <- read.delim('$input', header = FALSE)
#else
# Auto-detect header by checking if P value column is numeric or not
first_line <- read.delim('$input', header = FALSE, nrow = 1)
first_pvalue <- first_line[, $pval_col]
if (is.numeric(first_pvalue)) {
print("No header row detected")
results <- read.delim('$input', header = FALSE)
} else {
print("Header row detected")
results <- read.delim('$input', header = TRUE)
}
#end if
# Format data ------------------------------------------------------------
# Create columns from the column numbers specified
results <- results %>% mutate(fdr = .[[$fdr_col]],
pvalue = .[[$pval_col]],
logfc = .[[$lfc_col]],
labels = .[[$label_col]])
# Get names for legend
down <- unlist(strsplit('$plot_options.legend_labs', split = ","))[1]
notsig <- unlist(strsplit('$plot_options.legend_labs', split = ","))[2]
up <- unlist(strsplit('$plot_options.legend_labs', split = ","))[3]
# Set colours
colours <- setNames(c("cornflowerblue", "grey", "firebrick"), c(down, notsig, up))
# Create significant (sig) column
results <- mutate(results, sig = case_when(
fdr < $signif_thresh & logfc > $lfc_thresh ~ up,
fdr < $signif_thresh & logfc < -$lfc_thresh ~ down,
TRUE ~ notsig))
## R code below is left aligned for R script output
#if $labels.label_select != "none"
# Specify genes to label --------------------------------------------------
#if $labels.label_select == "file"
# Import file with genes of interest
labelfile <- read.delim('$labels.label_file', header = TRUE)
# Label the genes of interest in results table
results <- mutate(results, labels = ifelse(labels %in% labelfile[, 1], labels, ""))
#elif $labels.label_select == "signif"
#if not $labels.top_num
# Label all significant genes in results table
results <- mutate(results, labels=ifelse(sig != notsig, labels, ""))
#elif $labels.top_num > 0
# Get top genes by P value
top <- slice_min(results, order_by = pvalue, n = $labels.top_num)
# Extract into vector
toplabels <- pull(top, labels)
# Label just the top genes in results table
results <- mutate(results, labels = ifelse(labels %in% toplabels, labels, ""))
#end if
#end if
#end if
# Create plot -------------------------------------------------------------
# Open file to save plot as PDF
pdf("volcano_plot.pdf")
# Set up base plot
p <- ggplot(data = results, aes(x = logfc, y = -log10(pvalue))) +
geom_point(aes(colour = sig)) +
scale_color_manual(values = colours) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"),
legend.key = element_blank())
#if $labels.label_select != "none"
# Add gene labels
#if $plot_options.boxes
p <- p + geom_label_repel(data = filter(results, labels != ""), aes(label = labels),
min.segment.length = 0,
max.overlaps = Inf,
show.legend = FALSE)
#else
p <- p + geom_text_repel(data = filter(results, labels != ""), aes(label = labels),
min.segment.length = 0,
max.overlaps = Inf,
show.legend = FALSE)
#end if
#end if
#if not '$plot_options.title'
p <- p + ggtitle('$plot_options.title')
#end if
#if not '$plot_options.xlab'
p <- p + xlab('$plot_options.xlab')
#end if
#if not '$plot_options.ylab'
p <- p + ylab('$plot_options.ylab')
#end if
#if not '$plot_options.xmin' and '$plot_options.xmax'
p <- p + xlim('$plot_options.xmin', '$plot_options.xmax')
#end if
#if not '$plot_options.ymax'
p <- p + ylim(0, '$plot_options.ymax')
#end if
# Set legend title
#if not '$plot_options.legend'
p <- p + theme(legend.title = '$plot_options.legend')
#else
p <- p + theme(legend.title = element_blank())
#end if
# Print plot
print(p)
# Close PDF graphics device
dev.off()
# R and Package versions -------------------------------------------------
sessionInfo()
]]></configfile>
</configfiles>
<inputs>
<param name="input" type="data" format="tabular" label="Specify an input file" />
<param name="header" type="select" label="File has header?" help="Does the differentially expressed results file contain a header row. The tool can auto-detect by checking if the first row in the P value column is a number or not. Default: Auto-detect">
<option value="auto" selected="True">Auto-detect</option>
<option value="yes">Yes</option>
<option value="no">No</option>
</param>
<param name="fdr_col" type="data_column" data_ref="input" label="FDR (adjusted P value)" />
<param name="pval_col" type="data_column" data_ref="input" label="P value (raw)" />
<param name="lfc_col" type="data_column" data_ref="input" label="Log Fold Change" />
<param name="label_col" type="data_column" data_ref="input" label="Labels" />
<param name="signif_thresh" type="float" max="1" value="0.05" label="Significance threshold" help="Default: 0.05"/>
<param name="lfc_thresh" type="float" value="0" label="LogFC threshold to colour" help="Default: 0"/>
<conditional name="labels">
<param name="label_select" type="select" label="Points to label" help="Select to label significant points or input labels from file. Default: None">
<option value="none" selected="True">None</option>
<option value="signif">Significant</option>
<option value="file">Input from file</option>
</param>
<when value="signif">
<param name="top_num" type="integer" optional="True" label="Only label top most significant" help="Specify the top number of points to label by P value significance. If no number is specified, all points that pass the FDR and Log Fold Change thresholds will be labelled."/>
</when>
<when value="file">
<param name="label_file" type="data" format="tabular" label="File of labels"/>
</when>
<when value="none" />
</conditional>
<section name="plot_options" expanded="false" title="Plot Options">
<param name="boxes" type="boolean" truevalue="True" falsevalue="False" checked="False" label="Label Boxes" help="If this is set to Yes, the labels for the points will be in boxes. Default: No"/>
<param name="title" type="text" optional="True" label="Plot title"/>
<param name="xlab" type="text" optional="True" label="Label for x axis"/>
<param name="ylab" type="text" optional="True" label="Label for y axis"/>
<param name="xmin" type="float" optional="True" label="Minimum value for x axis" help="To customise the x axis limits, specify both minimum and maximum values. Leave empty for automatic values."/>
<param name="xmax" type="float" optional="True" label="Maximum value for x axis" help="To customise the x axis limits, specify both minimum and maximum values. Leave empty for automatic values."/>
<param name="ymax" type="float" optional="True" label="Maximum value for y axis" help="To customise the y axis upper limit, specify the maximum value, the minimum will be 0. Leave empty for automatic value."/>
<param name="legend" type="text" optional="True" label="Label for Legend Title"/>
<param name="legend_labs" type="text" value="Down,Not Sig,Up" label="Labels for Legend" help="Labels in the legend can be specified. Default: Down,Not Sig,Up"/>
</section>
<section name="out_options" expanded="false" title="Output Options">
<param name="rscript_out" type="boolean" truevalue="True" falsevalue="False" checked="False" label="Output Rscript?"
help="Output the R code used by the tool. Can edit in R if you want to customise the plot further. Default: No"/>
</section>
</inputs>
<outputs>
<data name="plot" format="pdf" from_work_dir="volcano_plot.pdf" label="${tool.name} on ${on_string}: PDF"/>
<data name="rscript" format="txt" from_work_dir="rscript.txt" label="${tool.name} on ${on_string}: Rscript">
<filter>out_options['rscript_out']</filter>
</data>
</outputs>
<tests>
<test expect_num_outputs="1">
<!-- Ensure default output works -->
<param name="input" ftype="tabular" value="input.tab"/>
<param name="fdr_col" value="4" />
<param name="pval_col" value="3" />
<param name="lfc_col" value="2" />
<param name="label_col" value="1" />
<param name="lfc_thresh" value="0" />
<output name="plot">
<assert_contents>
<has_size value= "933451" delta="1000" />
</assert_contents>
</output>
</test>
<test expect_num_outputs="1">
<!-- Ensure input labels and plot options work -->
<param name="input" ftype="tabular" value="input.tab"/>
<param name="fdr_col" value="4" />
<param name="pval_col" value="3" />
<param name="lfc_col" value="2" />
<param name="label_col" value="1" />
<param name="lfc_thresh" value="0" />
<param name="label_select" value="file"/>
<param name="label_file" ftype="tabular" value="labels.tab" />
<output name="plot">
<assert_contents>
<has_size value= "933832" delta="1000" />
</assert_contents>
</output>
</test>
<test expect_num_outputs="2">
<!-- Ensure rscript output works -->
<param name="input" ftype="tabular" value="input.tab"/>
<param name="header" value="yes"/>
<param name="fdr_col" value="4" />
<param name="pval_col" value="3" />
<param name="lfc_col" value="2" />
<param name="label_col" value="1" />
<param name="lfc_thresh" value="0" />
<param name="label_select" value="file"/>
<param name="label_file" ftype="tabular" value="labels.tab" />
<param name="rscript_out" value="True"/>
<output name="plot">
<assert_contents>
<has_size value= "933832" delta="1000" />
</assert_contents>
</output>
<output name="rscript" value= "out.rscript" lines_diff="4"/>
</test>
</tests>
<help><![CDATA[
.. class:: infomark
**What it does**
This tool creates a Volcano plot using ggplot2. Points can be labelled via ggrepel. It was inspired by this Getting Genetics Done `blog post`_.
In statistics, a `Volcano plot`_ is a type of scatter-plot that is used to quickly identify changes in large data sets composed of replicate data. It plots significance versus fold-change on the y and x axes, respectively. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics where one often has a list of many thousands of replicate data points between two conditions and one wishes to quickly identify the most meaningful changes. A volcano plot combines a measure of statistical significance from a statistical test (e.g., a p value from an ANOVA model) with the magnitude of the change, enabling quick visual identification of those data-points (genes, etc.) that display large magnitude changes that are also statistically significant.
A volcano plot is constructed by plotting the negative log of the p value on the y axis (usually base 10). This results in data points with low p values (highly significant) appearing toward the top of the plot. The x axis is the log of the fold change between the two conditions. The log of the fold change is used so that changes in both directions appear equidistant from the center. Plotting points in this way results in two regions of interest in the plot: those points that are found toward the top of the plot that are far to either the left- or right-hand sides. These represent values that display large magnitude fold changes (hence being left or right of center) as well as high statistical significance (hence being toward the top).
Source: Wikipedia
-----
**Inputs**
A tabular file containing the columns below (additional columns may be present):
* P value
* FDR / adjusted P value
* Log fold change
* Labels (e.g. Gene symbols or IDs)
All significant points, those meeting the specified FDR and Log Fold Change thresholds, will be coloured, red for upregulated, blue for downregulated. Users can choose to apply labels to the points (such as gene symbols) from the Labels column. To label all significant points, select "Significant" for the **Points to label** option, or to only label the top most significant specify a number under "Only label top most significant". Users can label any points of interest through selecting **Points to label** "Input from file" and providing a tabular labels file. The labels file must contain a header row and have the labels in the first column. These labels must match the labels in the main input file.
**Outputs**
A PDF containing a Volcano plot like below. The R code can be output through *Output Options* in the tool form.
.. image:: $PATH_TO_IMAGES/volcano_plot.png
.. _Volcano plot: https://en.wikipedia.org/wiki/Volcano_plot_(statistics)
.. _blog post: https://gettinggeneticsdone.blogspot.com/2016/01/
]]></help>
<citations>
</citations>
</tool>