-
Notifications
You must be signed in to change notification settings - Fork 441
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2672 from sigu-training/master
add exomedepth
- Loading branch information
Showing
6 changed files
with
976 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
categories: | ||
- Sequence Analysis | ||
- Variant Analysis | ||
description: "ExomeDepth: Calls copy number variants (CNVs) from targeted sequence data" | ||
homepage_url: https://cran.r-project.org/package=ExomeDepth | ||
long_description: | | ||
Calls copy number variants (CNVs) from targeted sequence data, typically exome sequencing | ||
experiments designed to identify the genetic basis of Mendelian disorders. | ||
name: exomedepth | ||
owner: crs4 | ||
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/exomedepth |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
# Load ExomeDepth library (without warnings) | ||
suppressMessages(library(ExomeDepth)) | ||
|
||
# Import parameters from xml wrapper (args_file) | ||
args <- commandArgs(trailingOnly=TRUE) | ||
param <- read.table(args[1],sep="=", as.is=TRUE) | ||
|
||
# Set common parameters | ||
target <- param[match("target",param[,1]),2] | ||
trans_prob <- as.numeric(param[match("trans_prob",param[,1]),2]) | ||
output <- param[match("output",param[,1]),2] | ||
test_vs_ref <- as.logical(param[match("test_vs_ref",param[,1]),2]) | ||
|
||
# Create symbolic links for multiple bam and bai | ||
bam <- param[param[,1]=="bam",2] | ||
bam_bai <- param[param[,1]=="bam_bai",2] | ||
bam_label <- param[param[,1]=="bam_label",2] | ||
bam_label <- gsub(" ", "_", bam_label) | ||
|
||
for(i in 1:length(bam)){ | ||
stopifnot(file.symlink(bam[i], paste(bam_label[i], "bam", sep="."))) | ||
stopifnot(file.symlink(bam_bai[i], paste(bam_label[i], "bam.bai", sep="."))) | ||
} | ||
|
||
# Generate read count data | ||
BAMFiles <- paste(bam_label, "bam", sep=".") | ||
sink("/dev/null") | ||
ExomeCount <- suppressMessages(getBamCounts(bed.file=target, bam.files = BAMFiles)) | ||
sink() | ||
|
||
# Convert counts in a data frame | ||
ExomeCount.dafr <- as(ExomeCount[, colnames(ExomeCount)], 'data.frame') | ||
|
||
# Prepare the main matrix of read count data | ||
ExomeCount.mat <- as.matrix(ExomeCount.dafr[, grep(names(ExomeCount.dafr), pattern='.bam')]) | ||
|
||
# Remove .bam from sample name | ||
colnames(ExomeCount.mat) <- gsub(".bam", "", colnames(ExomeCount.mat)) | ||
|
||
# Set nsamples == 1 if mode is test vs reference, assuming test is sample 1 | ||
nsamples <- ifelse(test_vs_ref, 1, ncol(ExomeCount.mat)) | ||
|
||
# Loop over samples | ||
for (i in 1:nsamples){ | ||
|
||
# Create the aggregate reference set for this sample | ||
my.choice <- suppressWarnings(suppressMessages( | ||
select.reference.set(test.counts = ExomeCount.mat[,i], | ||
reference.counts = subset(ExomeCount.mat, select=-i), | ||
bin.length = (ExomeCount.dafr$end - ExomeCount.dafr$start)/1000, | ||
n.bins.reduced = 10000))) | ||
|
||
my.reference.selected <- apply(X = ExomeCount.mat[, my.choice$reference.choice, drop=FALSE], | ||
MAR = 1, | ||
FUN = sum) | ||
|
||
# Now create the ExomeDepth object for the CNVs call | ||
all.exons<-suppressWarnings(suppressMessages(new('ExomeDepth', | ||
test = ExomeCount.mat[,i], | ||
reference = my.reference.selected, | ||
formula = 'cbind(test,reference)~1'))) | ||
|
||
|
||
# Now call the CNVs | ||
result <- try(all.exons<-suppressMessages(CallCNVs(x=all.exons, | ||
transition.probability = trans_prob, | ||
chromosome = ExomeCount.dafr$space, | ||
start = ExomeCount.dafr$start, | ||
end = ExomeCount.dafr$end, | ||
name = ExomeCount.dafr$names)), silent=T) | ||
|
||
# Next if CNVs are not detected | ||
if (class(result)=="try-error"){ | ||
next | ||
} | ||
|
||
# Compute correlation between ref and test | ||
my.cor <- cor(all.exons@reference, all.exons@test) | ||
n.call <- nrow(all.exons@CNV.calls) | ||
|
||
# Write results | ||
my.results <- cbind(all.exons@CNV.calls[,c(7,5,6,3)], | ||
sample=colnames(ExomeCount.mat)[i], | ||
corr=my.cor, | ||
all.exons@CNV.calls[,c(4,9,12)]) | ||
|
||
# Re-order by chr and position | ||
chrOrder<-c(paste("chr",1:22,sep=""),"chrX","chrY","chrM") | ||
my.results[,1] <- factor(my.results[,1], levels=chrOrder) | ||
my.results <- my.results[order(my.results[,1], my.results[,2], my.results[,3]),] | ||
|
||
write.table(sep='\t', quote=FALSE, file = output, | ||
x = my.results, | ||
row.names = FALSE, col.names = FALSE, dec=".", append=TRUE) | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
<tool id="exomedepth" name="ExomeDepth" version="1.1.0"> | ||
<description>Calls copy number variants (CNVs) from targeted sequence data</description> | ||
<requirements> | ||
<requirement type="package" version="1.1.10">r-exomedepth</requirement> | ||
</requirements> | ||
<version_command><![CDATA[ | ||
echo $(R --version | grep version | grep -v GNU)", ExomeDepth version" $(R --vanilla --slave -e "library(ExomeDepth); cat(sessionInfo()\$otherPkgs\$ExomeDepth\$Version)") | ||
]]></version_command> | ||
<command detect_errors="exit_code"><![CDATA[ | ||
Rscript '${__tool_directory__}/exomedepth.R' '$args_file' | ||
]]></command> | ||
<configfiles> | ||
<configfile name="args_file"><![CDATA[ | ||
target=$targetFile | ||
test_vs_ref=$test_vs_ref | ||
#for $i in $inputs | ||
bam=${i.input} | ||
bam_bai=${i.input.metadata.bam_index} | ||
#if str($i.label.value) != "": | ||
bam_label=${$i.label.value} | ||
#else | ||
bam_label=${i.input.dataset.name} | ||
#end if | ||
#end for | ||
trans_prob=$transition_probability | ||
output=$output | ||
]]></configfile> | ||
</configfiles> | ||
<inputs> | ||
<param name="targetFile" type="data" format="bed" label="Target regions (BED)"> | ||
<validator type="unspecified_build" /> | ||
</param> | ||
<param name="test_vs_ref" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Call CNVs using 1st sample as test" help="If checked, the tool will call CNVs in the first sample vs all the others. If unchecked, an all vs all CNV call will be performed" /> | ||
<repeat name="inputs" title="BAM" min="2" help="Need to add more files? Use controls below."> | ||
<param name="input" type="data" format="bam" label="BAM file"> | ||
<options> | ||
<filter type="data_meta" ref="targetFile" key="dbkey"/> | ||
</options> | ||
</param> | ||
<param name="label" type="text" size="30" value="" label="Label" help="Label to use in the output. If not given, the dataset name will be used instead"> | ||
<validator type="regex" message="Spaces are not allowed">^\S*$</validator> | ||
</param> | ||
</repeat> | ||
<param name="transition_probability" size="10" type="float" value="0.0001" label="Transition probability" help="Transition probability of the hidden Markov Chain from the normal copy number state to either a deletion or a duplication. The default value (0.0001) expects approximately 20 CNVs genome-wide" /> | ||
</inputs> | ||
<outputs> | ||
<data name="output" format="tabular" label="${tool.name} on ${on_string}" /> | ||
</outputs> | ||
<tests> | ||
<test> | ||
<param name="targetFile" value="CNV_TruSeq_Chr2.bed" dbkey="hg19" ftype="bed"/> | ||
<param name="test_vs_ref" value="True"/> | ||
<repeat name="inputs"> | ||
<param name="input" value="CNV_case_small.bam"/> | ||
</repeat> | ||
<repeat name="inputs"> | ||
<param name="input" value="CNV_control_small.bam"/> | ||
</repeat> | ||
<param name="transition_probability" value="0.5"/> | ||
<output name="output"> | ||
<assert_contents> | ||
<has_text text="chr2" /> | ||
<has_text text="97890544" /> | ||
<has_text text="97890616" /> | ||
<has_text text="deletion" /> | ||
<has_text text="CNV_case_small" /> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
</tests> | ||
<help><![CDATA[ | ||
.. class:: warningmark | ||
**Warning about counts for chromosome X** | ||
Calling CNVs on the X chromosome can create issues if the exome sample of interest and the reference exome | ||
samples it is being compared to are not gender matched. | ||
Make sure that the genders are matched properly (i.e. do not use male as a reference for female | ||
samples and vice versa). | ||
**What it does** | ||
This tool uses ExomeDepth to call copy number variants (CNVs) from targeted sequence data. | ||
**Output format** | ||
=========== ======================== | ||
Column Description | ||
----------- ------------------------ | ||
chr Chromosome | ||
start Start of CNV region | ||
end End of CNV region | ||
type CNV type (deletion, duplication) | ||
sample Name of the sample with CNV | ||
corr Correlation between reference and test counts. To get meaningful result, this correlation should really be above 0.97. If this is not the case, consider the output of ExomeDepth as less reliable (i.e. most likely a high false positive rate) | ||
nexons Number of target regions covered by the CNV | ||
BF Bayes factor. It quantifies the statistical support for each CNV. It is in fact the log10 of the likelihood ratio of data for the CNV call divided by the null (normal copy number). The higher that number, the more confident one can be about the presence of a CNV. While it is difficult to give an ideal threshold, and for short exons the Bayes Factor are bound to be unconvincing, the most obvious large calls should be easily flagged by ranking them according to this quantity | ||
reads.ratio Observed/expected reads ratio | ||
=========== ======================== | ||
**What ExomeDepth does and does not do** | ||
ExomeDepth uses read depth data to call CNVs from exome sequencing experiments. A key idea is that the test | ||
exome should be compared to a matched aggregate reference set. This aggregate reference set should combine | ||
exomes from the same batch and it should also be optimized for each exome. It will certainly differ from one exome | ||
to the next. | ||
Importantly, ExomeDepth assumes that the CNV of interest is absent from the aggregate reference set. Hence | ||
related individuals should be excluded from the aggregate reference. It also means that ExomeDepth can miss | ||
common CNVs, if the call is also present in the aggregate reference. ExomeDepth is really suited to detect rare | ||
CNV calls (typically for rare Mendelian disorder analysis). | ||
The ideas used in this package are of course not specific to exome sequencing and could be applied to other | ||
targeted sequencing datasets, as long as they contain a sufficiently large number of exons to estimate the parameters | ||
(at least 20 genes, say, but probably more would be useful). Also note that PCR based enrichment studies are often | ||
not well suited for this type of read depth analysis. The reason is that as the number of cycles is often set to a high | ||
number in order to equalize the representation of each amplicon, which can discard the CNV information. | ||
]]></help> | ||
<citations> | ||
<citation type="doi">10.1093/bioinformatics/btu135</citation> | ||
<citation type="doi">10.1093/bioinformatics/bts526</citation> | ||
</citations> | ||
</tool> |
Oops, something went wrong.