Commit

WisecondorX galaxy wrapper (#696)
* ~64 commits to develop this first version of WisecondorX Galaxy wrapper

* Downgrading to release_23.1 for the CI tests!
drosofff authored Dec 15, 2024
1 parent ae36c63 commit b391b0f
Showing 49 changed files with 678 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .github/workflows/pr.yaml
@@ -14,7 +14,7 @@ on:
- '*'
env:
GALAXY_FORK: galaxyproject
- GALAXY_BRANCH: release_23.2
+ GALAXY_BRANCH: release_23.1
MAX_CHUNKS: 4
MAX_FILE_SIZE: 2M
concurrency:
13 changes: 13 additions & 0 deletions tools/wisecondorx/.shed.yml
@@ -0,0 +1,13 @@
# .shed.yml supporting automatic pushes.
owner: artbio
name: wisecondorx
description: WisecondorX
long_description: |
Improved copy number detection for routine shallow whole-genome sequencing.
See homepage URL for manual and code.
categories:
- Variant Analysis
homepage_url: https://github.com/CenterForMedicalGeneticsGhent/WisecondorX/tree/master
remote_repository_url: https://github.com/ARTbio/tools-artbio/tree/master/tools/wisecondorx
toolshed:
- toolshed
54 changes: 54 additions & 0 deletions tools/wisecondorx/macro.xml
@@ -0,0 +1,54 @@
<macros>
<token name="@VERSION@">1.2.9</token>
<token name="@WRAPPER_VERSION@">@VERSION@+galaxy0</token>
<token name="@PROFILE@">23.0</token>
<token name="@pipefail@"><![CDATA[set -o | grep -q pipefail && set -o pipefail;]]></token>

<xml name="requirements">
<requirements>
<requirement type="package" version="@VERSION@">wisecondorx</requirement>
</requirements>
</xml>
<token name="@help@"><![CDATA[
**What it does**

WisecondorX, which uses a within-sample normalization technique, detects copy
number variations (CNVs) from BAM input files.

It is important that **no** read quality filtering is applied before running
WisecondorX: the software uses low-quality reads to distinguish informative
bins from non-informative ones.

Using WisecondorX involves three main stages (conversion, reference building
and prediction):

**1. Convert .bam files** of aligned reads to .npz files (for both normal and
tumor samples) using the Galaxy tool **WisecondorX convert bam to npz**.

**2. Build a reference index** from the .npz files of **normal** samples using
the Galaxy tool **WisecondorX build reference**.

.. class:: warningmark

Automated gender prediction, which is required to analyze sex chromosomes
consistently, is based on a Gaussian mixture model. If fewer than 20 samples
are included during reference creation, or if male and female samples (for
NIPT, male and female fetuses) are not both represented, this prediction may be
inaccurate. In that case, the --yfrac parameter can be set manually instead.

.. class:: warningmark

It is of paramount importance that the reference set consists exclusively of
negative (normal) control samples that originate from the same sequencer,
mapper, reference genome, type of material, etc. as the test samples. As a rule
of thumb, consider every laboratory and in silico step: the more sources of bias
that can be eliminated, the better.

Try to include at least 50 samples per reference. The more the better, although
beyond about 500 samples further improvement in normalization is unlikely.

**3. Predict copy number variations** from the reference index and the .npz
files of the tumor samples of interest using the Galaxy tool
**WisecondorX predict CNVs**.
]]></token>
</macros>
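The three stages described in this help text correspond to three WisecondorX subcommands. The following Python sketch only assembles the argument lists that the Galaxy wrappers in this commit run (compare their `<command>` sections); the function names are hypothetical and nothing is executed.

```python
from typing import List


def convert_cmd(bam: str, npz_out: str) -> List[str]:
    """Stage 1: BAM of aligned reads -> .npz (default 5 kb bins)."""
    return ["WisecondorX", "convert", bam, npz_out]


def newref_cmd(normal_npzs: List[str], ref_out: str,
               binsize: int = 100000, cpus: int = 4) -> List[str]:
    """Stage 2: reference built from the .npz files of normal samples only."""
    return ["WisecondorX", "newref", *normal_npzs, ref_out,
            "--binsize", str(binsize), "--cpus", str(cpus)]


def predict_cmd(sample_npz: str, ref_npz: str, out_id: str) -> List[str]:
    """Stage 3: CNV calling; --plot and --bed mirror the Galaxy wrapper."""
    return ["WisecondorX", "predict", sample_npz, ref_npz, out_id,
            "--plot", "--bed"]
```

On a machine where WisecondorX is installed, each list could be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes each stage easy to inspect and test.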
Binary file added tools/wisecondorx/test-data/0.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/1.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/2.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/3.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/4.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/5.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/6.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/7.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/8.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/9.ref.npz
Binary file not shown.
Binary file added tools/wisecondorx/test-data/chr1.png
Binary file added tools/wisecondorx/test-data/chr10.png
Binary file added tools/wisecondorx/test-data/chr11.png
Binary file added tools/wisecondorx/test-data/chr12.png
Binary file added tools/wisecondorx/test-data/chr13.png
Binary file added tools/wisecondorx/test-data/chr14.png
Binary file added tools/wisecondorx/test-data/chr15.png
Binary file added tools/wisecondorx/test-data/chr16.png
Binary file added tools/wisecondorx/test-data/chr17.png
Binary file added tools/wisecondorx/test-data/chr18.png
Binary file added tools/wisecondorx/test-data/chr19.png
Binary file added tools/wisecondorx/test-data/chr2.png
Binary file added tools/wisecondorx/test-data/chr20.png
Binary file added tools/wisecondorx/test-data/chr21.png
Binary file added tools/wisecondorx/test-data/chr22.png
Binary file added tools/wisecondorx/test-data/chr3.png
Binary file added tools/wisecondorx/test-data/chr4.png
Binary file added tools/wisecondorx/test-data/chr5.png
Binary file added tools/wisecondorx/test-data/chr6.png
Binary file added tools/wisecondorx/test-data/chr7.png
Binary file added tools/wisecondorx/test-data/chr8.png
Binary file added tools/wisecondorx/test-data/chr9.png
Binary file added tools/wisecondorx/test-data/chrX.png
Binary file added tools/wisecondorx/test-data/genome_wide.png
Binary file not shown.
Binary file not shown.
Binary file added tools/wisecondorx/test-data/npz_convert_input.bam
Binary file not shown.
Binary file not shown.
Binary file added tools/wisecondorx/test-data/output_reference.npz
Binary file not shown.
1 change: 1 addition & 0 deletions tools/wisecondorx/test-data/predict_abberations.bed
@@ -0,0 +1 @@
chr start end ratio zscore type
317 changes: 317 additions & 0 deletions tools/wisecondorx/test-data/predict_bins.bed

Large diffs are not rendered by default.

28 changes: 28 additions & 0 deletions tools/wisecondorx/test-data/predict_segments.bed
@@ -0,0 +1,28 @@
chr start end ratio zscore
1 1 120000000 -0.0087 -2.538405323843277
1 140000001 250000000 0.0176 4.312862224344486
2 1 250000000 0.0077 3.3452597487045215
3 1 200000000 0.0058 2.311500455599722
4 1 40000000 0.0007 0.11656739335236131
4 40000001 190000000 0.0006 0.35088837729023276
5 1 190000000 -0.0138 -4.742762233161156
6 1 170000000 0.0029 0.9290111777306987
7 1 160000000 -0.0066 -1.3173166420256102
8 1 150000000 -0.0093 -2.345755449706625
9 1 20000000 0.0057 0.5378706474020528
9 20000001 40000000 -0.0068 -0.5822999425114906
9 60000001 140000000 -0.0066 -0.8909887079018787
10 1 140000000 -0.0053 -0.972447941393507
11 1 140000000 -0.0011 -0.25869752816517777
12 1 140000000 -0.0067 -1.6120716034316234
13 20000001 120000000 0.0091 2.6165198496794595
14 20000001 110000000 0.0046 1.4493631003182375
15 30000001 110000000 0.0022 0.5631104559578264
16 1 90000000 0.0009 0.20349127260764568
17 1 90000000 0.0198 3.2916417620408067
18 1 90000000 0.0209 2.8871676559099937
19 10000001 60000000 -0.0008 -0.09701491554952066
20 1 70000000 0.0034 0.49591449488386785
21 10000001 50000000 -0.0053 -0.6285903890420564
22 10000001 60000000 -0.0177 -1.875889555547745
X 1 160000000 -0.0134 -2.6093805092022024
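The aberrations file above holds only its header line. This is consistent with the default `--zscore` cutoff of 5 used by the predict wrapper, since no segment in this file has an absolute z-score above about 4.74 (chromosome 5). Below is a simplified Python sketch of such a threshold rule; labelling calls as gain or loss from the sign of the ratio is an assumption of this sketch, and it is an illustration rather than WisecondorX's own calling code.

```python
from typing import Dict, List


def parse_segments(text: str) -> List[Dict[str, str]]:
    """Parse a whitespace-delimited segments file: header line, then rows."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def call_aberrations(segments: List[Dict[str, str]],
                     zscore_cutoff: float = 5.0) -> List[Dict[str, str]]:
    """Keep segments whose |z-score| reaches the cutoff (simplified rule)."""
    calls = []
    for seg in segments:
        if abs(float(seg["zscore"])) >= zscore_cutoff:
            # Assumption: the sign of the ratio decides gain vs loss
            seg_type = "gain" if float(seg["ratio"]) > 0 else "loss"
            calls.append({**seg, "type": seg_type})
    return calls


# A few rows copied from predict_segments.bed above
SEGMENTS = """\
chr start end ratio zscore
1 1 120000000 -0.0087 -2.538405323843277
1 140000001 250000000 0.0176 4.312862224344486
2 1 250000000 0.0077 3.3452597487045215
5 1 190000000 -0.0138 -4.742762233161156
17 1 90000000 0.0198 3.2916417620408067
"""

segments = parse_segments(SEGMENTS)
print(call_aberrations(segments))  # -> []
print([(s["chr"], s["type"]) for s in call_aberrations(segments, 4.0)])
# -> [('1', 'gain'), ('5', 'loss')]
```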
29 changes: 29 additions & 0 deletions tools/wisecondorx/test-data/predict_statistics.txt
@@ -0,0 +1,29 @@
chr ratio.mean ratio.median zscore
1 0.003256724685060297 0.009032538940713365 1.1604576995182534
2 0.007728111371764734 0.00879518825265517 3.6168182974192224
3 0.0057970691646083835 0.007918358867360545 2.2102487208570447
4 0.0005794303608920617 -0.0038693714215764057 0.3050780556626696
5 -0.013750074657332427 -0.00306199917667051 -5.00766851062217
6 0.00293681601872286 -0.0016253998352964258 0.9407744372804929
7 -0.0066155513952186815 -0.007596035725320216 -1.3291500169147508
8 -0.009253570570903735 -0.01001300559828008 -2.3586673175031008
9 -0.004174118616434959 -0.0024356973032108246 -0.7243064497148664
10 -0.005266146751525984 0.0028177250094220192 -1.0963168057162755
11 -0.0010682214397647377 -0.012687471128064745 -0.25403103304752794
12 -0.006720586939014568 -0.007785953193232591 -1.7313975674091997
13 0.009139617144346813 0.002351198145606662 2.7174009747784944
14 0.004617644740226149 0.0035058727944338825 1.4966978929197126
15 0.0021565565774295426 -0.005083704704282363 0.5800013406347537
16 0.0009244703405124733 0.007301717962691849 0.20870556732386122
17 0.01982791519920036 0.006869817773389827 3.1146698354398854
18 0.020898589066626748 0.009665993221586318 2.845910113450034
19 -0.000657740980105667 0.0016192329112033454 -0.08006761423301532
20 0.003446546608787667 0.002058925822068644 0.5033887759993748
21 -0.005290652555760193 -0.011028957124213846 -0.5341417465937656
22 -0.017713587100928845 -0.0327155577009055 -1.6790692284983704
X -0.013378674115274611 -0.015072799437654107 -2.6107602961602137
Gender based on --yfrac (or manually overridden by --gender): F
Number of reads: 2081160
Standard deviation of the ratios per chromosome: 0.00932
Median segment variance per bin (doi: 10.1093/nar/gky1263): 0.0008
Copy number profile abnormality (CPA) score (doi: 10.1186/s13073-020-00735-4): 2.32398
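Below the per-chromosome table, the statistics file ends with free-form `label: value` lines. A minimal Python sketch for extracting them; the function name is hypothetical, and stripping the `(doi: ...)` suffixes from labels is a convenience choice of this sketch, not a WisecondorX convention.

```python
import re
from typing import Dict


def parse_statistics_summary(text: str) -> Dict[str, str]:
    """Collect the 'label: value' lines that follow the per-chromosome table."""
    summary = {}
    for line in text.splitlines():
        if ": " not in line:
            continue  # table rows are whitespace-separated, without ': '
        # rsplit keeps colons inside labels such as '(doi: 10.1093/...)'
        label, value = line.rsplit(": ", 1)
        label = re.sub(r"\s*\(doi: [^)]*\)", "", label).strip()
        summary[label] = value.strip()
    return summary


STATS_TAIL = """\
X -0.013378674115274611 -0.015072799437654107 -2.6107602961602137
Gender based on --yfrac (or manually overridden by --gender): F
Number of reads: 2081160
Standard deviation of the ratios per chromosome: 0.00932
Median segment variance per bin (doi: 10.1093/nar/gky1263): 0.0008
Copy number profile abnormality (CPA) score (doi: 10.1186/s13073-020-00735-4): 2.32398
"""

summary = parse_statistics_summary(STATS_TAIL)
reads = int(summary["Number of reads"])
cpa = float(summary["Copy number profile abnormality (CPA) score"])
```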
57 changes: 57 additions & 0 deletions tools/wisecondorx/wisecondor_npz_converter.xml
@@ -0,0 +1,57 @@
<tool id="wisecondorx_npz_converter" name="WisecondorX convert" version="@WRAPPER_VERSION@" profile="@PROFILE@">
<description>
bam to npz
</description>
<macros>
<import>macro.xml</import>
</macros>
<expand macro="requirements"/>
<stdio>
<exit_code range="1:" level="fatal" description="Error occurred" />
</stdio>
<command detect_errors="exit_code"><![CDATA[
@pipefail@
ln -f -s $bam.metadata.bam_index input.bam.bai &&
ln -f -s $bam input.bam &&
printf 'Creating 5kb bins for file %s\n' "$bam.element_identifier" &&
WisecondorX convert input.bam output.npz
]]></command>
<inputs>
<param name="bam" type="data" label="BAM input" format="bam"
help="The input BAM file is converted to an .npz file"/>
</inputs>
<outputs>
<data name="npz" format="npz" from_work_dir="output.npz" label="${on_string}.npz" />
</outputs>
<tests>
<test expect_num_outputs="1">
<param ftype="bam" name="bam" value="npz_convert_input.bam" />
<output name="npz" ftype="npz" file="npz_convert_output.npz" compare="sim_size" delta="10000"/>
</test>
</tests>
<help>
@help@
<![CDATA[
.. class:: infomark
**WisecondorX convert input.bam/cram output.npz [--optional arguments]**
Option List::
--reference FASTA reference to be used with CRAM inputs.
This option is currently not available in this Galaxy wrapper,
which only accepts BAM inputs.
--binsize Size per bin in bp; the reference bin size should be a multiple of this value.
Note that this parameter does not affect the resolution, but it
can be used to optimize processing speed (default: x=5e3).
The --binsize parameter is currently not exposed in this Galaxy
wrapper and is fixed at 5e3.
--normdup Use this flag to avoid duplicate removal.
The --normdup parameter is currently not exposed in this Galaxy
wrapper. Default is to remove duplicates.
]]></help>
<citations>
<citation type="doi">10.1093/nar/gky1263</citation>
</citations>
</tool>
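Galaxy stores the index of a BAM dataset as metadata (`$bam.metadata.bam_index`) rather than next to the BAM file, so the converter command links both under matching names (`input.bam`, `input.bam.bai`) before calling `WisecondorX convert`. Below is a Python equivalent of those two `ln -f -s` lines, e.g. for reproducing the step outside Galaxy; the `stage_bam` name is hypothetical.

```python
import os


def stage_bam(bam_path: str, bai_path: str, workdir: str) -> str:
    """Link a BAM and its index as input.bam / input.bam.bai in workdir."""
    for src, name in ((bai_path, "input.bam.bai"), (bam_path, "input.bam")):
        dest = os.path.join(workdir, name)
        # Emulate `ln -f`: replace an existing file or (possibly dangling) link
        if os.path.lexists(dest):
            os.remove(dest)
        os.symlink(src, dest)
    return os.path.join(workdir, "input.bam")
```

The index is linked first, as in the wrapper, so that `input.bam` never exists without its `.bai` companion.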
112 changes: 112 additions & 0 deletions tools/wisecondorx/wisecondor_predict.xml
@@ -0,0 +1,112 @@
<tool id="wisecondorx_predict" name="WisecondorX predict" version="@WRAPPER_VERSION@" profile="@PROFILE@">
<description>
CNVs
</description>
<macros>
<import>macro.xml</import>
</macros>
<expand macro="requirements"/>
<stdio>
<exit_code range="1:" level="fatal" description="Error occurred" />
</stdio>
<command detect_errors="exit_code"><![CDATA[
@pipefail@
ln -s $npz_input sample.npz &&
ln -s $reference reference.npz &&
WisecondorX predict sample.npz reference.npz sample --plot --bed
]]></command>
<inputs>
<param name="npz_input" type="data" format="npz" label="npz file of the sample to analyse"/>
<param name="reference" type="data" format="npz" label="npz reference built with WisecondorX build"/>
</inputs>
<outputs>
<data name="aberrations" format="bed" label="sample aberrations" from_work_dir="sample_aberrations.bed"/>
<data name="bins" format="bed" label="sample bins" from_work_dir="sample_bins.bed"/>
<data name="segments" format="bed" label="sample segments" from_work_dir="sample_segments.bed"/>
<data name="statistics" format="txt" label="statistics" from_work_dir="sample_statistics.txt"/>
<collection name="plots" type="list" format="png" label="CNV plots">
<discover_datasets pattern="__name_and_ext__" directory="sample.plots" />
</collection>
</outputs>
<tests>
<test expect_num_outputs="5">
<param name="npz_input" value="input_sample_predict.npz" ftype="npz" />
<param name="reference" value="input_reference_predict.10000kb.npz" ftype="npz" />
<output name="aberrations" ftype="bed" file="predict_abberations.bed" />
<output name="bins" ftype="bed" file="predict_bins.bed" compare="sim_size" delta="1000"/>
<output name="segments" ftype="bed" file="predict_segments.bed"/>
<output name="statistics" ftype="txt" file="predict_statistics.txt" compare="sim_size" delta="1000"/>
<output_collection name="plots" type="list">
<element name="chr1" file="chr1.png" compare="sim_size" delta="10000"/>
<element name="chr10" file="chr10.png" compare="sim_size" delta="10000"/>
<element name="chr11" file="chr11.png" compare="sim_size" delta="10000"/>
<element name="chr12" file="chr12.png" compare="sim_size" delta="10000"/>
<element name="chr13" file="chr13.png" compare="sim_size" delta="10000"/>
<element name="chr14" file="chr14.png" compare="sim_size" delta="10000"/>
<element name="chr15" file="chr15.png" compare="sim_size" delta="10000"/>
<element name="chr16" file="chr16.png" compare="sim_size" delta="10000"/>
<element name="chr17" file="chr17.png" compare="sim_size" delta="10000"/>
<element name="chr18" file="chr18.png" compare="sim_size" delta="10000"/>
<element name="chr19" file="chr19.png" compare="sim_size" delta="10000"/>
<element name="chr2" file="chr2.png" compare="sim_size" delta="10000"/>
<element name="chr20" file="chr20.png" compare="sim_size" delta="10000"/>
<element name="chr21" file="chr21.png" compare="sim_size" delta="10000"/>
<element name="chr22" file="chr22.png" compare="sim_size" delta="10000"/>
<element name="chr3" file="chr3.png" compare="sim_size" delta="10000"/>
<element name="chr4" file="chr4.png" compare="sim_size" delta="10000"/>
<element name="chr5" file="chr5.png" compare="sim_size" delta="10000"/>
<element name="chr6" file="chr6.png" compare="sim_size" delta="10000"/>
<element name="chr7" file="chr7.png" compare="sim_size" delta="10000"/>
<element name="chr8" file="chr8.png" compare="sim_size" delta="10000"/>
<element name="chr9" file="chr9.png" compare="sim_size" delta="10000"/>
<element name="chrX" file="chrX.png" compare="sim_size" delta="10000"/>
<element name="genome_wide" file="genome_wide.png" compare="sim_size" delta="10000"/>
</output_collection>
</test>
</tests>
<help>
@help@
<![CDATA[
.. class:: infomark
**WisecondorX predict test_input.npz reference_input.npz output_id [--optional arguments]**
Option List::
--minrefbins Minimum number of sensible reference bins per target bin;
should generally not be tweaked (default: x=150).
Not exposed in this Galaxy wrapper.
--maskrepeats Bins with distances > mean + sd * 3 in the reference will be
masked. This parameter represents the number of masking cycles
and defines the stringency of the blacklist (default: x=5).
Not exposed in this Galaxy wrapper.
--zscore Z-score cutoff to call segments as aberrations (default: x=5).
Not exposed in this Galaxy wrapper.
--alpha P-value cutoff for calling breakpoints with circular binary
segmentation (default: x=1e-4).
Not exposed in this Galaxy wrapper.
--beta When beta is given, --zscore is ignored. Beta sets a ratio
cutoff for aberration calling. It's a number between 0 (liberal)
and 1 (conservative) and, when used, is optimally close to the
purity (e.g. fetal/tumor fraction).
Not exposed in this Galaxy wrapper.
--blacklist Blacklist for masking additional regions; requires headerless
.bed file. This is particularly useful when the reference set
is too small to recognize some obvious loci (such as centromeres).
Not exposed in this Galaxy wrapper.
--gender Force WisecondorX to analyze this case as male (M) or female (F).
Useful when e.g. dealing with a loss of chromosome Y, which
causes erroneous gender predictions (choices: x=F or x=M).
Not exposed in this Galaxy wrapper.
--bed Outputs tab-delimited .bed files. Always enabled in this Galaxy wrapper.
--plot Outputs custom, directly interpretable .png plots. Always enabled in this Galaxy wrapper.
--ylim [a,b] Force WisecondorX to use y-axis interval [a,b] during plotting, e.g. [-2,2].
Not exposed in this Galaxy wrapper.
--cairo Some operating systems require the cairo bitmap type to write plots.
Not exposed in this Galaxy wrapper.
--seed Random seed for segmentation algorithm (default: None).
Not exposed in this Galaxy wrapper.
]]></help>
<citations>
<citation type="doi">10.1093/nar/gky1263</citation>
</citations>
</tool>
66 changes: 66 additions & 0 deletions tools/wisecondorx/wisecondor_reference_builder.xml
@@ -0,0 +1,66 @@
<tool id="wisecondorx_reference_builder" name="WisecondorX build" version="@WRAPPER_VERSION@" profile="@PROFILE@">
<description>
reference
</description>
<macros>
<import>macro.xml</import>
</macros>
<expand macro="requirements"/>
<stdio>
<exit_code range="1:" level="fatal" description="Error occurred" />
</stdio>
<command detect_errors="exit_code"><![CDATA[
@pipefail@
#for $num, $file in enumerate($npz_inputs):
ln -s $file "${num}.npz" &&
#end for
WisecondorX newref *.npz reference.npz
--binsize ${bin}
--cpus \${GALAXY_SLOTS:-4} &&
mv reference.npz $npz
]]></command>
<inputs>
<param name="npz_inputs" type="data" label="npz inputs" multiple="True" format="npz"
help="npz files of normal samples used to build the reference (at least 10 samples required)"/>
<param name="bin" size="9" type="integer" value="100000" label="Bin size in nucleotides"
help="Default bin size is 100 kb (100000); it should be a multiple of 5000, the bin size used by WisecondorX convert" />
</inputs>
<outputs>
<data name="npz" format="npz" label="reference_${bin}nt" />
</outputs>
<tests>
<test expect_num_outputs="1">
<param name="npz_inputs"
value="0.ref.npz,1.ref.npz,2.ref.npz,3.ref.npz,4.ref.npz,5.ref.npz,6.ref.npz,7.ref.npz,8.ref.npz,9.ref.npz"/>
<param name="bin" value="10000" />
<output name="npz" ftype="npz" file="output_reference.npz" compare="sim_size" delta="10000"/>
</test>
</tests>
<help>
@help@
<![CDATA[
.. class:: infomark
**WisecondorX newref reference_input_dir/*.npz reference_output.npz [--optional arguments]**
Option List::
--nipt Always include this flag for the generation of a NIPT reference.
Not exposed in this Galaxy wrapper.
--binsize Size per bin in bp; defines the resolution of the output (default: x=1e5).
**Should be a multiple of 5e3**, the bin size used by the convert step.
--refsize Number of reference locations per target;
should generally not be tweaked (default: x=300).
--yfrac Y read fraction cutoff, in order to manually define gender.
Setting this to 1 will treat all samples as female.
This parameter is not currently exposed in the Galaxy wrapper.
--plotyfrac Plots the Y read fraction histogram and Gaussian mixture fit to file x;
this can help when setting --yfrac manually. The software quits after plotting.
This parameter is not currently exposed in the Galaxy wrapper.
--cpus Number of threads requested (set by the Galaxy administrator through
GALAXY_SLOTS; defaults to 4 in this wrapper).
]]></help>
<citations>
<citation type="doi">10.1093/nar/gky1263</citation>
</citations>
</tool>
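The Cheetah loop in the reference builder links each selected dataset as `0.npz`, `1.npz`, ... which gives every input a unique name ending in `.npz`, so that the `*.npz` glob passed to `WisecondorX newref` picks them all up. The Python sketch below mirrors that loop; as an assumption of this sketch (the wrapper itself does not check them), it also enforces two constraints stated only in the help text: at least 10 samples, and a bin size that is a multiple of 5000. The function and constant names are hypothetical.

```python
import os
from typing import List

MIN_REFERENCE_SAMPLES = 10  # minimum stated in the wrapper's input help
CONVERT_BINSIZE = 5000      # bin size fixed by the convert wrapper


def stage_reference_inputs(npz_paths: List[str], workdir: str,
                           binsize: int) -> List[str]:
    """Link inputs as 0.npz, 1.npz, ... after checking sample count and bin size."""
    if len(npz_paths) < MIN_REFERENCE_SAMPLES:
        raise ValueError(f"need at least {MIN_REFERENCE_SAMPLES} normal samples, "
                         f"got {len(npz_paths)}")
    if binsize % CONVERT_BINSIZE != 0:
        raise ValueError(f"binsize {binsize} is not a multiple of {CONVERT_BINSIZE}")
    links = []
    for num, path in enumerate(npz_paths):
        # Same naming as the Cheetah loop: unique names ending in .npz
        dest = os.path.join(workdir, f"{num}.npz")
        os.symlink(path, dest)
        links.append(dest)
    return links
```

With the test inputs of this wrapper (ten `N.ref.npz` files and a bin size of 10000), both checks pass.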
