das tool: do not allow empty labels #5955

Merged: 2 commits, merged on Apr 29, 2024
75 changes: 68 additions & 7 deletions tools/das_tool/das_tool.xml
@@ -7,19 +7,26 @@
     <expand macro="requirements"/>
     <expand macro="version"/>
     <command detect_errors="exit_code"><![CDATA[
-#set $bins = ""
-#set $labels = ""
-#set $sep = ""
+#import re
+
+#set $bins = []
+#set $labels = []
 #for $i, $s in enumerate($binning)
-    #set $bins += "%s%s" %($sep, $s.bins)
-    #set $labels += "%s%s" %($sep, $s.labels)
-    #set $sep = ","
+    #silent $bins.append(str($s.bins))
+    #if $s.labels != ''
+        #silent $labels.append(str($s.labels))
+    #else
+        #silent $labels.append(re.sub('[^\w\-_\.]', '_', $s.bins.element_identifier))
+    #end if
 #end for
 
 #if $adv.proteins
     ln -s '$adv.proteins' 'proteins' &&
 #end if
 
+#set $bins = ','.join($bins)
+#set $labels = ','.join($labels)
+
 DAS_Tool
     --contigs '$contigs'
     --outputbasename 'outputs'
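The first hunk replaces string concatenation with list building plus a fallback for empty labels. The Python sketch below mirrors that Cheetah logic outside Galaxy so the difference is visible; it is an illustration under assumptions, not the wrapper itself. The (identifier, label) pairs are hypothetical stand-ins for $s.bins.element_identifier and $s.labels in the binning repeat, and old_labels and new_labels are illustrative names. The pattern is written as a raw string; in Python the wrapper's non-raw '[^\w\-_\.]' compiles to the same regex, because unrecognized escapes such as \w are passed through unchanged.

```python
import re

# Pattern used in the Cheetah template: every character that is not a word
# character, '-', '_' or '.' is replaced by '_'.
INVALID = re.compile(r"[^\w\-_\.]")


def old_labels(entries):
    """Pre-PR logic: labels are concatenated with ',' as-is, empty ones included.

    `entries` is a list of hypothetical (element_identifier, label) pairs
    standing in for the `binning` repeat.
    """
    labels = ""
    sep = ""
    for _identifier, label in entries:
        labels += "%s%s" % (sep, label)
        sep = ","
    return labels


def new_labels(entries):
    """Post-PR logic: an empty label falls back to the sanitized identifier."""
    labels = []
    for identifier, label in entries:
        if label != "":
            labels.append(str(label))
        else:
            labels.append(INVALID.sub("_", identifier))
    return ",".join(labels)


entries = [("metabat.tabular", ""), ("maxbin", "maxbin")]
print(repr(old_labels(entries)))  # ',maxbin'
print(repr(new_labels(entries)))  # 'metabat.tabular,maxbin'
```

With an empty first label, the old logic yields ',maxbin', i.e. an empty entry in the comma-separated label list, while the new logic substitutes the dataset identifier.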
@@ -46,7 +53,15 @@ DAS_Tool
         <param argument="--contigs" type="data" format="fasta" label="Contig sequences"/>
         <repeat name="binning" title="Bins" min="1">
             <param argument="--bins" type="data" format="tabular" label="Contigs-to-bin table" help="Tabular with two columns: contig-IDs and bin-IDs. Fasta_to_Contigs2Bin can be used to convert genome bins in fasta format to Contigs-to-bin table"/>
-            <param argument="--labels" type="text" value="" label="Name of binning prediction tool used to generate the table"/>
+            <param argument="--labels" type="text" value="" label="Name of binning prediction name" help="If left empty the identifier of the contig-to-bin table is used. Only alphanumeric characters, dash, underscore and dot are allowed. Other characters are replaced by underscore.">
+                <sanitizer invalid_char="_">
Review comment (Member): _ is invalid and then you add it again?
Reply (Contributor Author): invalid_char specifies the character used as a replacement for invalid characters; _ itself is a valid character. The idea is that this does the same as the regexp applied to the element_identifiers in the Cheetah code (which I took from what we use all the time).
+                    <valid initial="string.ascii_letters,string.digits">
+                        <add value="-" />
+                        <add value="_" />
+                        <add value="." />
+                    </valid>
+                </sanitizer>
+            </param>
         </repeat>
         <section name="adv" title="Advanced options">
             <param argument="--search_engine" type="select" label="Engine used for single copy gene identification">
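The reply above says the new sanitizer does the same as the regexp applied to element identifiers. The following Python sketch compares the two, under the assumption that the sanitizer keeps characters from its valid list and replaces everything else with invalid_char. sanitize_param and sanitize_identifier are hypothetical names, and the sketch does not model Galaxy's own sanitizer class, which can also apply a character mapping (for example for quotes or angle brackets) that is ignored here.

```python
import re
import string

# Assumed model of the <sanitizer> added in this PR: characters from
# string.ascii_letters, string.digits and "-", "_", "." are kept, anything
# else becomes invalid_char ("_"). Galaxy's character mappings are ignored.
VALID_CHARS = frozenset(string.ascii_letters + string.digits + "-_.")


def sanitize_param(value, invalid_char="_"):
    """Emulate the per-character replacement of the --labels sanitizer."""
    return "".join(c if c in VALID_CHARS else invalid_char for c in value)


def sanitize_identifier(value):
    """The regexp applied to element identifiers in the Cheetah code."""
    return re.sub(r"[^\w\-_\.]", "_", value)


# ASCII samples: both functions agree.
for sample in ["metabat", "my bins (v2)", "a/b:c", "CONCOCT-1.2_x"]:
    assert sanitize_param(sample) == sanitize_identifier(sample)

# Non-ASCII letters: \w matches them, the ASCII-only valid set does not.
assert sanitize_param("caf\u00e9") == "caf_"
assert sanitize_identifier("caf\u00e9") == "caf\u00e9"
```

For ASCII input the two agree. They can differ on non-ASCII letters: in a Python 3 str pattern \w matches Unicode word characters such as é, whereas string.ascii_letters contains only A-Z and a-z, so the sanitizer turns "café" into "caf_" while the regexp leaves it unchanged.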
@@ -139,6 +154,52 @@ DAS_Tool
                 </assert_contents>
             </output>
         </test>
+        <!-- like the first test, but with empty label -->
+        <test expect_num_outputs="4">
+            <param name="contigs" value="contigs.fasta"/>
+            <repeat name="binning">
+                <param name="bins" value="metabat.tabular"/>
+                <!-- <param name="labels" value="metabat"/> -->
+            </repeat>
+            <section name="adv">
+                <param name="search_engine" value="diamond"/>
+                <param name="proteins" value="proteins.fasta"/>
+                <param name="score_threshold" value="0.5"/>
+                <param name="duplicate_penalty" value="0.6"/>
+                <param name="megabin_penalty" value="0.5" />
+            </section>
+            <section name="output">
+                <param name="write_bin_evals" value="true"/>
+                <conditional name="write_bins">
+                    <param name="write_bins" value=""/>
+                </conditional>
+                <param name="debug" value="true"/>
+            </section>
+            <output name="summary" ftype="tabular">
+                <assert_contents>
+                    <has_text text="unique_SCGs"/>
+                    <has_text text="metabat.8"/>
+                    <has_text text="bacteria"/>
+                </assert_contents>
+            </output>
+            <output name="contigs2bin" ftype="tabular">
+                <assert_contents>
+                    <has_text text="Ley3_66761_scaffold_6"/>
+                </assert_contents>
+            </output>
+            <output name="log" ftype="txt">
+                <assert_contents>
+                    <has_text text="Skipping gene prediction"/>
+                    <has_text text="#Target sequences to report alignments for: 1"/>
+                </assert_contents>
+            </output>
+            <output name="eval" ftype="tabular">
+                <assert_contents>
+                    <has_text text="unique_SCGs"/>
+                    <has_text text="metabat.8"/>
+                </assert_contents>
+            </output>
+        </test>
         <test expect_num_outputs="6">
             <param name="contigs" value="contigs.fasta"/>
             <repeat name="binning">
6 changes: 3 additions & 3 deletions tools/das_tool/macros.xml
@@ -1,8 +1,8 @@
 <?xml version="1.0"?>
 <macros>
     <token name="@TOOL_VERSION@">1.1.7</token>
-    <token name="@VERSION_SUFFIX@">0</token>
-    <token name="@PROFILE@">21.01</token>
+    <token name="@VERSION_SUFFIX@">1</token>
+    <token name="@PROFILE@">22.01</token>
     <xml name="biotools">
         <xrefs>
             <xref type="bio.tools">dastool</xref>
@@ -28,4 +28,4 @@ DAS Tool is an automated method that integrates the results of a flexible number
             <citation type="doi">10.1038/s41564-018-0171-1</citation>
         </citations>
     </xml>
-</macros>
+</macros>