You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As this makes ugly warning messages when running -with-tower
I'm shamefully tagging @skrakau to look into this as the bash script scares me too much to try doing it myself 😅 (sorry!)
WARN: Tower request field `tasks.script` exceeds expected size | offending value: `
# ensure augustus has write access to config directory
if [ Y = "Y" ] ; then
cp -r /usr/local/config/ augustus_config/
export AUGUSTUS_CONFIG_PATH=augustus_config
fi
# place db in extra folder to ensure BUSCO recognizes it as path (instead of downloading it)
if [ Y = "Y" ] ; then
mkdir dataset
mv bacteria_odb10 dataset/
fi
# set nullgob: if pattern matches no files, expand to a null string rather than to itself
shopt -s nullglob
# only used for saving busco downloads
most_spec_db="NA"
if busco --lineage_dataset dataset/bacteria_odb10 --mode genome --in MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa --cpu "2" --out "BUSCO" > MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log 2> MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.err; then
# get name of used specific lineage dataset
summaries=(BUSCO/short_summary.specific.*.BUSCO.txt)
if [ ${#summaries[@]} -ne 1 ]; then
echo "ERROR: none or multiple 'BUSCO/short_summary.specific.*.BUSCO.txt' files found. Expected one."
exit 1
fi
[[ $summaries =~ BUSCO/short_summary.specific.(.*).BUSCO.txt ]];
db_name_spec="${BASH_REMATCH[1]}"
most_spec_db=${db_name_spec}
echo "Used specific lineage dataset: ${db_name_spec}"
if [ Y = "Y" ]; then
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.specific_lineage.${db_name_spec}.MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa.txt
# if lineage dataset is provided, BUSCO analysis does not fail in case no genes can be found as when using the auto selection setting
# report bin as failed to allow consistent warnings within the pipeline for both settings
if egrep -q $'WARNING: BUSCO did not find any match.' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log ; then
echo "WARNING: BUSCO could not find any genes for the provided lineage dataset! See also MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log."
echo -e "MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa No genes" > "MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.failed_bin.txt"
fi
else
# auto lineage selection
if { egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log && egrep -q $'INFO: Lineage \S+ is selected, supported by ' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log ; } || { egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log && egrep -q $'INFO: The results from the Prodigal gene predictor indicate that your data belongs to the mollicutes clade. Testing subclades...' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log && egrep -q $'INFO: Using local lineages directory ' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log ; }; then
# the second statement is necessary, because certain mollicute clades use a different genetic code, are not part of the BUSCO placement tree, are tested separately
# and cause different log messages
echo "Domain and specific lineage could be selected by BUSCO."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.specific_lineage.${db_name_spec}.MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa.txt
db_name_gen=""
summaries_gen=(BUSCO/short_summary.generic.*.BUSCO.txt)
if [ ${#summaries_gen[@]} -lt 1 ]; then
echo "No 'BUSCO/short_summary.generic.*.BUSCO.txt' file found. Assuming selected domain and specific lineages are the same."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.domain.${db_name_spec}.MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa.txt
db_name_gen=${db_name_spec}
else
[[ $summaries_gen =~ BUSCO/short_summary.generic.(.*).BUSCO.txt ]];
db_name_gen="${BASH_REMATCH[1]}"
echo "Used generic lineage dataset: ${db_name_gen}"
cp BUSCO/short_summary.generic.${db_name_gen}.BUSCO.txt short_summary.domain.${db_name_gen}.MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa.txt
fi
for f in BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_buscos.${db_name_gen}.faa.gz
break
done
for f in BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_buscos.${db_name_gen}.fna.gz
break
done
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log && egrep -q $'INFO: No marker genes were found. Root lineage \S+ is kept' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log ; then
echo "Domain could be selected by BUSCO, but no more specific lineage."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.domain.${db_name_spec}.MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa.txt
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log && egrep -q $'INFO: Not enough markers were placed on the tree \([0-9]*\). Root lineage \S+ is kept' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log ; then
echo "Domain could be selected by BUSCO, but no more specific lineage."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.domain.${db_name_spec}.MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa.txt
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log && egrep -q $'INFO: Running virus detection pipeline' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log ; then
# TODO double-check if selected dataset is not one of bacteria_*, archaea_*, eukaryota_*?
echo "Domain could not be selected by BUSCO, but virus dataset was selected."
cp BUSCO/short_summary.specific.${db_name_spec}.BUSCO.txt short_summary.specific_lineage.${db_name_spec}.MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa.txt
else
echo "ERROR: Some not expected case occurred! See MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log." >&2
exit 1
fi
fi
for f in BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_buscos.${db_name_spec}.faa.gz
break
done
for f in BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/run_${db_name_spec}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_buscos.${db_name_spec}.fna.gz
break
done
elif egrep -q $'ERROR: No genes were recognized by BUSCO' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.err ; then
echo "WARNING: BUSCO analysis failed due to no recognized genes! See also MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.err."
echo -e "MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa No genes" > "MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.failed_bin.txt"
elif egrep -q $'INFO: \S+ selected' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log && egrep -q $'ERROR: Placements failed' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.err ; then
echo "WARNING: BUSCO analysis failed due to failed placements! See also MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.err. Still using results for selected generic lineage dataset."
echo -e "MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa Placements failed" > "MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.failed_bin.txt"
message=$(egrep $'INFO: \S+ selected' MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.log)
[[ $message =~ INFO:[[:space:]]([_[:alnum:]]+)[[:space:]]selected ]];
db_name_gen="${BASH_REMATCH[1]}"
most_spec_db=${db_name_gen}
echo "Used generic lineage dataset: ${db_name_gen}"
cp BUSCO/auto_lineage/run_${db_name_gen}/short_summary.txt short_summary.domain.${db_name_gen}.MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa.txt
for f in BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa; do
cat BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*faa | gzip >MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_buscos.${db_name_gen}.faa.gz
break
done
for f in BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna; do
cat BUSCO/auto_lineage/run_${db_name_gen}/busco_sequences/single_copy_busco_sequences/*fna | gzip >MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_buscos.${db_name_gen}.fna.gz
break
done
else
echo "ERROR: BUSCO analysis failed for some unknown reason! See also MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_busco.err." >&2
exit 1
fi
# additionally output genes predicted with Prodigal (GFF3)
if [ -f BUSCO/logs/prodigal_out.log ]; then
mv BUSCO/logs/prodigal_out.log "MEGAHIT-MetaBAT2-test_minigut.unbinned.1.fa_prodigal.gff"
fi
# if needed delete temporary BUSCO files
if [ Y ]; then
find . -depth -type d -name "augustus_config" -execdir rm -rf "{}" \;
find . -depth -type d -name "auto_lineage" -execdir rm -rf "{}" \;
find . -depth -type d -name "run_*" -execdir rm -rf "{}" +
fi
cat <<-END_VERSIONS > versions.yml
"NFCORE_MAG:MAG:BUSCO_QC:BUSCO":
python: $(python --version 2>&1 | sed 's/Python //g')
R: $(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
busco: $(busco --version 2>&1 | sed 's/BUSCO //g')
END_VERSIONS
`, size: 10488 (max: 10240)
```
The text was updated successfully, but these errors were encountered:
Description of feature
As this makes ugly warning messages when running
-with-tower
I'm shamefully tagging @skrakau to look into this as the bash script scares me too much to try doing it myself 😅 (sorry!)
The text was updated successfully, but these errors were encountered: