
Identical .gff file names from different genomes, and then an issue with mcl groups #341

Closed
bhclement opened this issue Aug 18, 2017 · 19 comments

Comments

@bhclement

Hello,
I was testing Prokka and Roary on my genomes.
I followed the Prokka command for genome annotation, but the .gff files from different samples all have the same file name! Is this normal?
There were also some error messages when I ran the Roary command.
I would appreciate any advice.
Thanks,
KT
P.S. The commands and output are copied below.

[ctsui@grl-salk E_coli]$ prokka --kingdom Bacteria --outdir Ecoli_V29 --locustag ecoli_V_29 --addgenes --force --cpus 8 V_29.contigs.fasta
[ctsui@grl-salk E_coli]$ roary -e --mafft -p 8 ./Ecoli*/*.gff
2017/08/17 17:52:10 Extracting proteins from GFF files
Extracting proteins from /Ecoli_V5/PROKKA_08172017.gff
Extracting proteins from Ecoli_V29/PROKKA_08172017.gff
Extracting proteins from Ecoli_U16/PROKKA_08172017.gff
Extracting proteins from /Ecoli_V7/PROKKA_08172017.gff
Extracting proteins from /Ecoli_V21/PROKKA_08172017.gff
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
BLAST Database error: No alias or index file found for protein database [/projects/roary_test/_1503017512/DJ3228qGRO/output_contigs] in search path [/projects/roary_test/_1503017512::]
Cluster with MCL
2017/08/17 17:53:05 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presence_absence.csv -c _clustered.clstr --output_multifasta_files -i /projects/E_coli/roary_test/_1503017512/ZV2aUQOBho//_gff_files -f /projects/E_coli/roary_test/_1503017512/ZV2aUQOBho//_fasta_files -t 11 --dont_create_rplots -v --mafft -j Local --processors 1 --group_limit 50000 -cd 99
2017/08/17 17:53:07 Reinflate clusters
Cant open file: _uninflated_mcl_groups

@menright99

You need to use --prefix 'strain_name' on the command line; this will give you strain_name.gff.

I also use --locustag 'strain_name', as this names the genes after your strain rather than the default.

Hope this works.

@bhclement
Author

Thanks. I ran Prokka again with --prefix in the command.
But the message "Cant open file: _uninflated_mcl_groups" still persists! Is some dependency missing?

@tseemann
Contributor

@bhclement sounds like your installation of mcl failed to produce the correct output files.

Thank you @menright99 for helping with Prokka.
I tend to use --prefix STRAIN --locustag STRAIN --outdir STRAIN
I should add an --auto mode for this sort of pattern, maybe in Prokka 2.0.
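Following the --prefix STRAIN --locustag STRAIN --outdir STRAIN pattern, a batch run over several assemblies could be sketched as below. It assumes the assemblies are named like V_29.contigs.fasta (as in the original command) and reuses the other options from that command; strain_name is a hypothetical helper, not part of Prokka.

```shell
#!/usr/bin/env bash
# Sketch: annotate every <strain>.contigs.fasta in the current directory with
# Prokka, using the strain name for --prefix, --locustag and --outdir so each
# genome gets its own <strain>/<strain>.gff instead of PROKKA_<date>.gff.
# The "<strain>.contigs.fasta" naming scheme is an assumption.

# Strip the directory and the .contigs.fasta suffix: V_29.contigs.fasta -> V_29
strain_name() {
  local base="${1##*/}"
  printf '%s\n' "${base%.contigs.fasta}"
}

for contigs in ./*.contigs.fasta; do
  [ -e "$contigs" ] || continue   # the glob matched no files
  strain=$(strain_name "$contigs")
  prokka --kingdom Bacteria --addgenes --force --cpus 8 \
    --prefix "$strain" --locustag "$strain" --outdir "$strain" \
    "$contigs"
done
```

Each genome then ends up as <strain>/<strain>.gff, so the Roary step could be run as roary -e --mafft -p 8 */*.gff.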

@bhclement
Author

Thanks, Torsten! So do I need to reinstall mcl?
Which version should I install?

@bhclement
Author

Roary was installed on the server, while mcl was installed in a personal directory. Is this the issue?
I have already exported the path where mcl was installed.

@tseemann
Contributor

Can you provide us with the output of roary -a?

@bhclement
Author

Please cite Roary if you use any of the results it produces:
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
"Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics, 2015 Nov 15;31(22):3691-3693
doi: http://doi.org/10.1093/bioinformatics/btv421
Pubmed: 26198102

2017/08/21 21:05:45 Looking for 'Rscript' - found /usr/local/bin/Rscript
2017/08/21 21:05:45 Determined Rscript version is 2.15
2017/08/21 21:05:45 Roary needs Rscript 3 or higher. Please upgrade and try again.
2017/08/21 21:05:45 Looking for 'awk' - found /bin/awk
2017/08/21 21:05:45 Looking for 'bedtools' - found /usr/local/bin/bedtools
2017/08/21 21:05:45 Determined bedtools version is 2.24
2017/08/21 21:05:45 Looking for 'blastp' - found /usr/bin/blastp
2017/08/21 21:05:45 Determined blastp version is 2.2.28
2017/08/21 21:05:45 Looking for 'grep' - found /bin/grep
2017/08/21 21:05:45 Looking for 'kraken' - found /usr/local/bin/kraken
Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl5/Bio/Roary/External/CheckTools.pm line 129.
2017/08/21 21:05:46 Determined kraken version is
2017/08/21 21:05:46 Looking for 'kraken-report' - found /usr/local/bin/kraken-report
Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl5/Bio/Roary/External/CheckTools.pm line 129.
2017/08/21 21:05:46 Determined kraken-report version is
2017/08/21 21:05:46 Looking for 'mafft' - found /usr/bin/mafft
Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl5/Bio/Roary/External/CheckTools.pm line 129.
2017/08/21 21:05:46 Determined mafft version is
2017/08/21 21:05:46 Looking for 'makeblastdb' - found /usr/bin/makeblastdb
2017/08/21 21:05:46 Determined makeblastdb version is 2.2.28
2017/08/21 21:05:46 ERROR: Can't find required 'mcl' in your $PATH
2017/08/21 21:05:46 Looking for 'parallel' - found /usr/bin/parallel
2017/08/21 21:05:46 Determined parallel version is 20130722
2017/08/21 21:05:46 ERROR: Can't find required 'prank' in your $PATH
2017/08/21 21:05:46 Looking for 'sed' - found /bin/sed
2017/08/21 21:05:46 Looking for 'cd-hit' - found /usr/local/bin/cd-hit
2017/08/21 21:05:46 Determined cd-hit version is 4.6
2017/08/21 21:05:46 Looking for 'FastTree' - found /usr/local/bin/FastTree
2017/08/21 21:05:46 Determined FastTree version is 2.1
2017/08/21 21:05:46 Roary version 3.6.0
2017/08/21 21:05:46 Error: You need to provide at least 2 files to build a pan genome
Usage: roary [options] *.gff

Options: -p INT    number of threads [1]
         -o STR    clusters output filename [clustered_proteins]
         -f STR    output directory [.]
         -e        create a multiFASTA alignment of core genes using PRANK
         -n        fast core gene alignment with MAFFT, use with -e
         -i        minimum percentage identity for blastp [95]
         -cd FLOAT percentage of isolates a gene must be in to be core [99]
         -qc       generate QC report with Kraken
         -k STR    path to Kraken database for QC, use with -qc
         -a        check dependancies and print versions
         -b STR    blastp executable [blastp]
         -c STR    mcl executable [mcl]
         -d STR    mcxdeblast executable [mcxdeblast]
         -g INT    maximum number of clusters [50000]
         -m STR    makeblastdb executable [makeblastdb]
         -r        create R plots, requires R and ggplot2
         -s        dont split paralogs
         -t INT    translation table [11]
         -z        dont delete intermediate files
         -v        verbose output to STDOUT
         -w        print version and exit
         -y        add gene inference information to spreadsheet, doesnt work with -e
         -h        this help message

Example: Quickly generate a core gene alignment using 8 threads
roary -e --mafft -p 8 *.gff

@bhclement
Author

Roary did not recognize the local path I exported:

export PATH=/projects3/giardia/TB_pilot/4Dec2016/Spade_genomes/Roary_data/mcl-0 5-090/src/alien/oxygen/src/:$PATH

@andrewjpage
Member

andrewjpage commented Aug 22, 2017 via email

@andrewjpage
Member

andrewjpage commented Aug 22, 2017 via email

@bhclement
Author

[ctsui@grl-salk 4Dec2016]$ mcl
-bash: mcl: command not found
[ctsui@grl-salk 4Dec2016]$ mclblastline
/usr/bin/env: perl: No such file or directory
[ctsui@grl-salk 4Dec2016]$

@bhclement
Author

So I did not install mcl properly?

@andrewjpage
Member

There does appear to be a problem with mcl. Is that space character in the PATH intentional or a typo?
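The space matters: if that export line is typed with the space unquoted, the shell splits it into two words, so PATH is set to only the text before the space (discarding the previous PATH entirely) and bash rejects the remainder as not a valid identifier. A minimal sketch of a safer way to prepend a directory and then check that mcl resolves, assuming MCL_BIN (a hypothetical variable) points at the directory that actually contains the mcl executable; prepend_path is likewise a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: prepend one directory to PATH (quoted, so spaces cannot split it)
# and report whether mcl can now be found.
# MCL_BIN is a hypothetical placeholder: set it to the directory holding the
# mcl executable itself, not the mcl source tree.

prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;            # already present: leave PATH unchanged
    *) PATH="$1:$PATH" ;;   # otherwise put it first
  esac
  export PATH
}

MCL_BIN="${MCL_BIN:-$HOME/local/bin}"
prepend_path "$MCL_BIN"

if command -v mcl >/dev/null 2>&1; then
  echo "mcl found: $(command -v mcl)"
else
  echo "mcl not found; check that $MCL_BIN contains an executable named mcl" >&2
fi
```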
Andrew

@bhclement
Author

Which space?

@tseemann
Contributor

@bhclement what does echo $PATH | tr ":" "\n" | nl say?
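A slightly extended version of that check, which also marks the PATH entries that actually contain an executable mcl, could look like the following. It is a sketch: path_report is a hypothetical helper and its output format is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: print every PATH entry, numbered like `tr ":" "\n" | nl`, and mark
# the entries that contain an executable file with the given name (default: mcl).
path_report() {
  local name="${1:-mcl}" dir i=0
  local -a dirs
  # Split PATH on ':' into an array (no glob expansion of the entries).
  IFS=':' read -r -a dirs <<< "$PATH"
  for dir in "${dirs[@]}"; do
    i=$((i + 1))
    if [ -n "$dir" ] && [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%2d\t%s\t<-- %s\n' "$i" "$dir" "$name"
    else
      printf '%2d\t%s\n' "$i" "$dir"
    fi
  done
}

path_report mcl
```

If no line is marked, no directory on PATH provides mcl, which is consistent with the "Can't find required 'mcl' in your $PATH" message from roary -a.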

@bhclement
Author

 1	/TB_pilot/4Dec2016/Spade_genomes/Roary_data/mcl-14-137/src/alien/oxygen/src/
 2	/TB_pilot/4Dec2016/Spade_genomes/Roary_data/mcl-05-090/src/alien/oxygen/src/
 3	.
 4	/opt/miniconda2/bin
 5	/usr/local/bin
 6	/bin
 7	/usr/bin
 8	/usr/local/sbin
 9	/usr/sbin
10	/sbin
11	/usr/local/rvm/bin
12	/home/ctsui/bin

Any insights? Thanks.

@tseemann
Contributor

tseemann commented Aug 24, 2017

It seems you are using Miniconda. Can you just conda install mcl and get rid of PATH items 1, 2 and 3?

See https://bioconda.github.io/recipes/mcl/README.html

@bhclement
Author

Hello, I reinstalled mcl and it appears to be working now.
Many thanks!

@tseemann
Contributor

tseemann commented Sep 2, 2017

@bhclement that is great news! Can you please close this issue now?
