a guix package #259

Closed
wwood opened this issue Jul 20, 2016 · 8 comments
Comments

@wwood
Contributor

wwood commented Jul 20, 2016

Hi,

I'm attempting to package Roary for GNU Guix. I think I'm most of the way there; however, there are a number of test failures. I've put a log of the test failures here:
https://gist.github.com/wwood/99cac38dd932c4f424400f151a9d693a

I'm finding the cause a little hard to track down, and I'm hoping someone more familiar with the code will immediately be able to spot what is going on. The build happens in quite a restrictive environment, so basic Unix utilities such as "grep" aren't available by default (though they can be added).

I'm using the v3.6.4 release from the releases GitHub page.
Thanks in advance.
ben
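
One way to rule out missing utilities in a restricted build environment is to check the PATH before running the tests. The following is a minimal sketch, not part of Roary or Guix; `missing_tools` is a hypothetical helper name, and the tool list passed to it is up to the packager.

```shell
# Hypothetical helper: print the name of every tool that is NOT on the PATH.
# Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command can be found
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools awk grep sed` prints one line per utility that the build environment lacks.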

@andrewjpage
Member

Hi Ben,
Thanks for taking the time to package Roary. I assumed standard utilities
such as awk and grep would be available, so perhaps that's the issue. Could you
run 'roary -a' and send me the output?
Regards,
Andrew

@wwood
Contributor Author

wwood commented Jul 20, 2016

Thanks for the quick response

starting phase `check'
2016/07/20 12:27:52 Looking for 'Rscript' - found /gnu/store/wyds6svxvp3gz5j3sszycvsa1v61j0vm-r-3.3.0/bin/Rscript

Please cite Roary if you use any of the results it produces:
    Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
    "Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics, 2015 Nov 15;31(22):3691-3693
    doi: http://doi.org/10.1093/bioinformatics/btv421
    Pubmed: 26198102

2016/07/20 12:27:53 Determined Rscript version is 3.3
2016/07/20 12:27:53 Looking for 'awk' - found /gnu/store/bwgn42q3946z4k4x62wbrmryq7xhqfih-gawk-4.1.3/bin/awk
2016/07/20 12:27:53 Looking for 'bedtools' - found /gnu/store/hdsk6w2qrxjwshkqsh0j7pmnlcfzqgkb-bedtools-2.26.0/bin/bedtools
2016/07/20 12:27:53 Determined bedtools version is 2.26
2016/07/20 12:27:53 Looking for 'blastp' - found /gnu/store/mrnv2g9qhqfazb9mgd64lxdv2wkzwwbd-blast+-2.4.0/bin/blastp
2016/07/20 12:27:54 Determined blastp version is 2.4.0
2016/07/20 12:27:54 Looking for 'grep' - found /gnu/store/rhzcg2h6mmf1dlzv32w227kn4dkdcxmn-grep-2.22/bin/grep
2016/07/20 12:27:54 Optional tool 'kraken' not found in your $PATH
2016/07/20 12:27:54 Optional tool 'kraken-report' not found in your $PATH
2016/07/20 12:27:54 Looking for 'mafft' - found /gnu/store/8l72wfpvib3ayfywsw6mm3721yj3mhxs-mafft-7.299/bin/mafft
Use of uninitialized value in concatenation (.) or string at /tmp/guix-build-roary-3.6.4.drv-0/Roary-3.6.4/lib/Bio/Roary/External/CheckTools.pm line 139.
2016/07/20 12:27:54 Determined mafft version is 
2016/07/20 12:27:54 Looking for 'makeblastdb' - found /gnu/store/mrnv2g9qhqfazb9mgd64lxdv2wkzwwbd-blast+-2.4.0/bin/makeblastdb
2016/07/20 12:27:55 Determined makeblastdb version is 2.4.0
2016/07/20 12:27:55 Looking for 'mcl' - found /gnu/store/y59g41adjhz0rl1cypwvs5m5n6l5rxsd-mcl-14.137/bin/mcl
2016/07/20 12:27:55 Determined mcl version is 14-137
2016/07/20 12:27:55 Looking for 'parallel' - found /gnu/store/27y8f7r3pscdym2nyifrix4m2bxvbyxf-parallel-20160622/bin/parallel
2016/07/20 12:27:55 Determined parallel version is 20160622
2016/07/20 12:27:55 Looking for 'prank' - found /gnu/store/ch169vv7lb0c0sj38bafdyf2qr0hk9mx-prank-150803/bin/prank
2016/07/20 12:27:55 Looking for 'sed' - found /gnu/store/4ls9kdy1w6ichvmbrl5wn98lxmznbkd6-sed-4.2.2/bin/sed
2016/07/20 12:27:55 Looking for 'cd-hit' - found /gnu/store/r4idkc9mgxic5m50yyl6hmj1nqm6q2jq-cd-hit-4.6.5/bin/cd-hit
2016/07/20 12:27:55 Determined cd-hit version is 4.6
2016/07/20 12:27:55 Looking for 'FastTree' - found /gnu/store/0ln8cw2cvaxmsbnmma9zvrbh9zhjfld7-fasttree-2.1.8/bin/FastTree
Use of uninitialized value in concatenation (.) or string at /tmp/guix-build-roary-3.6.4.drv-0/Roary-3.6.4/lib/Bio/Roary/External/CheckTools.pm line 139.
2016/07/20 12:27:55 Determined FastTree version is 
2016/07/20 12:27:55 Roary version 1.006924
2016/07/20 12:27:55 Error: You need to provide at least 2 files to build a pan genome
Usage:   roary [options] *.gff

Options: -p INT    number of threads [1]
         -o STR    clusters output filename [clustered_proteins]
         -f STR    output directory [.]
         -e        create a multiFASTA alignment of core genes using PRANK
         -n        fast core gene alignment with MAFFT, use with -e
         -i        minimum percentage identity for blastp [95]
         -cd FLOAT percentage of isolates a gene must be in to be core [99]
         -qc       generate QC report with Kraken
         -k STR    path to Kraken database for QC, use with -qc
         -a        check dependancies and print versions
         -b STR    blastp executable [blastp]
         -c STR    mcl executable [mcl]
         -d STR    mcxdeblast executable [mcxdeblast]
         -g INT    maximum number of clusters [50000]
         -m STR    makeblastdb executable [makeblastdb]
         -r        create R plots, requires R and ggplot2
         -s        dont split paralogs
         -t INT    translation table [11]
         -z        dont delete intermediate files
         -v        verbose output to STDOUT
         -w        print version and exit
         -y        add gene inference information to spreadsheet, doesnt work with -e
         -h        this help message

Example: Quickly generate a core gene alignment using 8 threads
         roary -e --mafft -p 8 *.gff

For further info see: http://sanger-pathogens.github.io/Roary/

The mafft version is 7.299, by the way (i.e. the newest). FastTree is 2.1.8, which should be updated to 2.1.9.
ben
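
The log above shows an empty "Determined mafft version is" line together with an uninitialized-value warning, which suggests the version string could not be parsed from the tool's output. As an illustration only (Roary's actual check lives in the Perl module CheckTools.pm and is not reproduced here), a version parser can fall back to an explicit placeholder instead of an empty value. `extract_version` is a hypothetical name, and the dotted-number pattern is an assumption about what a version looks like.

```shell
# Hypothetical helper: pull the first dotted number (e.g. 2.1.8) out of a
# tool's version text, or print "unknown" when nothing matches.
extract_version() {
  # $1: raw text printed by a tool's version flag (format is an assumption)
  version=$(printf '%s\n' "$1" | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1) || true
  printf '%s\n' "${version:-unknown}"
}
```

With this sketch, `extract_version "ExampleTool version 2.1.8"` prints `2.1.8`, and text without a dotted number yields `unknown` rather than an empty string.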

@andrewjpage
Member

Hi Ben,
I think I've fixed it. Could you take a look at this branch and let me know
if the tests pass?
https://github.com/andrewjpage/Roary/tree/dont_split_gff_on_fasta_line

It looks like bedtools was updated 2 weeks ago and the update introduced a
slightly different output format.
Regards,
Andrew

@wwood
Contributor Author

wwood commented Jul 21, 2016

Great, thanks, that helps: more tests pass. But there are still some failing tests. I've updated the gist with the new build log. Any ideas? Thanks.

@andrewjpage
Member

Thanks. This error looks odd and may be the cause of some of the failures:
"MSG: Could not write file 't/data/out_of_order_fasta.fa.sorted.fa':
Permission denied"

Is the code on a read-only device, or are there restricted permissions?
Whilst most of the tests run in temp directories, some, I'm afraid, write to
the test directory itself, which normally isn't an issue for me because
Dist::Zilla copies the code to a build directory and runs them from the copy.

Is there a way that I could get a VM, a Vagrantfile, or similar replicating
your environment so that I can look further?
Andrew

@wwood
Contributor Author

wwood commented Jul 23, 2016

I'm afraid that doesn't seem to be the root cause. The permission error was fixed by running chmod u+w -R t/data before running the tests, and there are no special permissions: the user running the tests owns the files. Is there a reason for that directory being read-only?
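
For packagers hitting the same permission error, the workaround can be combined with the copy-to-a-build-directory approach mentioned earlier in the thread. This is a rough sketch under assumptions: `prepare_test_data` is a hypothetical helper, and the source and destination paths are whatever the build uses; it is not a step taken from Roary's or Guix's build scripts.

```shell
# Hypothetical helper: copy test data into a scratch directory and make the
# copy writable by its owner, so tests that write next to their inputs succeed.
prepare_test_data() {
  src="$1"   # e.g. the read-only t/data directory
  dest="$2"  # scratch directory the tests are allowed to write to
  mkdir -p "$dest" || return 1
  cp -R "$src"/. "$dest"/ || return 1
  chmod -R u+w "$dest"
}
```

Running the tests against the copy leaves the original source tree untouched, which may matter when the unpacked source is meant to stay pristine.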

Unfortunately, there are still some failing tests. In particular this one:
https://gist.github.com/wwood/99cac38dd932c4f424400f151a9d693a#file-roary-build-log-L323

I can reproduce something similar outside the build environment too:

ben@u:~/git/Roary/2$ PATH=~/git/Roary/bin:$PATH PERL5LIB=~/git/Roary/lib:$PERL5LIB roary -v -z -e ~/git/Roary/t/data/query_1.gff ~/git/Roary/t/data/query_2.gff

Please cite Roary if you use any of the results it produces:
    Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
    "Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics, 2015 Nov 15;31(22):3691-3693
    doi: http://doi.org/10.1093/bioinformatics/btv421
    Pubmed: 26198102

2016/07/23 09:40:45 Fixing input GFF files
2016/07/23 09:40:45 Input file contains duplicate gene IDs, attempting to fix by adding a unique suffix.  New GFF in the fixed_input_files directory.  /home/ben/git/Roary/t/data/query_2.gff 
2016/07/23 09:40:45 Extracting proteins from GFF files
Extracting proteins from /home/ben/git/Roary/t/data/query_1.gff
Extracting proteins from fixed_input_files/query_2.gff
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
BLAST Database error: No alias or index file found for protein database [/home/ben/git/Roary/2/JvcGnx9fNm/output_contigs] in search path [/home/ben/git/Roary/2::]
Cluster with MCL
2016/07/23 09:40:47 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presence_absence.csv -c _clustered.clstr --output_multifasta_files -i /home/ben/git/Roary/2/RjcNVHtoY9//_gff_files -f /home/ben/git/Roary/2/RjcNVHtoY9//_fasta_files -t 11 --dont_delete_files --dont_create_rplots   -v  -j Local --processors 1 --group_limit 50000 -cd 99
2016/07/23 09:40:48 Reinflate clusters
2016/07/23 09:40:48 Split groups with paralogs
2016/07/23 09:40:48 Labelling the groups
2016/07/23 09:40:48 Transfering the annotation to the groups
2016/07/23 09:40:48 Creating accessory binary gene presence and absence fasta
2016/07/23 09:40:48 Creating accessory binary gene presence and absence tree
2016/07/23 09:40:48 The input file is too small so not creating a tree
2016/07/23 09:40:48 Creating accessory gene presence and absence clusters
2016/07/23 09:40:48 Theres no accessory binary file so skipping accessory binary clustering
2016/07/23 09:40:48 Creating the spreadsheet with gene presence and absence
2016/07/23 09:40:48 Creating summary statistics of the spreadsheet
2016/07/23 09:40:48 Very few core genes detected with the current settings. Try modifying the core definition ( -cd 90 ) and/or 
    the blast identity (-i 70) parameters.  Also try checking for contamination (-qc) and ensure you only have one species.
2016/07/23 09:40:48 Creating tab files for R
2016/07/23 09:40:48 Create EMBL files
2016/07/23 09:40:48 Creating files with the nucleotide sequences for every cluster
Aligning each cluster
Use of uninitialized value in require at (eval 709) line 1.
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_5.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_1.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_12.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_3.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_8.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_9.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_14.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_7.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_6.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_15.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_11.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_4.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_13.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_10.fa
2016/07/23 09:40:48 Running command: protein_alignment_from_nucleotides  -v pan_genome_sequences/group_2.fa

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
---------------------------------------------------

I put the output directory here: https://github.com/wwood/roary-for-guix-files

In terms of getting a VM etc., the easiest thing for me would be if you simply installed Guix (either directly or in a VM of your own). There are instructions here:
https://www.gnu.org/software/guix/manual/guix.html#Installation

Once that is up, I can give you the package definitions of Roary and its dependencies that are not currently in Guix. If you aren't super keen, I could get a Dockerfile together, but I'd rather convince you of the awesomeness of Guix.

And since this is already a long response, I may as well make it longer. I note there are lines like this in the code:

system( $self->_command_to_run );

I wonder if it might help debug these errors if external commands run via system() that exit with a non-zero status were caught. For instance, I wrote this small Ruby library which does something similar. Just a thought, feel free to ignore me.
https://github.com/wwood/bioruby-commandeer/
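
The idea of catching failed external commands can be sketched in shell, as below. Roary itself is Perl, so this only illustrates the pattern and is not a patch; `run_checked` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: run a command and report loudly if it fails,
# passing the original exit status back to the caller.
run_checked() {
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status: $*" >&2
    return "$status"
  fi
}
```

Used as `run_checked some_command args`, a failure such as the BLAST database error in the log above would surface at the step that caused it instead of several steps later.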

@andrewjpage
Member

I've just uploaded a fix for this, thanks for reporting it (it was broken on other systems as well). Version 3.6.6 should be in CPAN in a few hours. If you have any suggestions for improvements, pull requests are always welcomed!

@wwood
Contributor Author

wwood commented Jul 26, 2016

Great, all the tests pass in 3.6.6. Thanks for the fixes.
