Genes (well) annotated in prokka end up all in different groups?? #355
Comments
@4ureliek how sure are you that you have 81 conserved MecA sequences in your isolates? If you extract the protein sequence for the first MecA, and BLASTP against the other 80 isolates, do you get 1 good hit for every species? You could also combine all the prokka .FFN files and run
MecA presence was confirmed through an independent method, but the problem was mostly the inconsistency between the prokka annotations (that looked good) and the roary gene_presence_absence.csv file. Interestingly, when we added the -s option, all the MecA genes annotated by prokka ended up in one group: they are 1) no longer split across different groups and, more importantly, 2) listed under the expected name (description column). Since this option basically turns off paralog splitting (i.e. when it is not sure the genes are orthologs), I think the quality of the assemblies (they are at the contig level) was the main factor behind the splitting into groups, but I am still confused about the naming of these groups and how roary decides it.
When you annotate with Prokka I would strongly recommend you provide the option
Yes, we did that, it helped a lot!
As you have found, using the -s option to turn off paralog splitting collapses them all into one group. It is likely you have some genomes with multiple copies of this gene, and they get split into different groups based on synteny. As for naming, it is the most frequently annotated name from the prokka input. As @tseemann suggests, providing species-specific annotation will help.
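To illustrate the naming rule described above ("the most frequently annotated name"), here is a minimal Python sketch. It is not Roary's actual code (Roary is written in Perl); the function name `consensus_name` and the example names are hypothetical, and how Roary breaks ties or handles unnamed genes is not covered by this thread.

```python
from collections import Counter


def consensus_name(annotated_names):
    """Return the most frequent non-empty name in a cluster, or None.

    Illustrative only: this mimics "most frequently annotated name wins",
    it is not taken from the Roary source.
    """
    # Ignore genes that carry no name (empty string or None).
    counts = Counter(name for name in annotated_names if name)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical cluster: three genes annotated as mecA, one as pbp2.
print(consensus_name(["mecA", "mecA", "pbp2", "mecA"]))  # prints "mecA"
```

Under a rule like this, a MecA gene that lands in a cluster dominated by genes with another annotation would show up under that other name, which would be consistent with the unrelated group names reported in the original post.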
Hi,
I used prokka on Staph aureus strains. I checked the MecA annotations and they look good, but since I could not find MecA in the roary output, I checked which group each 'locus tag' ended up in (grepping them from the gene_presence_absence.csv file).
I am getting back 81 different groups (~as many as the original genes, which means they mostly did not end up in the same clusters). These groups also have lots of different gene names that have nothing to do with MecA... It looks like these MecA genes all cluster with the wrong sets of genes, which is surprising.
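For anyone wanting to repeat this check, the lookup described above (which group each locus tag ended up in) can be sketched in Python instead of grep. This is only a sketch under assumptions: it assumes the first column of gene_presence_absence.csv holds the group name and that the locus tags sit in the remaining cells, separated by whitespace when an isolate has several genes in one group. The function name and the example locus tags are hypothetical.

```python
import csv
from collections import defaultdict


def groups_for_locus_tags(csv_path, locus_tags):
    """Map each group name to the requested locus tags found in its row."""
    wanted = set(locus_tags)
    groups = defaultdict(list)
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue
            group_name = row[0]  # assumed: first column is the group name
            # Scan every other cell; a cell may hold several locus tags.
            for cell in row[1:]:
                for tag in cell.split():
                    if tag in wanted:
                        groups[group_name].append(tag)
    return dict(groups)


# Hypothetical usage, with made-up locus tags for the MecA genes:
# hits = groups_for_locus_tags("gene_presence_absence.csv",
#                              ["ISO1_00042", "ISO2_00317"])
# print(len(hits), "groups contain the requested locus tags")
```

If all the requested genes clustered together, the returned dictionary would have a single key; many keys would match the situation described here.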
I used this command line:
nohup time roary -e --mafft -o staph.roary.out -v ../prokka/*.gff -p 5 -r > staph.roary.aln.log &
Is there something I should be doing differently? I can send some files to reproduce the issue if needed, but first I was hoping there was a simple explanation, such as an option I overlooked!
Cheers,
Aurelie