Some protein sequences in your file are identical. #49
Short update: I checked three of the potential duplicate sequences, and apparently they are not identical. Any idea what causes the error, then? Cheers
Hi Alex,

The point of this error message is to make people think about their input data and why it might contain identical proteins. The most common cause is an annotation derived from de novo transcriptomes that hasn't been de-duplicated. In these cases it is best to investigate whether the duplicates are real, as misannotated proteins will detract from detecting the true signal of chromosome evolution. In your case, there is a flag to shut off the warnings:

About the three potential duplicate sequences: the warning message just shows three proteins that each have at least one other protein identical to them in the protein.fasta file. It doesn't mean that these three proteins are identical to one another. I will have to modify the message to make that clear.
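To illustrate why the three listed proteins need not match each other, here is a minimal Python sketch of grouping identical sequences in a protein FASTA file. It is not odp's actual code; the function names `read_fasta` and `identical_groups` are hypothetical, and it assumes the record ID is the first whitespace-delimited token of each header. Each group contains proteins identical to one another, but proteins from different groups differ, so a warning that prints one example from each of several groups lists proteins that are not identical to each other.

```python
from collections import defaultdict


def read_fasta(lines):
    """Parse FASTA text lines into a dict of {record_id: sequence}."""
    records = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def identical_groups(records):
    """Return lists of IDs sharing an identical sequence (groups of 2 or more)."""
    by_seq = defaultdict(list)
    for rec_id, seq in records.items():
        # Compare case-insensitively; no other normalization is applied.
        by_seq[seq.upper()].append(rec_id)
    return [ids for ids in by_seq.values() if len(ids) > 1]


if __name__ == "__main__":
    fasta = """>protA
MKTAYIAK
>protB
MKTAYIAK
>protC
MSSQQ
>protD
MSSQQ
>protE
MLLV
"""
    print(identical_groups(read_fasta(fasta.splitlines())))
    # prints [['protA', 'protB'], ['protC', 'protD']]
```

Here protA and protC would both be flagged as having duplicates, yet they are not identical to each other.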
Ah, OK, thanks a lot Darrin! Yes, I was a bit confused by the message, but it is clear now. Thanks again!
TODO: clarify the error messages in both odp and nway_rbh
See also #49
Hello Darrin,
I have been getting the error mentioned above, followed by a crash, since cloning the most recent odp version; I was not getting this error before. However, some of my genomes have a known ancient whole-genome duplication event, and it could indeed be that many proteins on different chromosomes (homoeologs) have identical sequences. I am sceptical as to whether this legality check is really necessary.
I also downloaded some publicly available genomes and had the same problem. The identical sequences are not different isoforms, as they come from different loci (different gene and locus IDs). I made sure to remove identical or alternative isoforms before running odp.
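The distinction drawn here, between identical isoforms of one gene and identical proteins from different loci, can be checked with a short Python sketch. It assumes a hypothetical `protein_to_locus` dict mapping each protein ID to its gene/locus ID (how to build it from the annotation is not covered, and this is not part of odp). It first keeps one protein per (locus, sequence) pair, then reports identical sequences that still span more than one locus, such as homoeologs after a whole-genome duplication.

```python
from collections import defaultdict


def dedupe_same_locus(records, protein_to_locus):
    """Keep the first protein seen for each (locus, sequence) pair.

    records: {protein_id: sequence}
    protein_to_locus: hypothetical {protein_id: locus_id} mapping; a protein
        missing from it is treated as its own locus.
    """
    seen = set()
    kept = {}
    for pid, seq in records.items():
        key = (protein_to_locus.get(pid, pid), seq.upper())
        if key not in seen:
            seen.add(key)
            kept[pid] = seq
    return kept


def cross_locus_groups(records, protein_to_locus):
    """Return groups of identical sequences whose proteins span 2+ loci."""
    by_seq = defaultdict(list)
    for pid, seq in records.items():
        by_seq[seq.upper()].append(pid)
    groups = []
    for ids in by_seq.values():
        loci = {protein_to_locus.get(pid, pid) for pid in ids}
        if len(loci) > 1:
            groups.append(ids)
    return groups


if __name__ == "__main__":
    recs = {"g1.t1": "MKT", "g1.t2": "MKT", "g2.t1": "MKT", "g3.t1": "MLV"}
    loci = {"g1.t1": "g1", "g1.t2": "g1", "g2.t1": "g2", "g3.t1": "g3"}
    kept = dedupe_same_locus(recs, loci)
    print(sorted(kept))                   # ['g1.t1', 'g2.t1', 'g3.t1']
    print(cross_locus_groups(kept, loci))  # [['g1.t1', 'g2.t1']]
```

After the same-locus isoform g1.t2 is dropped, g1.t1 and g2.t1 remain identical across two loci, which is the kind of case an identical-protein check would still flag.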
Any thoughts on this?
Thanks
Alex