CDS-extractor.pl should return error message (and exit code 1) when no CDS could be extracted #3

Open
jvollme opened this issue Jul 10, 2017 · 2 comments

jvollme commented Jul 10, 2017

When GenBank files that still have Windows/DOS line endings are used as input, cds-extractor.pl just quits without an error message, giving the impression that it successfully extracted all CDS.

Maybe it could either be adjusted to tolerate Windows line endings, or always double-check the number of extracted CDS when finished and raise a warning if it is zero.
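
For illustration, tolerating Windows line endings could look roughly like the sketch below. It is written in Python rather than the script's own Perl, and the function names are hypothetical; nothing here is taken from cds_extractor itself.

```python
def has_dos_line_endings(data: bytes) -> bool:
    """Return True if the raw file content contains CRLF line endings."""
    return b"\r\n" in data


def normalize_line_endings(text: str) -> str:
    """Convert Windows (CRLF) and lone CR line endings to Unix (LF)."""
    # Replace CRLF first so a CRLF pair does not become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Normalizing the input up front has the same effect as running dos2unix on the file before extraction.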


jvollme commented Jul 10, 2017

Ah, never mind. The ACTUAL problem was that the GenBank file had ALL CDS marked as "/pseudo". Had not expected that :)

But here's another thought: maybe you could include an option to also output pseudogenes?
A general warning (and ideally exit code 1) in cases where zero CDS were extracted would still be very helpful though, for including the script in automated pipelines.
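
Until such a check exists in the script, a pipeline could guard against an empty result itself. The following is a minimal Python sketch, assuming the extracted CDS are written as FASTA records (header lines starting with `>`); the function names and the file name in the usage comment are hypothetical.

```python
import sys


def count_fasta_records(lines) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def exit_code_for(n_records: int, label: str = "output") -> int:
    """Return 1 (and print an error to stderr) if nothing was extracted, else 0."""
    if n_records == 0:
        print(f"Error: zero CDS extracted ({label})", file=sys.stderr)
        return 1
    return 0


# Possible use in a pipeline step (file name is an assumption):
# with open("extracted_cds.faa") as handle:
#     sys.exit(exit_code_for(count_fasta_records(handle), "extracted_cds.faa"))
```

A non-zero exit code lets the calling pipeline stop instead of continuing with an empty file.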

@jvollme jvollme closed this as completed Jul 10, 2017
@aleimba aleimba self-assigned this Jul 11, 2017

aleimba commented Jul 11, 2017

Hi @jvollme,

thanks for bringing up the issue!

  1. Windows <-> Unix line ending issues are very annoying, which is why I recommend running dos2unix on the files: https://github.com/aleimba/bac-genomics-scripts#windows---unix-linebreak-problems. And of course it would be best to work only in Unix 😉
  2. cds_extractor is designed to ignore /pseudo CDS, because they cause several downstream issues. I would rather trust more sophisticated ORF-finding software than translate everything myself, including stop codons etc. Also, for protein extraction I use the /translation tag, which is not included for pseudo-CDS. If you also want to extract pseudogenes, I recommend using other software, e.g. FeatureExtract from the CBS (http://www.cbs.dtu.dk/services/FeatureExtract/download.php) or extractfeat from the EMBOSS package (http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/extractfeat.html).
  3. cds_extractor already checks whether the input file has no annotation at all and quits with an error in that case. A file where all CDS have a /pseudo tag is a rather unusual case, but it's a good idea to include a "zero CDS extracted" error. Will include it.
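
To spot the "all CDS are /pseudo" case before running an extraction, one could tally the CDS features in the GenBank feature table. The Python sketch below is an illustration only, not how cds_extractor works internally: it assumes the standard flat-file layout (feature keys starting in column 6, qualifiers on indented lines below them) and only recognizes a bare `/pseudo` qualifier on its own line. The function name is hypothetical.

```python
def tally_cds(lines):
    """Return (total_cds, pseudo_cds) for the feature table in a GenBank record."""
    total = pseudo = 0
    in_features = in_cds = False
    for raw in lines:
        line = raw.rstrip("\r\n")  # tolerate Windows line endings
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line and not line[0].isspace():
            # Any other top-level keyword (ORIGIN, CONTIG, //) ends the table.
            in_features = in_cds = False
            continue
        if not in_features:
            continue
        if line.startswith("     ") and len(line) > 5 and not line[5].isspace():
            # A new feature key line; remember whether it is a CDS.
            in_cds = line[5:].split()[0] == "CDS"
            if in_cds:
                total += 1
        elif in_cds and line.strip() == "/pseudo":
            pseudo += 1
    return total, pseudo
```

If both numbers are equal and non-zero, every CDS is flagged as a pseudogene, so an extraction that skips /pseudo CDS would produce no records.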

@aleimba aleimba reopened this Jul 11, 2017
@jvollme jvollme changed the title from "CDS-extractor.pl quits quietly when encountering dos/windows like line-endings" to "CDS-extractor.pl should return error message (and exit code 1) when no CDS could be extracted" Jul 11, 2017