Phage contig reorientation with dnaapler in pharokka - terL in the reoriented contig does not begin at position 1 in the Genbank file #379

Closed
TimSkvortsov opened this issue Feb 6, 2025 · 2 comments

Comments

@TimSkvortsov

Hello, I used pharokka the other day to reorient and annotate the genome of a newly sequenced Cutibacterium phage. The reorientation was completed correctly, but in the GenBank file the CDS coordinates of terL were given as 130..1512 instead of the expected 1..1512. The attached image shows the beginning of the reoriented genome, with the terL predicted by PHANOTATE highlighted in yellow.
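For context, reorientation rotates a circular contig so that a chosen start coordinate (here, the start of terL) becomes position 1, reverse-complementing the contig first if that gene lies on the minus strand. The sketch below illustrates only that rotation step. It assumes the 1-based start coordinate and the strand are already known, for example from a homology search. It is not dnaapler's actual implementation, and `reorient` and `revcomp` are hypothetical helpers.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string; bases other than ACGT become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(b, "N") for b in reversed(seq.upper()))


def reorient(seq: str, start: int, strand: int = 1) -> str:
    """Rotate a circular sequence so a gene's start codon begins at position 1.

    `start` is the 1-based forward-strand coordinate of the first base of the
    start codon. For a minus-strand gene, this is the gene's rightmost coordinate.
    """
    n = len(seq)
    if strand == 1:
        i = (start - 1) % n
        return seq[i:] + seq[:i]
    if strand == -1:
        # Forward position p (1-based) sits at 0-based index n - p in the reverse complement.
        rc = revcomp(seq)
        i = (n - start) % n
        return rc[i:] + rc[:i]
    raise ValueError("strand must be 1 or -1")


# Forward-strand ATG at position 4 moves to position 1.
print(reorient("AAAATGCCC", 4))
# Minus-strand gene: "CAT" at 4..6 is an ATG on the other strand, starting at 6.
print(reorient("GGGCATTTT", 6, strand=-1))
```

Because the contig is circular, the rotation changes only where the sequence is "cut". The sequence content and length stay the same.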

I am not sure whether this counts as a bug, though, as I understand that dnaapler uses tblastx, whereas PHANOTATE's gene predictions are based on an analysis of the phage genome as a whole.

LOCUS       1                      29446 bp    DNA     linear   PHG 19-JAN-2025
DEFINITION  1 length=29446 depth=1.00x.
ACCESSION   1
VERSION     1
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     CDS             130..1512
                     /ID="FGUJEDQP_CDS_0001"
                     /transl_table=11
                     /phrog="9"
                     /top_hit="NC_023862_p15"
                     /locus_tag="FGUJEDQP_CDS_0001"
                     /function="head and packaging"
                     /product="terminase large subunit"
                     /source="PHANOTATE_1.5.1"
                     /score="-4561.899041514007"
                     /phase="0"
                     /translation="VLDDWLAIGSNGRLASGVCGVFVPRQNGKNAILEVVELFKATIQG
                     RRILHTAHELKSARKAFMRLRSFFENERQFPDLYRMVKTIRATNGQEAIVLHHPDCATF
                     ERKCGCPGWGSVEFVARSRGSARGFTVDDLVCDEAQELSDEQLEALLPTVSAAPSGDPQ
                     QIFLGTPPGPLADGSVVLRLRGQALSGGKRFAWTEFSIPDESDPDDLTRSWRKLAGDTN
                     PALGRRLNFGTVSDEHESMSAAGFARERLGWWDRGQSATSVIPADKWAQSAVDDVELVG
                     GKVFGVSFSRSGDRVALAGAGKADAGVHVEVIDGLSGTIVDGVGRLADWLAVRWGDTDR
                     IMVAGSGAVLLQKALTDRGVPGRGVVVADTGVYVEACQSFLEGVRSGVVSHPRADSRRD
                     MLDIAVRSAVQKRKGSAWGWGSTFKDGSEVPLEAVSLAYLGAKMAKARRRERSGRKRVS
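To put the two candidate terL calls in perspective, the small calculation below compares their lengths. It assumes GenBank's 1-based, inclusive start..end convention. It also assumes that the final codon is the stop codon, so that codon is excluded from the protein length. `cds_stats` is a hypothetical helper.

```python
def cds_stats(start: int, end: int) -> tuple[int, int, int]:
    """Return (nucleotides, codons, amino acids) for a CDS given as start..end.

    Coordinates are 1-based and inclusive, as in GenBank. The final codon is
    assumed to be the stop codon, so it is excluded from the protein length.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"{start}..{end} is not a whole number of codons")
    codons = length // 3
    return length, codons, codons - 1


called = cds_stats(130, 1512)    # PHANOTATE call in the GenBank record
expected = cds_stats(1, 1512)    # start expected after reorientation
gap = expected[0] - called[0]    # extra upstream sequence in the expected call

print(called)          # (1383, 461, 460)
print(expected)        # (1512, 504, 503)
print(gap, gap // 3)   # 129 43
```

The two starts are 129 nt (43 codons) apart, so both lie in the same reading frame and end at the same stop codon (1512). The two calls differ only in which start codon is used.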
@TimSkvortsov TimSkvortsov changed the title Phage contig reorientation with dnaapler in pharokka - terL in does not begin at position 1 in the Genbank file Phage contig reorientation with dnaapler in pharokka - terL in the reoriented contig does not begin at position 1 in the Genbank file Feb 6, 2025
@gbouras13
Owner

Hi @TimSkvortsov,

What you have observed is an error caused by the fact that gene predictors often cannot score ORFs in a circular fashion, so they are prone to miscalling CDSs near the contig endpoints. I am not sure how common this is with PHANOTATE specifically, but it is something for me to look at. But yes, given that dnaapler uses tblastx/mmseqs, I'd trust the true terL to start at position 1 here.
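To illustrate what "scoring ORFs in a circular fashion" means, the toy ORF finder below reads codons modulo the contig length, so ORFs that span the contig endpoints remain visible. A linear scan stops at the last base instead. `find_orfs` is a hypothetical helper. It considers only forward-strand ATG starts and keeps the longest ORF per stop codon. Real gene predictors such as PHANOTATE and Prodigal score candidate ORFs far more elaborately, so this sketch covers only the wrap-around idea.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, circular: bool = True) -> list[tuple[int, int]]:
    """Longest ATG-initiated forward-strand ORF for each stop codon.

    Returns 1-based inclusive (start, end) pairs. On a circular sequence,
    end < start means the ORF wraps around the contig endpoints.
    """
    seq = seq.upper()
    n = len(seq)

    def codon(i: int) -> str:
        # Read three bases from index i, wrapping past the end if circular.
        if circular:
            return "".join(seq[(i + k) % n] for k in range(3))
        return seq[i:i + 3]

    longest: dict[int, tuple[int, int]] = {}  # stop end index -> (start index, length)
    for i in range(n):
        if codon(i) != "ATG":
            continue
        # Walk codon by codon, at most one full lap around the sequence.
        for offset in range(3, n + 1, 3):
            pos = i + offset
            if not circular and pos + 3 > n:
                break  # ran off the end of a linear contig: no stop found
            if codon(pos % n) in STOPS:
                end = (pos + 2) % n
                length = offset + 3
                if end not in longest or length > longest[end][1]:
                    longest[end] = (i, length)
                break
    return sorted((start + 1, end + 1) for end, (start, _) in longest.items())


contig = "AAATAGCCCATG"  # ATG at 10..12 wraps to a TAG stop at 4..6
print(find_orfs(contig))                  # [(10, 6)]
print(find_orfs(contig, circular=False))  # []
```

In this example, the linear scan misses the ORF entirely because its stop codon lies past the contig end. Rotating the same sequence to "ATGAAATAGCCC" makes both modes report (1, 9). This mirrors how the position of the contig endpoints can change what a linear predictor calls.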

On the plus side, this is a known issue, and the developer of pyrodigal is working on a fix (see, e.g., gbouras13/dnaapler#90 and althonos/pyrodigal#65), so hopefully it will soon be fixed, at least for pyrodigal (which is an option in pharokka). In any case, it may be interesting to try running pharokka on this genome with -g prodigal as things stand.

George

@TimSkvortsov
Author

Perfect, thanks for the explanation @gbouras13.
