Tabs in FASTA record headers cause Kaptive to fall over #39

Closed
CorinYeatsCGPS opened this issue Oct 14, 2024 · 2 comments

@CorinYeatsCGPS

Hi,

I found we have a few FASTAs with tab characters in them, which causes Kaptive to fall over:

Assembly        Best match locus        Best match type Match confidence        Problems        Identity        Coverage        Length discrepancy      Expected genes in locus Expected genes in locus, details  Missing expected genes  Other genes in locus    Other genes in locus, details   Expected genes outside locus    Expected genes outside locus, details   Other genes outside locus       Other genes outside locus, details        Truncated genes, details        Extra genes, details
Traceback (most recent call last):
  File "/usr/local/bin/kaptive", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kaptive/__main__.py", line 234, in main
    if result := typing_pipeline(assembly, args.db, args.threads, args.score_metric, args.weight_metric,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kaptive/assembly.py", line 278, in typing_pipeline
    partial=a.partial, dna_seq=assembly.seq(a.ctg, a.r_st, a.r_en, a.strand))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kaptive/assembly.py", line 60, in seq
    return self.contigs[ctg].seq[start:end] if strand == "+" else self.contigs[ctg].seq[
           ~~~~~~~~~~~~^^^^^
KeyError: 'OW967300.1'

It's a bit unusual, but tabs are technically allowed in FASTA headers, and there appears to be a pipeline in use that produces them.
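For anyone hitting the same error, here is a minimal sketch of one way a tabbed header could lead to this kind of `KeyError`. It is an assumption, not Kaptive's actual code: the `read_fasta` helper and its `split_on_any_whitespace` flag are hypothetical, and the header text is made up. If the contig ID is taken by splitting the header on a single space, the tab and whatever follows it stay attached to the ID, so a later lookup by the bare accession fails.

```python
import io


def read_fasta(handle, split_on_any_whitespace=True):
    """Hypothetical helper: return {contig_id: sequence} from a FASTA handle."""
    contigs, name, chunks = {}, None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                contigs[name] = "".join(chunks)
            header = line[1:]
            # split(None, 1) splits on any run of whitespace (spaces and tabs);
            # split(" ", 1) splits on a single space only, so a tab stays in the ID.
            if split_on_any_whitespace:
                parts = header.split(None, 1)
            else:
                parts = header.split(" ", 1)
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs


# Made-up record with a tab between the accession and the description.
fasta = ">OW967300.1\tsome description\nACGTACGT\n"

space_only = read_fasta(io.StringIO(fasta), split_on_any_whitespace=False)
any_ws = read_fasta(io.StringIO(fasta))

print("OW967300.1" in space_only)  # the bare accession is not a key here
print("OW967300.1" in any_ws)      # splitting on any whitespace recovers it
```

Splitting on any whitespace keeps the contig IDs consistent with tools that truncate names at the first whitespace character, whether that is a space or a tab.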

@tomdstanton
Collaborator

See klebgenomics/Kleborate#85

@tomdstanton tomdstanton self-assigned this Oct 15, 2024
@tomdstanton
Collaborator

Fixed in b94e596.
@CorinYeatsCGPS can you double-check with the problematic assembly on your end? I made some synthetic tabbed headers and it seemed to work fine!
