TypeError triggered by "NoneType" when running kindel consensus with realignment #23

Open
sivico26 opened this issue Nov 21, 2024 · 9 comments

Comments

@sivico26

Hi there,

I am having trouble generating a consensus with realignment for an amplicon marker from Nanopore data.

This is what I am running.

$ kindel consensus -r debug.bam > test_kindel_realign.fasta
loading sequences: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5810/5810 [00:01<00:00, 5245.56it/s]
Traceback (most recent call last):
  File "/home/sivico/mambaforge/envs/AmpliPhaser/bin/kindel", line 8, in <module>
    sys.exit(main())
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/kindel/cli.py", line 83, in main
    parser.dispatch()
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/argh/helpers.py", line 56, in dispatch
    return dispatch(self, *args, **kwargs)
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/argh/dispatching.py", line 199, in dispatch
    return run_endpoint_function(
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/argh/dispatching.py", line 270, in run_endpoint_function
    return _process_command_output(lines, output_file, raw_output, always_flush)
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/argh/dispatching.py", line 290, in _process_command_output
    for line in lines:
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/argh/dispatching.py", line 415, in _execute_command
    for line in result:
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/argh/dispatching.py", line 395, in _call
    result = function(*positional_values, **values_by_name)
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/kindel/cli.py", line 21, in consensus
    result = kindel.bam_to_consensus(bam_path,
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/kindel/kindel.py", line 403, in bam_to_consensus
    cdrps = cdrp_consensuses(aln.weights, aln.clip_start_weights, aln.clip_end_weights,
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/kindel/kindel.py", line 199, in cdrp_consensuses
    + cdr_end_consensuses(weights, clip_end_weights, clip_end_depth,
  File "/home/sivico/mambaforge/envs/AmpliPhaser/lib/python3.9/site-packages/kindel/kindel.py", line 184, in cdr_end_consensuses
    clip_consensus = rev_clip_consensus[::-1]
TypeError: 'NoneType' object is not subscriptable

If I deactivate the realignment option, it runs without trouble. However, I am highly interested in the soft-clipped sequences, and being able to rescue them is why I am using kindel in the first place. This is how it looks:

[Image: debug_softclips]

So there is a big gap in the reference, but it could be filled with the soft-clipped bases. Any idea why this TypeError is being raised, and what the logic behind it is, would be appreciated. If you need any more info, please let me know.

Here is the data: debug.bam.txt (remove the .txt extension before reproducing the error)

@bede
Owner

bede commented Nov 23, 2024

Hi @sivico26, thanks for describing the problem so clearly and reproducibly. I have reproduced it with your BAM file and will attempt a fix after the weekend. Thanks.

@bede
Owner

bede commented Nov 26, 2024

Thanks for your patience. I've found the problem: it was in identifying the overlap between soft-clipped reads spanning 1268-1748 in your debug sample. The longest common subsequence function was failing to identify a valid match, which I've now fixed. Your reads appear to support a large deletion of 466 bases.

Soft-clipped consensus from position 1268 (left side of gap):

TATTAATTATATTGTCTTCCCACCCTACAATTTGTAACTAATATACCATTTCGTTCATAGATCGATGATGCTGTTAAACTGTATGAGGAGTCAGAGCCATGGACGTTAATGCATTGCTGGAACATCCTTCGCCATGAAGCTAAATGGAGCGATAAGATGGGGGAGATAAATTTTAGAGGAACAAAAAAAAAAGTTAATAAGAGGGTTGAAGGAAAAAAAAAAGGGGAAAAGGGAAAATTTGGAAAGATGATAATGGGCAACCACCTCCACCTTAAGGAAGGGGAAAGGGCAAAAAAGCATGAGGGGAGGGGGAGCGGAGAAAAAAACATCTAGCACACATTAAAATTAAATTTTTTAA

Soft-clipped consensus from position 1748 (right side of gap):

AAAAAAATTAATAAAAAAAGAAGAAAAAATAATATGCTTGCCTTCCTTAATGTGAGAAGAGACAAACACCGGTATTTCCCTTTCTCTTTTATATGAATAAAATTTATTTTTATAATTATTTTTTTCTTTAATAATTTAACTTTTAAAAATTTAGAAAAAAAACATGGGGGTTTTTATCAAAGGAAAAAAAAATATTTTTTCGAAAACATTAAAGGCCCTTCTACAAGAAGTCAAGTTGCCATTTCTAACAAGTGGCTCACCATCCAAAAGGCGGTGAACAAATTCTGTGGTCATTTTTGGTTTGTTGAAAGGTTAGACAAAAGTGGAAAGACTGAGCAGGACCGAGTAAGTCAATGTGTATTAATTATATTG

Longest common consensus subsequence: TATTAATTATATTG
In context this is: AATGTGtattaattatattgTCTTCC
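
For readers following along, here is a minimal sketch of the overlap step described above. It is not kindel's actual implementation: the function names `longest_common_substring` and `join_across_gap` are hypothetical, and Python's standard `difflib` stands in for whatever kindel uses internally. The two input strings are truncated fragments of the clip consensuses quoted above, and the default `min_overlap` of 7 mirrors the option value shown in the report further down. The deletion-length arithmetic at the end is only one reading that happens to be consistent with the numbers in this comment.

```python
from difflib import SequenceMatcher
from typing import Optional


def longest_common_substring(a: str, b: str, min_overlap: int = 7) -> Optional[str]:
    """Return the longest substring shared by a and b, or None if it is too short."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    if match.size < min_overlap:
        return None
    return a[match.a:match.a + match.size]


def join_across_gap(right_clip: str, left_clip: str, min_overlap: int = 7) -> Optional[str]:
    """Join two clip consensuses where right_clip ends with the text left_clip starts with."""
    overlap = longest_common_substring(left_clip, right_clip, min_overlap)
    # Guard against None: slicing None is what raises
    # "TypeError: 'NoneType' object is not subscriptable".
    if overlap is None:
        return None
    if not (right_clip.endswith(overlap) and left_clip.startswith(overlap)):
        return None
    # Lowercase the shared region, as in the "in context" line above.
    return right_clip[:-len(overlap)] + overlap.lower() + left_clip[len(overlap):]


# Truncated fragments of the two consensuses quoted in this comment.
left_clip = "TATTAATTATATTGTCTTCCCACCC"       # start of the consensus from position 1268
right_clip = "GAGTAAGTCAATGTGTATTAATTATATTG"  # end of the consensus from position 1748

overlap = longest_common_substring(left_clip, right_clip)
joined = join_across_gap(right_clip, left_clip)

# Assumption: deletion length = clip-dominant span minus the shared overlap.
deletion_length = (1748 - 1268) - len(overlap)
```

With these fragments, `overlap` is the 14-base match reported above, and a caller that checks for `None` before slicing avoids the TypeError from the original traceback.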

Below are the results from a testing branch for your sequences. Once I've done more testing I will release a new version including the fixes.

reference: glutathione
options:
- bam_path: debug.bam
- min_depth: 1
- realign: True
    - min_overlap: 7
    - clip_decay_threshold: 0.1
- trim_ends: False
- uppercase: False
observations:
- min, max observed depth: 0, 1735
- ambiguous sites: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 2444, 2445, 2446, 2447, 2448, 2449, 2450, 2451, 2452, 2453, 2454
- insertion sites: 134, 466, 737
- deletion sites: 114, 138, 307, 399, 402, 411, 1068, 1070, 1073, 1781
- clip-dominant regions: 1268-1748: TATTAATTATATTG

>glutathione_cns
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NGAGAGAAGGTTGTGCGGCAAGGGAGAACTCGCCGGTTCGGGGTTGTGCAGCGGGGTGCC
CGGCTAGCTCCTcCCCGTCAAGGTCAGCTCCGGCGAGTCCCAAGTCAAGGTTGGTCCCCT
TCCATTCATCCCCTTCCATTCCCGACTTGGAATCGCTCGATCTGCTAGTTCGTACGTCGT
GGGGGTTGATATCTGTATCCGGTTTGAGGAACGGTGGGTTCTCTCCAGATCCGTGGAGTC
AGTGGCCGGTTTAGCATGCGGCGTTTGCGGCTGTTGTTCTAGTGCAGTGATTTTGATTGC
ATCGATTGATTTGCATTTATTTTTCTCTTTGGGTCACAACCGTGGTGATCCGTGGTGAAT
GCTTCTTGACTTCGTGTCCCTTGCTAACCATTCCAATCAAaattagatcgctttgtctag
atgcttaagAATTTAACCGGCTGCAGGATGGCAAAGAACTTGAATCCCTATGGAAACAAT
GATGGTAACTCTGAGGCCTATCTTGAAGGCCAGGAATGGGAGTTCCCTCTTTCTGATTCA
TTGGAAGATTTCGATAATCTTACAATTCCTCAGGTTTGCCACGCTGCTATTGTATTTTGC
TGCTATTGCATTGGCTAACTAAATACTGCTATTGTATTTTGTCCTTGCTGCAAATGGTTT
AGATGCAGCAAGCCATAGGTCTGCGGCGGCCAATTCCATTtcatcctccgagaccACACC
CTGCAACTGTTAATGCGGAAGCCATCCAAGAGCACACTGAGGTAATTGGACAAACTCCCG
AATTAGAGATTCCTATTCCTCAGTTTAAAAGAGGAGGGAAAGGCAAGGGCAAAACCAAAG
GCGCTGGTAATTTCAGTGGCAAGAGGTTATCACAAAGAGGTAAATCCTTTAGCAAAGATG
AAGACAAAATTATATGCTCTGCCTTCCTGAATGTGAGCAAAGACCCTATCACTGGTATGT
CCCTTTCACTTGTAGATGAATAATCTTGAGTTATATAATTATACTGCTACTTCAATCTAT
GACTTGTAAATATTTAGGAACAAATCATCATGGCGGTTATTATCAAAGGATACATAAATA
TTTTATCGAGAACATTGAAGGCCCTTCTACAAGAAGTCAAGTTGCCATTTCTAACAAGTG
GCTCACCATCCAAAAGGCGGTGAACAAATTCTGTGGTCATTTTTGGTTTGTTGAAAGGTT
AGACAAAAGTGGAAAGACTGAGCAGGACCGAGTAAGTCAATGTGtattaattatattgTC
TTCCCACCCTACAATTTGTAACTAATATACCATTTCGTTCATAGATCGATGATGCTGTTA
AACTGTATGAGGAGTCAGAGCCATGGACGTTAATGCATTGCTGGAACATCCTTCGCCATG
AAGCTAAATGGAGCGATAAGATGGTGGAGATCAATTCTAGAGGCACAAGTACAAAAGTTA
ATCAGCAGGTTGCAGGCAACAATCAAGGGGAACAGGGACAATCTGAGCATGATGATAATG
GACAGCCAGCTCGACCTGAAGGAAGGGACAGTGCCAAGAAGCGTAGGAGTCGTGGGACTG
CAGACAATGATGCATCTAGCGCTGCAATTGAAGTTCTTCAAAGCATGAATGCAAGGGGCC
AGATCAAGGATGACAAAGAAGACAGTCAGATGGCGCAAATACTCCAATGGAAGGATGCTA
AGATAGAGCTTCAACAAAATATGATTGCTCTGCAGAGAGAGGAGATGCAAAAGAGATGGG
AACTTGAGAAGGAGAAGCTGAACTTGACTAGGGAGGAAGTACAACGGCGTAAAGAACAGA
CAAAGGTTGAGATGATGAAGGCTGAAGCTCATTTCATGGGTCAGGATCTAGACAAGTTAG
CCCCACACCTCAAAGAGTACTACATATCCATCCAGCGAGAGATAATGGAGCGTCGAGGGA
TTATAAGCTCTCCAAGTAGCAGTTCTGGACCGANNNNNNNNNNN

Your data is an excellent test case – would you mind if I use it in this repository for testing purposes?

Thanks,
Bede

@sivico26
Author

Hi @bede,

That sounds fantastic, thank you for looking into this and for continuing to support kindel. Since our pipeline installs kindel from pip, I will be looking forward to the next release. There is no problem with using the data for testing. I am glad you find it useful! If you want, I can share the "reference" gene that I used to map the data in the first place.

@bede
Owner

bede commented Nov 26, 2024

Thank you for the kind words. Kindel was overdue some attention, and I think these fixes warrant a v1 release, which I'll push to PyPI as soon as reasonably possible. Handling of deletions is significantly improved. You can follow my progress in the v1 branch, but I'll reply here when it is ready.

A perfect consensus for the BAM you provided is now generated in a single pass:

>glutathione_cns
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NGAGAGAAGGTTGTGCGGCAAGGGAGAACTCGCCGGTTCGGGGTTGTGCAGCGGGGTGCC
CGGCTAGCTCCTcCCCGTCAAGGTCAGCTCCGGCGAGTCCCAAGTCAAGGTTGGTCCCCT
TCCATTCATCCCCTTCCATTCCCGACTTGGAATCGCTCGATCTGCTAGTTCGTACGTCGT
GGGGGTTGATATCTGTATCCGGTTTGAGGAACGGTGGGTTCTCTCCAGATCCGTGGAGTC
AGTGGCCGGTTTAGCATGCGGCGTTTGCGGCTGTTGTTCTAGTGCAGTGATTTTGATTGC
ATCGATTGATTTGCATTTATTTTTCTCTTTGGGTCACAAATCCGTGGTGAATGCTTCTTG
ACTTCGTGTCCCTTGCTAACCATTCCAATCAAaattagatcgctttgtctagatgcttaa
gAATTTAACCGGCTGCAGGATGGCAAAGAACTTGAATCCCTATGGAAACAATGATGGTAA
CTCTGAGGCCTATCTTGAAGGCCAGGAATGGGAGTTCCCTCTTTCTGATTCATTGGAAGA
TTTCGATAATCTTACAATTCCTCAGGTTTGCCACGCTGCTATTGTATTTTGCTGCTATTG
CATTGGCTAACTAAATACTGCTATTGTATTTTGTCCTTGCTGCAAATGGTTTAGATGCAG
CAAGCCATAGGTCTGCGGCGGCCAATTCCATTtcatcctccgagaccACACCCTGCAACT
GTTAATGCGGAAGCCATCCAAGAGCACACTGAGGTAATTGGACAAACTCCCGAATTAGAG
ATTCCTATTCCTCAGTTTAAAAGAGGAGGGAAAGGCAAGGGCAAAACCAAAGGCGCTGGT
AATTTCAGTGGCAAGAGGTTATCACAAAGAGGTAAATCCTTTAGCAAAGATGAAGACAAA
ATTATATGCTCTGCCTTCCTGAATGTGAGCAAAGACCCTATCACTGGTATGTCCCTTTCA
CTTGTAGATGAATAATCTTGAGTTATATAATTATACTGCTACTTCAATCTATGACTTGTA
AATATTTAGGAACAAATCATGGCGGTTATTATCAAAGGATACATAAATATTTTATCGAGA
ACATTGAAGGCCCTTCTACAAGAAGTCAAGTTGCCATTTCTAACAAGTGGCTCACCATCC
AAAAGGCGGTGAACAAATTCTGTGGTCATTTTTGGTTTGTTGAAAGGTTAGACAAAAGTG
GAAAGACTGAGCAGGACCGAGTAAGTCAATGTGtattaattatattgTCTTCCCACCCTA
CAATTTGTAACTAATATACCATTTCGTTCATAGATCGATGATGCTGTTAAACTGTATGAG
GAGTCAGAGCCATGGACGTTAATGCATTGCTGGAACATCCTTCGCCATGAAGCTAAATGG
AGCGATAAGATGGTGGAGATCAATTCTAGAGGCACAAGTACAAAAGTTAATCAGCAGGTT
GCAGGCAACAATCAAGGGGAACAGGGACAATCTGAGCATGATGATAATGGACAGCCAGCT
CGACCTGAAGGAAGGGACAGTGCCAAGAAGCGTAGGAGTCGTGGGACTGCAGACAATGAT
GCATCTAGCGCTGCAATTGAAGTTCTTCAAAGCATGAATGCAAGGGGCCAGATCAAGGAT
GACAAAGAAGACAGTCAGATGGCGCAAATACTCCAATGGAAGGATGCTAAGATAGAGCTT
CAACAAAATATGATTGCTCTGCAGAGAGAGGAGATGCAAAAGAGATGGGAACTTGAGAAG
GAGAAGCTGAACTTGACTAGGGAGGAAGTACAACGGCGTAAAGAACAGACAAAGGTTGAG
ATGATGAAGGCTGAAGCTCATTTCATGGGTCAGGATCTAGACAAGTTAGCCCCACACCTC
AAAGAGTACTACATATCCATCCAGCGAGAGATAATGGAGCGTCGAGGGATTATAAGCTCT
CCAAGTAGCAGTTCTGGACCGANNNNNNNNNNN

@bede
Owner

bede commented Nov 27, 2024

Hi @sivico26, do you know if the pipeline you're using can accommodate the minimum Python version required by Kindel increasing to 3.8?

@sivico26
Author

sivico26 commented Nov 27, 2024

Hi @bede, very considerate of you to ask. Yes, we are using 3.9 essentially because clair3 forces us to. Otherwise, we would be using 3.12 or 3.13.

I think 3.8 even reached end-of-life already... So, making the upgrade to 3.8 sounds sensible, maybe even to 3.9.

@bede
Owner

bede commented Nov 27, 2024

Excellent, thanks

@bede
Owner

bede commented Nov 27, 2024

v1.0.0 is now available on PyPI, and I would appreciate your feedback.

@bede
Owner

bede commented Nov 27, 2024

Also, I have added an example using your data to the README to show the plotting functionality. If you would rather I removed it, please let me know. It is a nice example of --realign working.
