Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add free function alignment_from_cigar and cigar_from_alignment. #3057

Merged

Conversation

smehringer
Copy link
Member

@smehringer smehringer commented Oct 4, 2022

When we remove the alignment field from the SAM file, we need to have a replacement in place to get from a CIGAR string, read from file, to an alignment.

I copied over the functionality from IO into a free function in the /alignment module for now.

We should discuss where and how we will provide this feature.

Core Meeting 11.10.2022 and 13.10.2022

alignment_from_cigar

  1. The order of parameters
    fixed to this: alignment_from_cigar(cigar_vector, read, reference, position_in_reference);
    unless smehringer notices when working with it in the PR that something is at odds.
  2. Should an empty CIGAR be valid?
    No. A Cigar string is correct iff: Sum of lengths of the M/I/S/=/X operations shall equal the length of SEQ. Since an empty query cannot be aligned and empty CIGAR is not valid but indicates that an alignment is unavailable (same as in IO).
    [exception is thrown]
  3. What should happen if the CIGAR does not match the reference or sequence length?
    E.g. seq = ACTG, CIGAR = 3M (too many matches)
    It's a corrupt CIGAR
    [exception is thrown]
  4. What if the reference length is too short for the CIGAR string?
    It's a corrupt CIGAR
    POS (reference start position) plus the sum of the lengths of M/=/X/D/N CIGAR operations may not exceed the length of reference.
    [exception is thrown]
    Note: Circular references should be handled by altering the reference input. Condition for correct reference length:
  5. We expect a zero-based position.

cigar_from_alignment

  1. The order of parameters
cigar_from_alignment(alignment,
                     seqan3::cigar_clipped_bases{.hard_front = 3,
                                                 .hard_back = 3,
                                                 .soft_front = 3,
                                                 .soft_back = 3},
                     true /*extended cigar y/n*/);
  1. Should the soft/hard clipping be defaulted to 0?
    yes
  2. Should an empty alignment ("", "") be valid?
    No, same as for empty CIGAR strings
    [exception is thrown]
  3. What happens if the aligned sequences lengths differ? E.g. (ACT, ATGG)
    It's a corrupt alignment
    [exception is thrown]

ideas for utility functions (dump, no decision made)

auto clipped_bases = cigar_clipped_bases(cigar);

auto len = cigar_query_length(cigar);

sam_out.push_back({read, cigar});

class cigar_utility
{

}

cigar_utiliy util{cigar};
cigar_utiliy util{alignment, clipped_bases};

util.reference_length();
util.query_length();
util.clipping();

util.cigar();
util.alignment(reference, position, read);

@vercel
Copy link

vercel bot commented Oct 4, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
seqan3 ✅ Ready (Inspect) Visit Preview Oct 19, 2022 at 10:28AM (UTC)

@smehringer smehringer added the feature/proposal a new feature or an idea of label Oct 4, 2022
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 4, 2022
@codecov
Copy link

codecov bot commented Oct 4, 2022

Codecov Report

Base: 98.21% // Head: 98.23% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (7d028e6) compared to base (8d81788).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 7d028e6 differs from pull request most recent head 96b6d10. Consider uploading reports for the commit 96b6d10 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3057      +/-   ##
==========================================
+ Coverage   98.21%   98.23%   +0.01%     
==========================================
  Files         275      277       +2     
  Lines       12231    12329      +98     
==========================================
+ Hits        12013    12111      +98     
  Misses        218      218              
Impacted Files Coverage Δ
...lignment/cigar_conversion/alignment_from_cigar.hpp 100.00% <100.00%> (ø)
...lignment/cigar_conversion/cigar_from_alignment.hpp 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@smehringer smehringer force-pushed the io_add_construct_alignment_from_cigar branch from 983dd18 to 6993752 Compare October 7, 2022 10:19
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 7, 2022
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 7, 2022
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 7, 2022
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 7, 2022
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 12, 2022
include/seqan3/alignment/cigar_conversion.hpp Outdated Show resolved Hide resolved
include/seqan3/alignment/cigar_conversion.hpp Outdated Show resolved Hide resolved
include/seqan3/alignment/cigar_conversion.hpp Outdated Show resolved Hide resolved
include/seqan3/alignment/cigar_conversion.hpp Outdated Show resolved Hide resolved
include/seqan3/alignment/cigar_conversion.hpp Outdated Show resolved Hide resolved
include/seqan3/alignment/cigar_conversion.hpp Outdated Show resolved Hide resolved
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 13, 2022
@eseiler
Copy link
Member

eseiler commented Oct 18, 2022

I did a rebase because clang-format couldn't

@seqan-actions seqan-actions removed the lint [INTERNAL] signal for linting label Oct 18, 2022
@smehringer smehringer force-pushed the io_add_construct_alignment_from_cigar branch from 3a95da4 to e3ddab9 Compare October 18, 2022 12:39
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 18, 2022
Copy link
Member

@eseiler eseiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change all the occurrences of cigar string etc. to CIGAR string? It's a bit mixed throughout the documentation.

I still have to check the unit tests

@seqan-actions seqan-actions added the lint [INTERNAL] signal for linting label Oct 19, 2022
@eseiler eseiler removed the lint [INTERNAL] signal for linting label Oct 19, 2022
@seqan-actions seqan-actions added the lint [INTERNAL] signal for linting label Oct 19, 2022
@eseiler eseiler removed the lint [INTERNAL] signal for linting label Oct 19, 2022
@seqan-actions seqan-actions added the lint [INTERNAL] signal for linting label Oct 19, 2022
@smehringer smehringer force-pushed the io_add_construct_alignment_from_cigar branch from 6a60b80 to b86720d Compare October 19, 2022 10:14
@seqan-actions seqan-actions removed the lint [INTERNAL] signal for linting label Oct 19, 2022
@smehringer smehringer requested a review from eseiler October 19, 2022 10:16
@smehringer smehringer force-pushed the io_add_construct_alignment_from_cigar branch from b86720d to 96b6d10 Compare October 19, 2022 10:25
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 19, 2022
@eseiler eseiler merged commit 042498e into seqan:master Oct 19, 2022
@smehringer smehringer deleted the io_add_construct_alignment_from_cigar branch October 19, 2022 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/proposal a new feature or an idea of
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants