Add CIGAR string support to alignment IO #1192
For commits 1–4. It's mostly about documentation that was not adapted to code changes.
For commits 5-6.
test/unit/io/alignment_file/alignment_file_format_test_template.hpp
One more curiosity...
This is already really good! However, a few small issues remain:
Also please check the CI results (header test + doxygen).
Codecov Report
@@            Coverage Diff            @@
##           master    #1192    +/-   ##
=========================================
- Coverage   97.61%    97.6%   -0.02%
=========================================
  Files         222      222
  Lines        9007     8961     -46
=========================================
- Hits         8792     8746     -46
  Misses        215      215
Thank you, looks good!
Hi,
Hi @ppericard, great to hear that this module is in use :) There is an internal compiler error in the range library that has been fixed in the new range-v3 release, so I am just waiting for some PRs that make our library compatible with that release. I think it will be a matter of 2-3 weeks. I will try to raise the priority of this!
Some minor stuff 💅
I think that was the final round.
* is based on sequence at the second position of the \p alignment pair,
* namely the query sequence.
* \attention Note that CIGAR elements (respectively by their CIGAR operation) are always related to one of the
* two sequences in a pairwise alignment. In this case, the resulting cigar_vector is based on sequence
I don't understand this. What information would you like to express?
That the cigar string is a relative thing... If it says "deletion of 2 bases", it means a deletion in the query, not the reference, so you need to be aware which is the reference sequence and which is the query sequence.
In the case of map_aligned_values_to_cigar_op it is query_char and reference_char.
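To make the "relative" point concrete, here is a minimal, self-contained sketch. It is not the SeqAn3 implementation: `cigar_element`, `column_to_cigar_op`, and `to_cigar_vector` are hypothetical names, sequences are plain strings, and `'-'` is assumed to be the gap character. It only illustrates that the same column yields `D` or `I` depending on which sequence is treated as the reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CIGAR element: (count, operation).
using cigar_element = std::pair<std::uint32_t, char>;

// Maps one alignment column to a CIGAR operation.
// Assumption: '-' marks a gap in either sequence.
inline char column_to_cigar_op(char reference_char, char query_char)
{
    char const gap = '-';
    if (reference_char == gap && query_char == gap)
        return 'P'; // padding: gap in both sequences
    if (reference_char == gap)
        return 'I'; // query has a base the reference lacks
    if (query_char == gap)
        return 'D'; // reference has a base the query lacks
    return 'M';     // aligned column (match or mismatch)
}

// Collapses consecutive identical operations into (count, op) elements.
inline std::vector<cigar_element> to_cigar_vector(std::string const & reference,
                                                  std::string const & query)
{
    assert(reference.size() == query.size()); // aligned sequences have equal length

    std::vector<cigar_element> result;
    for (std::size_t i = 0; i < reference.size(); ++i)
    {
        char const op = column_to_cigar_op(reference[i], query[i]);
        if (!result.empty() && result.back().second == op)
            ++result.back().first;
        else
            result.emplace_back(1u, op);
    }
    return result;
}
```

With reference `ACGT--AC` and query `AC--TTAC` this gives 2M 2D 2I 2M. Swapping the two arguments gives 2M 2I 2D 2M, which is why the documentation has to state which tuple element is the reference.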
I don't know. I find the description a bit complicated. I proposed some alternative description in a comment above. What do you think?
@rrahn Please mark the others as resolved 🙏
Only one little 💅 thing for some documentation.
*
* The following alignment reference sequence on top and the query sequence at
* the bottom.
* \attention Note that CIGAR elements (respectively by their CIGAR operation) are always related to one of the two
* \attention Note that CIGAR elements (respectively by their CIGAR operation) are always related to one of the two
The first sequence is always considered the reference sequence while the second one is considered the query sequence. The cigar operations regarding insertions and deletions are set accordingly.
Is this what you want to express?
If yes, could you make the same or a similar note above for the other function?
* in a pairwise alignment. In this case, the resulting cigar_vector
* is based on sequence at the second position of the \p alignment pair,
* namely the query sequence.
*
* ### Example:
* ### Theoretical Example:
unresolved?
* in a pairwise alignment. In this case, the resulting cigar_vector
* is based on sequence at the second position of the \p alignment pair,
* namely the query sequence.
* \attention Note that CIGAR elements (respectively by their CIGAR operation) are always related to one of the two
That would also be a candidate not to miss if you agree to change the description as proposed.
documentation stuff 💅
* \tparam alignment_type Must model the seqan3::tuple_like and must have std::tuple_size 2.
* Each tuple element must be a range over values comparable to seqan3::gap.
* \param alignment The alignment, represented by a pair of aligned sequences,
* to be transformed into CIGAR_vector based on the
CIGAR_vector
cigar_vector
cigar vector
Of these three I would prefer the last one, since it does not really name a variable. The same applies in the note block.
I am truly sorry, I just saw that there are still inconsistent uses of CIGAR string and cigar string as well. It is OK to leave it as is, but if you want to change it, please give me a heads-up. Otherwise I will just merge it in 30 minutes or so.
@rrahn I also updated the changelog!
…nore clipping when constructing the alignment.
…d reduce redundant code.
@rrahn I think this can be merged now. I also rebased on current master.
@rrahn ping
Blocked by #1194
See the last commit: the alternative way to get the help page description does not cause the ICE, so I am not dependent on the range-v3 update.
Review commit by commit and note the following:
regarding commit 1: When the alignment IO was written, no cigar alphabet was present. This commit introduces the alphabet and replaces pair<char, uint32_t>.
regarding commit 2: This commit belongs to the refactoring of the first, but it did ONLY move the get_cigar_vector code above the get_cigar_string code to make use of it. Git does a horrible job of detecting this change (where there actually is none...), so I kept it separate to make the actual differences in the former commit visible.
regarding commit 3: Before, we ignored hard clipping and stored soft clipping in a separate variable because the full information was not needed. Now the CIGAR string should represent the exact input, including hard clipping, so the functions were adapted to return the complete cigar vector.
regarding commit 4: The parse_cigar/parse_binary_cigar functions were format specific and should not be free functions but member functions.
regarding commit 5: Support reading of CIGAR fields.
regarding commit 6: Support writing of CIGAR fields. Note that for BAM, more elaborate changes were needed to compute the ref_length from the CIGAR string (formerly obtained by simply querying the alignment length) because BAM writes out bin information for the BAM index, which needs the ref_length.
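For readers unfamiliar with why ref_length matters for BAM output, the following is a rough, self-contained sketch of the idea behind commits 5 and 6. It is not the code from this PR: `cigar_element`, `parse_cigar_sketch`, and `reference_length` are hypothetical names, and the parser assumes well-formed input. Only `reg2bin` follows the C function given in the SAM/BAM specification, which computes the bin for a 0-based, half-open interval [beg, end).

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CIGAR element: (count, operation).
using cigar_element = std::pair<std::uint32_t, char>;

// Parses a textual CIGAR such as "3S10M2D" into (count, op) elements.
// Assumption: the input is well formed (digits followed by one operation char).
inline std::vector<cigar_element> parse_cigar_sketch(std::string const & text)
{
    std::vector<cigar_element> result;
    std::uint32_t count = 0;
    for (char const c : text)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + static_cast<std::uint32_t>(c - '0');
        }
        else
        {
            result.emplace_back(count, c);
            count = 0;
        }
    }
    return result;
}

// Sums the operations that consume reference positions (M, D, N, =, X).
// Clipping (S, H), insertions (I), and padding (P) do not advance the reference.
inline std::int32_t reference_length(std::vector<cigar_element> const & cigar)
{
    std::int32_t length = 0;
    for (cigar_element const & element : cigar)
    {
        char const op = element.second;
        if (op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X')
            length += static_cast<std::int32_t>(element.first);
    }
    return length;
}

// Bin calculation as given in the SAM/BAM specification
// (beg is 0-based inclusive, end is 0-based exclusive).
inline int reg2bin(int beg, int end)
{
    --end;
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
    return 0;
}
```

For a record at 0-based position `pos` with CIGAR `5H3S10M2I4D6M1S`, the reference length is 10 + 4 + 6 = 20, and the bin would be `reg2bin(pos, pos + 20)`. The clipped and inserted bases contribute nothing, which is why the alignment length alone is no longer enough once clipping is kept in the cigar vector.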