-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Alignment] Wrong alignment score #3043
Comments
Hi @yceh, Thank you for submitting the bug report. I could reduce the sequences to "GGCAAGAA" and "CGAAGC". Alignment:
Score Matrix:
Trace Matrix:
Path:
Score Matrix (Just Path):
Trace Matrix (Just Path):
|
If you add #include <seqan3/alignment/aligned_sequence/debug_stream_alignment.hpp> you can print the alignment: auto & res = *results.begin();
seqan3::debug_stream << "alignment:\n" << res << "\n"; |
I found that changing the scoring scheme type to seqan3::nucleotide_scoring_scheme<int8_t> Mind that with |
Something goes wrong with the traceback.
The score for this alignment is -239. Is not optimal, because you could align both A:
The score for this alignment is |
https://godbolt.org/z/qrePrW67Y Some debug code#include <utility>
#include <vector>
#include <seqan3/alignment/aligned_sequence/debug_stream_alignment.hpp>
#include <seqan3/alignment/configuration/align_config_gap_cost_affine.hpp>
#include <seqan3/alignment/configuration/all.hpp>
#include <seqan3/alignment/matrix/detail/debug_matrix.hpp>
#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alignment/scoring/nucleotide_scoring_scheme.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
int main(int argc, char ** argv)
{
using namespace seqan3::literals;
seqan3::dna4_vector ref = "GGCAAGAA"_dna4;
seqan3::dna4_vector query = "CGAAGC"_dna4;
seqan3::dna4 letter;
seqan3::nucleotide_scoring_scheme<int> nu_scheme;
nu_scheme.score('A'_dna4, 'A'_dna4) = 145;
nu_scheme.score('A'_dna4, 'T'_dna4) = -144;
nu_scheme.score('A'_dna4, 'G'_dna4) = -153;
nu_scheme.score('A'_dna4, 'C'_dna4) = -152;
nu_scheme.score('T'_dna4, 'A'_dna4) = -144;
nu_scheme.score('T'_dna4, 'T'_dna4) = 144;
nu_scheme.score('T'_dna4, 'G'_dna4) = -143;
nu_scheme.score('T'_dna4, 'C'_dna4) = -140;
nu_scheme.score('G'_dna4, 'A'_dna4) = -153;
nu_scheme.score('G'_dna4, 'T'_dna4) = -143;
nu_scheme.score('G'_dna4, 'G'_dna4) = 144;
nu_scheme.score('G'_dna4, 'C'_dna4) = -145;
nu_scheme.score('C'_dna4, 'A'_dna4) = -152;
nu_scheme.score('C'_dna4, 'T'_dna4) = -140;
nu_scheme.score('C'_dna4, 'G'_dna4) = -145;
nu_scheme.score('C'_dna4, 'C'_dna4) = 144;
int gap_open_cost = -52;
int gap_extend_cost = -58;
auto new_gap_cost = gap_extend_cost + gap_open_cost;
// Configure the alignment kernel.
auto config = seqan3::align_cfg::method_global{} | seqan3::align_cfg::scoring_scheme{nu_scheme}
| seqan3::align_cfg::gap_cost_affine{seqan3::align_cfg::open_score{gap_open_cost},
seqan3::align_cfg::extension_score{gap_extend_cost}}
| seqan3::align_cfg::detail::debug{};
auto results = seqan3::align_pairwise(std::tie(ref, query), config);
auto & res = *results.begin();
seqan3::debug_stream << res.alignment() << std::endl;
seqan3::debug_stream << seqan3::detail::debug_matrix{res.score_matrix()} << std::endl;
seqan3::debug_stream << seqan3::detail::debug_matrix{res.trace_matrix()} << std::endl;
// seqan3::debug_stream<<"aligned_ref:\n"<<std::get<0>(res.alignment())<<"\n";
// seqan3::debug_stream<<"aligned_query:\n"<<std::get<1>(res.alignment())<<"\n";
auto seq1 = std::get<0>(res.alignment());
auto seq2_iter = std::get<1>(res.alignment()).begin();
int score_so_far = 0;
int q_pos = 0;
int r_pos = 0;
enum
{
M,
I,
D
} state = M;
seqan3::debug_stream << "seq1\tseq2\tstate\tq_pos\tr_pos\tscore\n";
for (auto seq1_iter = seq1.begin(); seq1_iter < seq1.end(); seq1_iter++)
{
if (*seq1_iter == seqan3::gap{})
{
q_pos++;
if (state == D)
{
score_so_far += gap_extend_cost;
}
else
{
score_so_far += new_gap_cost;
}
state = D;
}
else if (*seq2_iter == seqan3::gap{})
{
r_pos++;
if (state == I)
{
score_so_far += gap_extend_cost;
}
else
{
score_so_far += new_gap_cost;
}
state = I;
}
else
{
r_pos++;
q_pos++;
seqan3::alphabet_variant<seqan3::dna4, seqan3::gap> seq1_ele = *seq1_iter;
seqan3::alphabet_variant<seqan3::dna4, seqan3::gap> seq2_ele = *seq2_iter;
score_so_far += nu_scheme.score(seq1_ele.convert_to<seqan3::dna4>(), seq2_ele.convert_to<seqan3::dna4>());
state = M;
}
seqan3::debug_stream << *seq1_iter << '\t' << *seq2_iter << '\t'
<< (state == D ? "DEL" : (state == I ? "INS" : "MAT")) << '\t' << q_pos << '\t' << r_pos
<< '\t' << score_so_far << '\n';
seq2_iter++;
}
seqan3::debug_stream << "alignemnt score in result:" << res.score() << ", " << score_so_far << "\n";
return res.score() == score_so_far ? 0 : -1;
} Alignment:
Score Matrix:
Trace Matrix:
Path:
Score Matrix (Just Path):
Trace Matrix (Just Path):
|
OK, I did some digging. The meaning of the parameters is not that explicitly stated in the paper, but my interpretation is:
The equation seems to be equivalent to E.g., nu_scheme.set_simple_scheme(seqan3::match_score{145}, seqan3::mismatch_score{-144});
int const gap_open_cost = -100;
int const gap_extend_cost = -22; // works
int const gap_extend_cost = -23; // does not work
nu_scheme.set_simple_scheme(seqan3::match_score{145}, seqan3::mismatch_score{-146});
int const gap_open_cost = -100;
int const gap_extend_cost = -22; // works
int const gap_extend_cost = -23; // works In the example, the scores are more sophisticated, I think the most safe would be to go with the most positive mismatch All in all, it seems that the algorithm does not work because we are in violation of this assumption made by Gotoh. Edit:
|
Added carry bit to up and left open trace directions in order to capture original open signal even if it is overwritten by the orhtogonal direction. In the particular issue the left extension overrites the up open signal in the last column and fifth row. The traceback now follows the vertical or horizontal direction unitl it encounters the carry bit signal.
Hey, so I looked at it today and the problem was that on very few occasions the original |
Added carry bit to up and left open trace directions in order to capture original open signal even if it is overwritten by the orhtogonal direction. In the particular issue the left extension overrites the up open signal in the last column and fifth row. The traceback now follows the vertical or horizontal direction unitl it encounters the carry bit signal.
Added carry bit to up and left open trace directions in order to capture original open signal even if it is overwritten by the orhtogonal direction. In the particular issue the left extension overrites the up open signal in the last column and fifth row. The traceback now follows the vertical or horizontal direction unitl it encounters the carry bit signal.
We merged #3098 which should fix this issue. Please re-open if the issue has not been resolved for you. |
Does this problem persist on the current master?
Is there an existing issue for this?
Current Behavior
For the example attached, seqan report an alignment score of 34240, but the score of the alignment returned is 34202.
Expected Behavior
The reported alignment score and the score calculated from the returned alignment should match.
The attached code prints the alignment score at each column, for a majority of other alignments, the alignment score at the last column matches the alignment score reported by seqan, but it is still entirely possible that I didn't calculate the score correctly or I misunderstood the doc. (and for this particular example, they will match if I trim away the last nucleotide of both reference and query)
Steps To Reproduce
The following code should produce this issue:
Environment
Anything else?
No response
The text was updated successfully, but these errors were encountered: