Alignment configuration free end gaps #2032

30 changes: 14 additions & 16 deletions doc/tutorial/pairwise_alignment/configurations.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,15 @@
#include <seqan3/alignment/configuration/all.hpp>
//! [include]

//! [include_aligned_ends]
#include <seqan3/alignment/configuration/align_config_aligned_ends.hpp>
//! [include_aligned_ends]

//! [include_scoring_scheme]
#include <seqan3/alignment/scoring/aminoacid_scoring_scheme.hpp>
#include <seqan3/alignment/scoring/nucleotide_scoring_scheme.hpp>
//! [include_scoring_scheme]

//! [include_method]
#include <seqan3/alignment/scoring/method.hpp>
//! [include_method]

//! [include_gap_scheme]
#include <seqan3/alignment/scoring/gap_scheme.hpp>
//! [include_gap_scheme]
Expand All @@ -33,18 +33,16 @@
int main()
{
{
//! [aligned_ends]

seqan3::front_end_first fef{std::true_type{}};
seqan3::back_end_first bef{std::false_type{}};
seqan3::front_end_second fes{true};
seqan3::back_end_second bes{false};

auto cfg_1 = seqan3::align_cfg::aligned_ends{seqan3::end_gaps{fef, bef, fes, bes}};
auto cfg_2 = seqan3::align_cfg::aligned_ends{seqan3::end_gaps{fef, fes}};
//! [aligned_ends]
(void) cfg_1;
(void) cfg_2;
//! [method_global_free_end_gaps]
// Example of a semi-global alignment where leading and trailing gaps in the
// second sequence are not penalised:
auto config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{false},
seqan3::align_cfg::free_end_gaps_sequence2_leading{true},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{false},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{true}};
//! [method_global_free_end_gaps]
(void) config;
}

{
Expand Down
81 changes: 33 additions & 48 deletions doc/tutorial/pairwise_alignment/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -110,45 +110,23 @@ seqan3::align_cfg::method_global.
\remark The method configuration must be given by the user as it strongly depends on the application context.
It would be wrong for us to assume what the intended default behaviour should be.

The global alignment can be further refined by setting the seqan3::align_cfg::aligned_ends option.
The seqan3::align_cfg::aligned_ends class specifies whether or not gaps at the end of the sequences are penalised.
In SeqAn you can configure this behaviour for every end (front and back of the first sequence and second sequence)
separately using the seqan3::end_gaps class.
This class is constructed with up to 4 end gap specifiers (one for every end):

- seqan3::front_end_first - aligning front of first sequence with a gap.
- seqan3::back_end_first - aligning back of first sequence with a gap.
- seqan3::front_end_second - aligning front of second sequence with a gap.
- seqan3::back_end_second - aligning back of second sequence with a gap.

These classes can be constructed with either a constant boolean (std::true_type or std::false_type) or a regular `bool`
argument. The former enables static configuration of the respective features in the alignment algorithm. The
latter allows to configure these features at runtime. This makes setting these values from runtime dependent parameters,
e.g. user input, much easier. The following code snippet demonstrates the different use cases:

\snippet doc/tutorial/pairwise_alignment/configurations.cpp include_aligned_ends
\snippet doc/tutorial/pairwise_alignment/configurations.cpp aligned_ends

The `cfg_1` and the `cfg_2` will result in the exact same configuration of the alignment where aligning the front of
either sequence with gaps is not penalised while the back of both sequences is. The order of the arguments is
irrelevant. Specifiers initialised with constant booleans can be mixed with those initialised with `bool` values.
If a specifier for a particular sequence end is not given, it defaults to the specifier initialised with
`std::false_type`.

\note You should always prefer initialising the end-gaps specifiers using the boolean constants if possible
as it reduces the compile time. The reason for this is that the runtime information is converted into static types
for the alignment algorithm. For every end-gap specifier the compiler will generate two versions for the `true` and the
`false` case. This adds up to 16 different paths the compiler needs to instantiate.

SeqAn also offers \ref predefined_end_gap_configurations "predefined" seqan3::end_gaps configurations that
cover the typical use cases.

| Entity | Meaning |
| -------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
| \ref seqan3::end_gaps::free_ends_none "free_ends_none" | Enables the typical global alignment. |
| \ref seqan3::end_gaps::free_ends_all "free_ends_all" | Enables overlap alignment, where the end of one sequence can overlap the end of the other sequence. |
| \ref seqan3::end_gaps::free_ends_first "free_ends_first" | Enables semi global alignment, where the second sequence is aligned as an infix of the first sequence. |
| \ref seqan3::end_gaps::free_ends_second "free_ends_second" | Enables semi global alignment, where the first sequence is aligned as an infix of the second sequence. |
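
For anyone migrating existing code, the removed predefined configurations appear to map onto the four new flags as sketched below. The mapping is read off the replacements made in the tutorial sources of this PR (`free_ends_first`, `free_ends_second`, `free_ends_all`); the `free_ends_none` case is inferred from its description as the typical global alignment. The names `legacy_end_gaps`, `free_end_gap_flags` and `to_flags` are hypothetical and not part of SeqAn.

```cpp
// Hypothetical stand-ins for illustration only; none of these names are SeqAn types.
enum class legacy_end_gaps
{
    free_ends_none,
    free_ends_all,
    free_ends_first,
    free_ends_second
};

// Field order follows the argument order used with method_global in this PR.
struct free_end_gap_flags
{
    bool sequence1_leading;
    bool sequence2_leading;
    bool sequence1_trailing;
    bool sequence2_trailing;
};

// Translate a removed predefined configuration into the four explicit flags.
constexpr free_end_gap_flags to_flags(legacy_end_gaps const legacy)
{
    switch (legacy)
    {
        case legacy_end_gaps::free_ends_all:    return {true, true, true, true};
        case legacy_end_gaps::free_ends_first:  return {true, false, true, false};
        case legacy_end_gaps::free_ends_second: return {false, true, false, true};
        case legacy_end_gaps::free_ends_none:   break;
    }
    return {false, false, false, false}; // typical global alignment: every end gap is penalised
}
```

The result of `to_flags` would then be passed, field by field and in the same order, to the four `free_end_gaps_*` specifiers of `method_global`.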
The global alignment can be further refined by initialising the configuration element with
the free end gap specifiers. They specify whether gaps at the end of the sequences are penalised.
In SeqAn you can configure this behaviour for every end, namely
for leading and trailing gaps of the first and second sequence.
seqan3::align_cfg::method_global is constructed with 4 free end gap specifiers (one for every end):

- seqan3::align_cfg::free_end_gaps_sequence1_leading - If set to true, leading gaps in the first sequence are not penalised.
- seqan3::align_cfg::free_end_gaps_sequence2_leading - If set to true, leading gaps in the second sequence are not penalised.
- seqan3::align_cfg::free_end_gaps_sequence1_trailing - If set to true, trailing gaps in the first sequence are not penalised.
- seqan3::align_cfg::free_end_gaps_sequence2_trailing - If set to true, trailing gaps in the second sequence are not penalised.

The following code snippet shows how to configure a semi-global alignment:

\snippet doc/tutorial/pairwise_alignment/configurations.cpp include_method
\snippet doc/tutorial/pairwise_alignment/configurations.cpp method_global_free_end_gaps

The order of arguments is fixed and must always be as shown in the example.
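
To make the effect of the four flags concrete, here is a minimal, library-independent sketch of a linear-gap dynamic programming alignment in which each end can be made free. It is not SeqAn's implementation. The struct `free_end_gaps`, the function `alignment_score` and the scores (match 1, mismatch -1, gap -1) are assumptions chosen for illustration. The sketch also assumes, as the replacement of `free_ends_first` in this PR suggests, that a `sequence1` flag makes unaligned leading or trailing characters of the first sequence free.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the four specifiers; not a SeqAn type.
struct free_end_gaps
{
    bool seq1_leading;
    bool seq2_leading;
    bool seq1_trailing;
    bool seq2_trailing;
};

// Score of the best alignment of seq1 (columns) and seq2 (rows) under the given flags.
int alignment_score(std::string const & seq1, std::string const & seq2, free_end_gaps const free)
{
    constexpr int match = 1, mismatch = -1, gap = -1; // assumed scores
    std::size_t const cols = seq1.size() + 1;
    std::size_t const rows = seq2.size() + 1;
    std::vector<std::vector<int>> score(rows, std::vector<int>(cols, 0));

    // First row: a prefix of seq1 is left unaligned. Free if seq1_leading is set.
    for (std::size_t j = 1; j < cols; ++j)
        score[0][j] = free.seq1_leading ? 0 : static_cast<int>(j) * gap;

    // First column: a prefix of seq2 is left unaligned. Free if seq2_leading is set.
    for (std::size_t i = 1; i < rows; ++i)
        score[i][0] = free.seq2_leading ? 0 : static_cast<int>(i) * gap;

    for (std::size_t i = 1; i < rows; ++i)
    {
        for (std::size_t j = 1; j < cols; ++j)
        {
            int const diagonal = score[i - 1][j - 1] + (seq1[j - 1] == seq2[i - 1] ? match : mismatch);
            int const vertical = score[i - 1][j] + gap;
            int const horizontal = score[i][j - 1] + gap;
            score[i][j] = std::max({diagonal, vertical, horizontal});
        }
    }

    int best = score[rows - 1][cols - 1];

    // A free trailing end lets the alignment stop anywhere in the last row or column.
    if (free.seq1_trailing)
        for (std::size_t j = 0; j < cols; ++j)
            best = std::max(best, score[rows - 1][j]);

    if (free.seq2_trailing)
        for (std::size_t i = 0; i < rows; ++i)
            best = std::max(best, score[i][cols - 1]);

    return best;
}
```

With `seq1 = "AAACGTAAA"` and `seq2 = "CGT"`, all flags `false` gives -3 (three matches, six penalised gaps). Setting only the sequence 1 leading flag gives 0, and setting both sequence 1 flags gives 3, because `seq2` is then aligned as an infix of `seq1` at no cost.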

\assignment{Assignment 2}

Expand All @@ -161,8 +139,8 @@ would be aligned as an infix of the second sequence.

\include doc/tutorial/pairwise_alignment/pairwise_alignment_solution_2.cpp

To accomplish our goal we simply add the align_cfg::aligned_ends option initialised with `free_ends_first` to the
existing configuration.
To accomplish our goal we initialise the `method_global` option with the free end gap specifiers
for sequence 1 set to `true` and those for sequence 2 set to `false`.

\endsolution

Expand Down Expand Up @@ -299,19 +277,26 @@ To make the configuration easier, we added a shortcut called seqan3::align_cfg::
\snippet doc/tutorial/pairwise_alignment/configurations.cpp include_edit
\snippet doc/tutorial/pairwise_alignment/configurations.cpp edit

The `edit_scheme` still has to be combined with an alignment method. When combining it
with the seqan3::align_cfg::method_global configuration element, the edit distance algorithm
can be further refined with free end gaps (see section `Global and semi-global alignment`).
Member

@marehr marehr Aug 13, 2020

Is there a way to inter-reference this with doxygen?


\attention Only the following free end gap configurations are supported for the
global alignment configuration with the edit scheme:
- no free end gaps (all free end gap specifiers are set to `false`)
- free end gaps for the first sequence (free end gaps are set to `true` for the first and
to `false` for the second sequence)
Using any other free end gap configuration disables the edit distance algorithm:
the alignment falls back to the standard pairwise alignment
and does not use the fast bitvector algorithm.
Comment on lines +289 to +291
Member

reflow this section
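
The rule in the `\attention` block above can be expressed as a small predicate, which may help when checking whether a given combination of flags still qualifies for the fast path. This is a sketch of the documented rule only. `free_end_gaps` and `edit_distance_supported` are hypothetical names, and how SeqAn actually selects the algorithm is not shown in this diff.

```cpp
// Hypothetical stand-in for the four specifiers; not a SeqAn type.
struct free_end_gaps
{
    bool seq1_leading;
    bool seq2_leading;
    bool seq1_trailing;
    bool seq2_trailing;
};

// True only for the two combinations the documentation lists as supported
// by the edit scheme: no free end gaps, or free end gaps for the first sequence only.
constexpr bool edit_distance_supported(free_end_gaps const flags)
{
    bool const second_is_penalised = !flags.seq2_leading && !flags.seq2_trailing;
    bool const no_free_ends = !flags.seq1_leading && !flags.seq1_trailing && second_is_penalised;
    bool const first_free_only = flags.seq1_leading && flags.seq1_trailing && second_is_penalised;
    return no_free_ends || first_free_only;
}
```

Under this reading, a configuration that frees only one end of the first sequence, for example `{true, false, false, false}`, is not in the supported list and would take the fallback.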


### Refine edit distance

The edit distance can be further refined using seqan3::align_cfg::aligned_ends to also compute a semi-global alignment
and the seqan3::align_cfg::max_error configuration to give an upper limit of the allowed number of edits. If the
The edit distance can be further refined using the seqan3::align_cfg::max_error configuration to give an upper limit of the allowed number of edits. If the
Member

This is really long all of a sudden?

respective alignment could not find a solution within the given error bound, the resulting score is infinity
(corresponds to std::numeric_limits::max). Also the alignment and the front and back coordinates can be computed using
the align_cfg::result option.
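
The behaviour described here, an "infinite" score when no solution lies within the error bound, can be illustrated with a plain Levenshtein distance. This is a library-independent sketch and not the bitvector algorithm SeqAn uses. `bounded_edit_distance` is a hypothetical name, and the use of `std::numeric_limits<std::size_t>::max()` as the infinity value is an assumption. The score type and sign convention in SeqAn may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Returns the edit distance of a and b, or the maximum std::size_t value
// ("infinity") if the distance exceeds max_error.
std::size_t bounded_edit_distance(std::string const & a, std::string const & b, std::size_t const max_error)
{
    // previous[j] holds the distance between the processed prefix of a and the first j characters of b.
    std::vector<std::size_t> previous(b.size() + 1);
    std::vector<std::size_t> current(b.size() + 1);
    std::iota(previous.begin(), previous.end(), std::size_t{0});

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        current[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            std::size_t const substitution = previous[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
            std::size_t const deletion = previous[j] + 1;
            std::size_t const insertion = current[j - 1] + 1;
            current[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(previous, current);
    }

    std::size_t const distance = previous[b.size()];
    return distance <= max_error ? distance : std::numeric_limits<std::size_t>::max();
}
```

For example, `"kitten"` and `"sitting"` have edit distance 3, so a bound of 3 returns 3 while a bound of 2 returns the infinity value.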

\attention Only the options seqan3::free_ends_none and seqan3::free_ends_first
are supported for the aligned ends configuration with the edit distance. Using any other aligned ends configuration will
disable the edit distance and fall back to the standard pairwise alignment and will not use the fast bitvector
algorithm.

\assignment{Assignment 6}

Compute all pairwise alignments from the assignment 1 (only the scores). Only allow at most 7 errors and
Expand Down
9 changes: 6 additions & 3 deletions doc/tutorial/pairwise_alignment/pa_assignment_3_solution.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,10 +27,13 @@ int main()
}

// Configure the alignment kernel.
auto config = seqan3::align_cfg::method_global{} |
auto config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{false},
seqan3::align_cfg::free_end_gaps_sequence2_leading{true},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{false},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{true}} |
seqan3::align_cfg::scoring{seqan3::aminoacid_scoring_scheme{
seqan3::aminoacid_similarity_matrix::BLOSUM62}} |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_second};
seqan3::aminoacid_similarity_matrix::BLOSUM62}};

for (auto const & res : seqan3::align_pairwise(source, config))
seqan3::debug_stream << "Score: " << res.score() << '\n';
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,12 @@ int main()
"AGGTACGAGCGACACT"_dna4};

// Configure the alignment kernel.
auto config = seqan3::align_cfg::method_global{} |
seqan3::align_cfg::scoring{seqan3::nucleotide_scoring_scheme{}} |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first};
auto config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{true},
seqan3::align_cfg::free_end_gaps_sequence2_leading{false},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::scoring{seqan3::nucleotide_scoring_scheme{}};

for (auto const & res : seqan3::align_pairwise(seqan3::views::pairwise_combine(vec), config))
seqan3::debug_stream << "Score: " << res.score() << '\n';
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,14 @@ int main()
auto seq2 = "GGACGACATGACGTACGACTTTACGTACGACTAGC"_dna4;

// Configure the alignment kernel.
auto config = seqan3::align_cfg::method_global{} |
auto config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{true},
seqan3::align_cfg::free_end_gaps_sequence2_leading{true},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{true}} |
seqan3::align_cfg::scoring{seqan3::nucleotide_scoring_scheme{
seqan3::match_score{4}, seqan3::mismatch_score{-2}}} |
seqan3::align_cfg::gap{seqan3::gap_scheme{seqan3::gap_score{-4}}} |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_all} |
seqan3::align_cfg::result{seqan3::with_alignment};

for (auto const & res : seqan3::align_pairwise(std::tie(seq1, seq2), config))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,14 @@ int main()
auto seq2 = "GGACGACATGACGTACGACTTTACGTACGACTAGC"_dna4;

// Configure the alignment kernel.
auto config = seqan3::align_cfg::method_global{} |
auto config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{true},
seqan3::align_cfg::free_end_gaps_sequence2_leading{true},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{true}} |
seqan3::align_cfg::scoring{seqan3::nucleotide_scoring_scheme{
seqan3::match_score{4}, seqan3::mismatch_score{-2}}} |
seqan3::align_cfg::gap{seqan3::gap_scheme{seqan3::gap_score{-4}}} |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_all} |
seqan3::align_cfg::result{seqan3::with_alignment} |
seqan3::align_cfg::band_fixed_size{seqan3::align_cfg::lower_diagonal{-3},
seqan3::align_cfg::upper_diagonal{8}};
Expand Down
7 changes: 5 additions & 2 deletions doc/tutorial/read_mapper/read_mapper_step3.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -53,9 +53,12 @@ void map_reads(std::filesystem::path const & query_path,
seqan3::search_cfg::hit_all_best;

//! [alignment_config]
seqan3::configuration const align_config = seqan3::align_cfg::method_global{} |
seqan3::configuration const align_config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{true},
seqan3::align_cfg::free_end_gaps_sequence2_leading{false},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::result{seqan3::with_alignment};
//! [alignment_config]

Expand Down
7 changes: 5 additions & 2 deletions doc/tutorial/read_mapper/read_mapper_step4.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -62,9 +62,12 @@ void map_reads(std::filesystem::path const & query_path,
seqan3::search_cfg::error_count{errors}} |
seqan3::search_cfg::hit_all_best;

seqan3::configuration const align_config = seqan3::align_cfg::method_global{} |
seqan3::configuration const align_config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{true},
seqan3::align_cfg::free_end_gaps_sequence2_leading{false},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::result{seqan3::with_alignment};

for (auto && record : query_file_in)
Expand Down
16 changes: 12 additions & 4 deletions doc/tutorial/search/search_solution5.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,13 @@ void run_text_single()

seqan3::configuration const search_config = seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} |
seqan3::search_cfg::hit_all_best;
seqan3::configuration const align_config = seqan3::align_cfg::method_global{} |

seqan3::configuration const align_config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{true},
seqan3::align_cfg::free_end_gaps_sequence2_leading{false},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::result{seqan3::with_alignment};

auto search_results = search(query, index, search_config);
Expand Down Expand Up @@ -56,9 +60,13 @@ void run_text_collection()

seqan3::configuration const search_config = seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} |
seqan3::search_cfg::hit_all_best;
seqan3::configuration const align_config = seqan3::align_cfg::method_global{} |

seqan3::configuration const align_config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{true},
seqan3::align_cfg::free_end_gaps_sequence2_leading{false},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::result{seqan3::with_alignment};

seqan3::debug_stream << "-----------------\n";