[Alignment configuration] Free end gap configuration - documentation and removal of aligned_ends #2119

Merged
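This PR replaces `seqan3::align_cfg::aligned_ends` with four free end gap specifiers passed to `seqan3::align_cfg::method_global`. The following minimal standalone sketch in plain C++ (it does not use SeqAn3) illustrates what the four flags control. `free_end_gaps` and `free_end_gap_score` are hypothetical names, and the unit-cost scoring (match 0, mismatch -1, gap -1) is an assumption chosen for simplicity. The sketch also assumes that setting a sequence's flags lets that sequence's leading or trailing characters face gaps at no cost. Under that assumption, setting both sequence-1 flags aligns the second sequence as an infix of the first. This is the convention implied by the tutorial code that previously paired those flags with `free_ends_first`.

```cpp
#include <algorithm>
#include <cassert> // assert() for quick runtime checks of the sketch
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the four specifiers of method_global, in the same order:
// sequence1 leading, sequence2 leading, sequence1 trailing, sequence2 trailing.
struct free_end_gaps
{
    bool seq1_leading{false};
    bool seq2_leading{false};
    bool seq1_trailing{false};
    bool seq2_trailing{false};
};

// Maximises a unit-cost score (match 0, mismatch -1, gap -1) over all alignments of
// seq1 and seq2; flagged sequence ends may be aligned against gaps at no cost.
int free_end_gap_score(std::string const & seq1, std::string const & seq2, free_end_gaps const & cfg)
{
    std::size_t const n = seq1.size();
    std::size_t const m = seq2.size();
    // score[i][j] is the best score for aligning seq1[0, i) with seq2[0, j).
    std::vector<std::vector<int>> score(n + 1, std::vector<int>(m + 1, 0));

    // Column 0: the first i characters of seq1 face gaps; free if seq1 leading gaps are free.
    for (std::size_t i = 1; i <= n; ++i)
        score[i][0] = cfg.seq1_leading ? 0 : -static_cast<int>(i);
    // Row 0: the first j characters of seq2 face gaps; free if seq2 leading gaps are free.
    for (std::size_t j = 1; j <= m; ++j)
        score[0][j] = cfg.seq2_leading ? 0 : -static_cast<int>(j);

    // Standard global recurrence; the flags do not change it.
    for (std::size_t i = 1; i <= n; ++i)
    {
        for (std::size_t j = 1; j <= m; ++j)
        {
            int const diagonal = score[i - 1][j - 1] + (seq1[i - 1] == seq2[j - 1] ? 0 : -1);
            score[i][j] = std::max({diagonal, score[i - 1][j] - 1, score[i][j - 1] - 1});
        }
    }

    int best = score[n][m]; // global alignment: both sequences end together
    if (cfg.seq1_trailing)  // trailing characters of seq1 may face gaps: end anywhere in column m
        for (std::size_t i = 0; i <= n; ++i)
            best = std::max(best, score[i][m]);
    if (cfg.seq2_trailing)  // trailing characters of seq2 may face gaps: end anywhere in row n
        for (std::size_t j = 0; j <= m; ++j)
            best = std::max(best, score[n][j]);
    return best;
}
```

With all flags `false`, the sketch reduces to a plain global alignment, so `free_end_gap_score("ACGTACGT", "TAC", {})` returns -5. By contrast, `free_end_gap_score("ACGTACGT", "TAC", {true, false, true, false})` returns 0 because `TAC` occurs inside the first sequence. The flags only change how the first row and column are initialised and where the optimum may end.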
30 changes: 14 additions & 16 deletions doc/tutorial/pairwise_alignment/configurations.cpp
@@ -2,10 +2,6 @@
#include <seqan3/alignment/configuration/all.hpp>
//! [include]

//! [include_aligned_ends]
#include <seqan3/alignment/configuration/align_config_aligned_ends.hpp>
//! [include_aligned_ends]

//! [include_scoring_scheme]
#include <seqan3/alignment/scoring/aminoacid_scoring_scheme.hpp>
#include <seqan3/alignment/scoring/nucleotide_scoring_scheme.hpp>
@@ -15,6 +11,10 @@
#include <seqan3/alignment/configuration/align_config_gap_cost_affine.hpp>
//! [include_gap_cost_affine]

//! [include_method]
#include <seqan3/alignment/configuration/align_config_method.hpp>
//! [include_method]

//! [include_output]
#include <seqan3/alignment/configuration/align_config_output.hpp>
//! [include_output]
@@ -33,18 +33,16 @@
int main()
{
{
//! [aligned_ends]

seqan3::front_end_first fef{std::true_type{}};
seqan3::back_end_first bef{std::false_type{}};
seqan3::front_end_second fes{true};
seqan3::back_end_second bes{false};

auto cfg_1 = seqan3::align_cfg::aligned_ends{seqan3::end_gaps{fef, bef, fes, bes}};
auto cfg_2 = seqan3::align_cfg::aligned_ends{seqan3::end_gaps{fef, fes}};
//! [aligned_ends]
(void) cfg_1;
(void) cfg_2;
//! [method_global_free_end_gaps]
// Example of a semi-global alignment where leading and trailing gaps in the
// second sequence are not penalised:
auto config = seqan3::align_cfg::method_global{
seqan3::align_cfg::free_end_gaps_sequence1_leading{false},
seqan3::align_cfg::free_end_gaps_sequence2_leading{true},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{false},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{true}};
//! [method_global_free_end_gaps]
(void) config;
}

{
93 changes: 40 additions & 53 deletions doc/tutorial/pairwise_alignment/index.md
@@ -110,45 +110,26 @@ seqan3::align_cfg::method_global.
\remark The method configuration must be given by the user as it strongly depends on the application context.
It would be wrong for us to assume what the intended default behaviour should be.

The global alignment can be further refined by setting the seqan3::align_cfg::aligned_ends option.
The seqan3::align_cfg::aligned_ends class specifies whether or not gaps at the end of the sequences are penalised.
In SeqAn you can configure this behaviour for every end (front and back of the first sequence and second sequence)
separately using the seqan3::end_gaps class.
This class is constructed with up to 4 end gap specifiers (one for every end):

- seqan3::front_end_first - aligning front of first sequence with a gap.
- seqan3::back_end_first - aligning back of first sequence with a gap.
- seqan3::front_end_second - aligning front of second sequence with a gap.
- seqan3::back_end_second - aligning back of second sequence with a gap.

These classes can be constructed with either a constant boolean (std::true_type or std::false_type) or a regular `bool`
argument. The former enables static configuration of the respective features in the alignment algorithm. The
latter allows to configure these features at runtime. This makes setting these values from runtime dependent parameters,
e.g. user input, much easier. The following code snippet demonstrates the different use cases:

\snippet doc/tutorial/pairwise_alignment/configurations.cpp include_aligned_ends
\snippet doc/tutorial/pairwise_alignment/configurations.cpp aligned_ends

The `cfg_1` and the `cfg_2` will result in the exact same configuration of the alignment where aligning the front of
either sequence with gaps is not penalised while the back of both sequences is. The order of the arguments is
irrelevant. Specifiers initialised with constant booleans can be mixed with those initialised with `bool` values.
If a specifier for a particular sequence end is not given, it defaults to the specifier initialised with
`std::false_type`.

\note You should always prefer initialising the end-gaps specifiers using the boolean constants if possible
as it reduces the compile time. The reason for this is that the runtime information is converted into static types
for the alignment algorithm. For every end-gap specifier the compiler will generate two versions for the `true` and the
`false` case. This adds up to 16 different paths the compiler needs to instantiate.

SeqAn also offers \ref predefined_end_gap_configurations "predefined" seqan3::end_gaps configurations that
cover the typical use cases.

| Entity | Meaning |
| -------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
| \ref seqan3::end_gaps::free_ends_none "free_ends_none" | Enables the typical global alignment. |
| \ref seqan3::end_gaps::free_ends_all "free_ends_all" | Enables overlap alignment, where the end of one sequence can overlap the end of the other sequence. |
| \ref seqan3::end_gaps::free_ends_first "free_ends_first" | Enables semi global alignment, where the second sequence is aligned as an infix of the first sequence. |
| \ref seqan3::end_gaps::free_ends_second "free_ends_second" | Enables semi global alignment, where the first sequence is aligned as an infix of the second sequence. |
The global alignment can be further refined by initialising the seqan3::align_cfg::method_global configuration element
with the free end gap specifiers. They specify whether gaps at the end of the sequences are penalised.
In SeqAn you can configure this behaviour for every end, namely for leading and trailing gaps of the first and second
sequence. seqan3::align_cfg::method_global is constructed with 4 free end gap specifiers (one for every end):

- seqan3::align_cfg::free_end_gaps_sequence1_leading - If set to true, aligning leading gaps in the first sequence is
not penalised.
- seqan3::align_cfg::free_end_gaps_sequence2_leading - If set to true, aligning leading gaps in the second sequence is
not penalised.
- seqan3::align_cfg::free_end_gaps_sequence1_trailing - If set to true, aligning trailing gaps in the first sequence is
not penalised.
- seqan3::align_cfg::free_end_gaps_sequence2_trailing - If set to true, aligning trailing gaps in the second sequence is
not penalised.

The following code snippet demonstrates how to configure a semi-global alignment with these specifiers:

\snippet doc/tutorial/pairwise_alignment/configurations.cpp include_method
\snippet doc/tutorial/pairwise_alignment/configurations.cpp method_global_free_end_gaps

The order of arguments is fixed and must always be as shown in the example.

\assignment{Assignment 2}

@@ -161,8 +142,8 @@ would be aligned as an infix of the second sequence.

\include doc/tutorial/pairwise_alignment/pairwise_alignment_solution_2.cpp

To accomplish our goal we simply add the align_cfg::aligned_ends option initialised with `free_ends_first` to the
existing configuration.
To accomplish our goal we initialise the `method_global` option with the free end gap specifiers
for sequence 1 set to `true` and those for sequence 2 set to `false`.

\endsolution

@@ -193,7 +174,7 @@ the alignment computation. The default initialised seqan3::align_cfg::gap_cost_a
and for a gap opening to `0`. Note that the gap open score is added to the gap score when a gap is opened within the
alignment computation. Therefore setting the gap open score to `0` disables affine gaps.
You can pass a seqan3::align_cfg::extension_score and a seqan3::align_cfg::open_score object to initialise the scheme
with custom gap penalties. The penalties can be accessed and changed later by using the respective member variables
with custom gap penalties. The penalties can be accessed and changed later by using the respective member variables
`extension_score` and `open_score`.

\attention SeqAn's alignment algorithm computes the maximal similarity score, thus the match score must be set to a
@@ -306,19 +287,25 @@ To make the configuration easier, we added a shortcut called seqan3::align_cfg::
\snippet doc/tutorial/pairwise_alignment/configurations.cpp include_edit
\snippet doc/tutorial/pairwise_alignment/configurations.cpp edit

The `edit_scheme` still has to be combined with an alignment method. When combining it
with the seqan3::align_cfg::method_global configuration element, the edit distance algorithm
can be further refined with free end gaps (see section `Global and semi-global alignment`).

\attention Only the following free end gap configurations are supported for the
global alignment configuration with the edit scheme:
- no free end gaps (all free end gap specifiers are set to `false`)
- free end gaps for the first sequence (free end gaps are set to `true` for the first and
to `false` for the second sequence)
Using any other free end gap configuration disables the edit distance optimisation: the alignment falls back to the
standard pairwise algorithm and does not use the fast bitvector algorithm.

### Refine edit distance

The edit distance can be further refined using seqan3::align_cfg::aligned_ends to also compute a semi-global alignment
and the seqan3::align_cfg::min_score configuration to fix an edit score (a limit of the allowed number of edits). If the
respective alignment could not find a solution within the given error bound, the resulting score is infinity
(corresponds to std::numeric_limits::max). Also the alignment and the begin and end positions of the alignment can be
computed using a combination of the align_cfg::output_alignment, align_cfg::output_begin_position and
align_cfg::output_end_position options.

\attention Only the options seqan3::free_ends_none and seqan3::free_ends_first
are supported for the aligned ends configuration with the edit distance. Using any other aligned ends configuration will
disable the edit distance and fall back to the standard pairwise alignment and will not use the fast bitvector
algorithm.
The edit distance can be further refined using the seqan3::align_cfg::min_score configuration to fix an edit score
(a limit on the allowed number of edits). If no alignment is found within the given error bound, the resulting score
is infinity (corresponds to std::numeric_limits::max). The alignment and its begin and end positions can also be
computed using a combination of the align_cfg::output_alignment,
align_cfg::output_begin_position and align_cfg::output_end_position options.

\assignment{Assignment 6}

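The edit distance restriction in the tutorial's attention note above can be written as a simple predicate. The sketch below is hypothetical: `free_end_gaps` and `edit_fast_path_supported` are illustrative names, not SeqAn3's internal dispatch logic. It only encodes the two free end gap combinations the note lists as compatible with the fast bitvector algorithm.

```cpp
#include <cassert> // assert() for runtime checks of the predicate

// Hypothetical stand-in for the four free end gap specifiers, in method_global order.
struct free_end_gaps
{
    bool seq1_leading{false};
    bool seq2_leading{false};
    bool seq1_trailing{false};
    bool seq2_trailing{false};
};

// True only for the two combinations listed as supported with the edit scheme:
// no free end gaps at all, or free end gaps for the first sequence only.
// Any other combination would fall back to the standard pairwise alignment.
constexpr bool edit_fast_path_supported(free_end_gaps const & cfg)
{
    bool const none_free = !cfg.seq1_leading && !cfg.seq2_leading && !cfg.seq1_trailing && !cfg.seq2_trailing;
    bool const first_free = cfg.seq1_leading && cfg.seq1_trailing && !cfg.seq2_leading && !cfg.seq2_trailing;
    return none_free || first_free;
}

// The check can run at compile time when the flags are constants.
static_assert(edit_fast_path_supported(free_end_gaps{true, false, true, false}),
              "free end gaps for the first sequence only are supported");
```

The read mapper and search solutions changed in this PR combine `seqan3::align_cfg::edit_scheme` with free end gaps for the first sequence, which is the second supported case.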
3 changes: 1 addition & 2 deletions doc/tutorial/pairwise_alignment/pa_assignment_3_solution.cpp
@@ -33,8 +33,7 @@ int main()
seqan3::align_cfg::free_end_gaps_sequence1_trailing{false},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{true}} |
seqan3::align_cfg::scoring_scheme{seqan3::aminoacid_scoring_scheme{
seqan3::aminoacid_similarity_matrix::BLOSUM62}} |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_second};
seqan3::aminoacid_similarity_matrix::BLOSUM62}};

for (auto const & res : seqan3::align_pairwise(source, config))
seqan3::debug_stream << "Score: " << res.score() << '\n';
@@ -22,8 +22,7 @@ int main()
seqan3::align_cfg::free_end_gaps_sequence2_leading{false},
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::scoring_scheme{seqan3::nucleotide_scoring_scheme{}} |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first};
seqan3::align_cfg::scoring_scheme{seqan3::nucleotide_scoring_scheme{}};

for (auto const & res : seqan3::align_pairwise(seqan3::views::pairwise_combine(vec), config))
seqan3::debug_stream << "Score: " << res.score() << '\n';
@@ -28,7 +28,6 @@ int main()
seqan3::match_score{4}, seqan3::mismatch_score{-2}}} |
seqan3::align_cfg::gap_cost_affine{seqan3::align_cfg::open_score{0},
seqan3::align_cfg::extension_score{-4}} |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_all} |
output_config;

for (auto const & res : seqan3::align_pairwise(std::tie(seq1, seq2), config))
@@ -28,7 +28,6 @@ int main()
seqan3::match_score{4}, seqan3::mismatch_score{-2}}} |
seqan3::align_cfg::gap_cost_affine{seqan3::align_cfg::open_score{0},
seqan3::align_cfg::extension_score{-4}} |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_all} |
output_config |
seqan3::align_cfg::band_fixed_size{seqan3::align_cfg::lower_diagonal{-3},
seqan3::align_cfg::upper_diagonal{8}};
1 change: 0 additions & 1 deletion doc/tutorial/read_mapper/read_mapper_step3.cpp
@@ -59,7 +59,6 @@ void map_reads(std::filesystem::path const & query_path,
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::output_alignment{} |
seqan3::align_cfg::output_score{};
//! [alignment_config]
1 change: 0 additions & 1 deletion doc/tutorial/read_mapper/read_mapper_step4.cpp
@@ -68,7 +68,6 @@ void map_reads(std::filesystem::path const & query_path,
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::output_alignment{} |
seqan3::align_cfg::output_begin_position{} |
seqan3::align_cfg::output_score{};
2 changes: 0 additions & 2 deletions doc/tutorial/search/search_solution5.cpp
@@ -26,7 +26,6 @@ void run_text_single()
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::output_alignment{} |
seqan3::align_cfg::output_score{};

@@ -69,7 +68,6 @@ void run_text_collection()
seqan3::align_cfg::free_end_gaps_sequence1_trailing{true},
seqan3::align_cfg::free_end_gaps_sequence2_trailing{false}} |
seqan3::align_cfg::edit_scheme |
seqan3::align_cfg::aligned_ends{seqan3::free_ends_first} |
seqan3::align_cfg::output_alignment{} |
seqan3::align_cfg::output_score{};
