Skip to content

Commit

Permalink
Update concatenated seqs (#2947)
Browse files Browse the repository at this point in the history
* [misc] simplify and improve concat_seqs

* Apply suggestions from code review

Co-authored-by: Enrico Seiler <[email protected]>
  • Loading branch information
h-2 and eseiler authored Mar 13, 2022
1 parent 5833e58 commit 9ec41b7
Show file tree
Hide file tree
Showing 2 changed files with 232 additions and 126 deletions.
236 changes: 111 additions & 125 deletions include/seqan3/alphabet/container/concatenated_sequences.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,116 +26,13 @@
#include <cereal/types/vector.hpp>
#endif

namespace seqan3::detail
{

//!\brief Function object for seqan3::detail::as_const.
//!\ingroup alphabet_container
struct as_const_fn
{
//!\brief Operator that returns rvalues as rvalues.
template <typename t>
t operator()(t const && arg) const
{
return std::move(arg);
}

//!\brief Operator that returns lvalue references as lvalue-to-const-references.
template <typename t>
t const & operator()(t const & arg) const
{
return arg;
}
};

/*!\brief A view that provides only `const &` to elements of the underlying range.
* \tparam urng_t The type of the range being processed. See below for requirements. [template parameter is
* omitted in pipe notation]
* \param[in] urange The range being processed. [parameter is omitted in pipe notation]
* \returns A range of `const`-protected elements.
* \ingroup alphabet_container
*
* \details
*
* \header_file{seqan3/alphabet/container/concatenated_sequences.hpp}
*
* ### View properties
*
*
* | Concepts and traits | `urng_t` (underlying range type) | `rrng_t` (returned range type) |
* |----------------------------------|:-------------------------------------:|:--------------------------------------------------:|
* | std::ranges::input_range | *required* | *preserved* |
* | std::ranges::forward_range | | *preserved* |
* | std::ranges::bidirectional_range | | *preserved* |
* | std::ranges::random_access_range | | *preserved* |
* | std::ranges::contiguous_range | | *preserved* |
* | | | |
* | std::ranges::viewable_range | *required* | *guaranteed* |
* | std::ranges::view | | *guaranteed* |
* | std::ranges::sized_range | | *preserved* |
* | std::ranges::common_range | | *preserved* |
* | std::ranges::output_range | | *lost* |
* | seqan3::const_iterable_range | | *preserved* |
* | std::semiregular | | *preserved* |
* | | | |
* | std::ranges::range_reference_t | | `t &` -> `t const &` but `t` -> `t` |
*
* See the \link views views submodule documentation \endlink for detailed descriptions of the view properties.
*
* \hideinitializer
*/
inline constexpr auto as_const = std::views::transform(seqan3::detail::as_const_fn{});

/*!\brief The reference type of seqan3::concatenated_sequences.
* \ingroup alphabet_container
* \tparam value_type The value_type of the seqan3::concatenated_sequences.
* \tparam is_const_ref Reference type or const reference type.
*
* \details
*
* A light-weight type that inherits from the returned type of the seqan3::views::slice adaptor (std::span, std::ranges::subrange, ...),
* but additionally provides implicit convertibility to the `value_type`. This is needed so that `value_type` and
* `reference` type of seqan3::concatenated_sequences satisfy std::common_reference_with.
*
* The const version of this type additionally ensures deep constness to maintain container-like behaviour.
*/
template <typename value_type, bool const_>
struct concatenated_sequences_reference_proxy :
public std::conditional_t<const_,
decltype(std::declval<value_type const &>() | detail::as_const | views::slice(0,1)),
decltype(std::declval<value_type &>() | views::slice(0,1))>
{
//!\brief The base type.
using base_t =
std::conditional_t<const_,
decltype(std::declval<value_type const &>() | detail::as_const | views::slice(0,1)),
decltype(std::declval<value_type &>() | views::slice(0,1))>;

//!\brief Inherit the base type's constructors.
using base_t::base_t;

//!\brief Construct from base type.
concatenated_sequences_reference_proxy(base_t && rhs) : base_t{std::move(rhs)} {}

//!\brief Implicitly convert to the `value_type` of seqan3::concatenated_sequences.
operator value_type() const
{
value_type ret;
ret.resize(std::ranges::size(*this));
std::ranges::copy(*this, std::ranges::begin(ret));
return ret;
}
};

} // namespace seqan3::detail

namespace seqan3
{

/*!\brief Container that stores sequences concatenated internally.
* \tparam inner_type The type of sequences that will be stored. Must satisfy seqan3::reservible_container.
* \tparam data_delimiters_type A container that stores the begin/end positions in the inner_type. Must be
* seqan3::reservible_container and have inner_type's size_type as value_type.
* \tparam underlying_container_type Type of the underlying container. Must satisfy seqan3::reservible_container.
* \tparam data_delimiters_type A container that stores the begin/end positions in the underlying_container_type. Must
* satifsy seqan3::reservible_container and have underlying_container_type's size_type as value_type.
* \implements seqan3::cerealisable
* \implements seqan3::reservible_container
* \ingroup alphabet_container
Expand All @@ -145,7 +42,7 @@ namespace seqan3
* the `StringSet<TString, Owner<ConcatDirect>>` from SeqAn2.
*
* It saves all of the member sequences inside one concatenated sequence internally. If you access an element,
* you instead get a view on the internal string as a proxy. This has the following
* you instead get a view on the internal string. This has the following
* advantages:
*
* * Better cache locality when parsing the sequences linearly (and often also on random access).
Expand All @@ -160,15 +57,19 @@ namespace seqan3
* * Modifying elements is limited to operations on elements of that element, i.e. you can change a character,
* but you can't assign a new member sequence to an existing position.
*
* Note that the "value type" of `seqan3::concatenated_sequences<T>` is **not** `T`. It is a view –– typically
* a std::span or a std::string_view. This view becomes invalid when the container is destroyed or any
* operation is performed on the container that invalidates its iterators, e.g. #push_back().
*
* ### Example
*
* \include test/snippet/alphabet/container/concatenated_sequences.cpp
*
* ### Exceptions
*
* Whenever a strong exception guarantee is given for this class, it presumes that
* `std::is_nothrow_move_constructible<typename inner_type::value_type>` otherwise only basic exception safety can
* be assumed.
* `std::is_nothrow_move_constructible<typename underlying_container_type::value_type>` otherwise only basic exception
* safety can be assumed.
*
* ### Thread safety
*
Expand All @@ -178,19 +79,20 @@ namespace seqan3
*
* \experimentalapi{Experimental since version 3.1.}
*/
template <typename inner_type,
typename data_delimiters_type = std::vector<typename inner_type::size_type>>
template <typename underlying_container_type,
typename data_delimiters_type = std::vector<typename underlying_container_type::size_type>>
//!\cond
requires reservible_container<std::remove_reference_t<inner_type>> &&
requires reservible_container<std::remove_reference_t<underlying_container_type>> &&
reservible_container<std::remove_reference_t<data_delimiters_type>> &&
std::is_same_v<std::ranges::range_size_t<inner_type>, std::ranges::range_value_t<data_delimiters_type>>
std::is_same_v<std::ranges::range_size_t<underlying_container_type>,
std::ranges::range_value_t<data_delimiters_type>>
//!\endcond
class concatenated_sequences
{
protected:
//!\privatesection
//!\brief Where the concatenation is stored.
std::decay_t<inner_type> data_values;
std::decay_t<underlying_container_type> data_values;
//!\brief Where the delimiters are stored; begins with 0, has size of size() + 1.
data_delimiters_type data_delimiters{0};

Expand All @@ -200,26 +102,27 @@ class concatenated_sequences
* \{
*/

/*!\brief == inner_type.
/*!\brief A views::slice that represents "one element", typically a std::span.
* \hideinitializer
* \details
* \experimentalapi{Experimental since version 3.1.}
*/
using value_type = std::decay_t<inner_type>;
using value_type = decltype(std::declval<std::decay_t<underlying_container_type> &>() | views::slice(0, 1));

/*!\brief A proxy of type views::slice that represents the range on the concatenated vector.
/*!\brief A views::slice that represents "one element", typically a std::span.
* \hideinitializer
* \details
* \experimentalapi{Experimental since version 3.1.}
*/
using reference = detail::concatenated_sequences_reference_proxy<value_type, false>;
using reference = value_type;

/*!\brief An immutable proxy of type views::slice that represents the range on the concatenated vector.
/*!\brief An immutable views::slice that represents "one element", typically a std::span or std::string_view.
* \hideinitializer
* \details
* \experimentalapi{Experimental since version 3.1.}
*/
using const_reference = detail::concatenated_sequences_reference_proxy<value_type, true>;
using const_reference = decltype(std::declval<std::decay_t<underlying_container_type> const &>() |
views::slice(0, 1));

/*!\brief The iterator type of this container (a random access iterator).
* \hideinitializer
Expand Down Expand Up @@ -692,7 +595,7 @@ class concatenated_sequences
const_reference operator[](size_type const i) const
{
assert(i < size());
return data_values | detail::as_const | views::slice(data_delimiters[i], data_delimiters[i+1]);
return data_values | views::slice(data_delimiters[i], data_delimiters[i+1]);
}

/*!\brief Return the first element as a view. Calling front on an empty container is undefined.
Expand Down Expand Up @@ -773,7 +676,7 @@ class concatenated_sequences
//!\copydoc concat()
const_reference concat() const
{
return data_values | detail::as_const | views::slice(static_cast<size_type>(0), concat_size());
return data_values | views::slice(static_cast<size_type>(0), concat_size());
}

/*!\brief Provides direct, unsafe access to underlying data structures.
Expand All @@ -791,7 +694,7 @@ class concatenated_sequences
//!\copydoc raw_data()
std::pair<decltype(data_values) const &, decltype(data_delimiters) const &> raw_data() const
{
return {std::as_const(data_values), std::as_const(data_delimiters)};
return {data_values, data_delimiters};
}

//!\}
Expand Down Expand Up @@ -1327,8 +1230,11 @@ class concatenated_sequences
* \tparam rng_type The type of range to be inserted; must satisfy \ref is_compatible_with_value_type.
* \param value The value to append.
*
* If the new size() is greater than capacity() then all iterators and references (including the past-the-end
* iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
* This conceptionally adds another element to the container with the specified content,
* i.e. the "outer container" grows by 1.
*
* If the new concat_size() is greater than concat_capacity(), all iterators and references (including the
* past-the-end iterator) are invalidated. Otherwise, only the past-the-end iterator is invalidated.
*
* ### Complexity
*
Expand All @@ -1351,6 +1257,86 @@ class concatenated_sequences
data_delimiters.push_back(data_delimiters.back() + std::ranges::size(value));
}

/*!\brief Appends an empty element to the end of the container.
*
* This conceptionally adds an empty element to the container,
* i.e. the "outer container" grows by 1 and the new back() will be empty.
*
* No iterators are invalidated.
*
* ### Complexity
*
* Amortised constant. Wort-case linear in size().
*
* ### Exceptions
*
* Basic exception guarantee, i.e. guaranteed not to leak. However, the container may contain invalid data after
* an exception is thrown.
*
* \experimentalapi{Experimental since version 3.1.}
*/
void push_back()
{
data_delimiters.push_back(data_delimiters.back());
}

/*!\brief Appends the given element-of-element value to the end of the underlying container.
* \param value The value to append.
*
* This conceptionally performs a `push_back()` on the `back()` of this container,
* i.e. that last inner container grows by 1.
*
* If the new concat_size() is greater than concat_capacity(), all iterators and references (including the
* past-the-end iterator) are invalidated. Otherwise, only the past-the-end iterator is invalidated.
*
* ### Complexity
*
* Amortised constant. Wort-case linear in concat_size().
*
* ### Exceptions
*
* Basic exception guarantee, i.e. guaranteed not to leak. However, the container may contain invalid data after
* an exception is thrown.
*
* \experimentalapi{Experimental since version 3.1.}
*/
void last_push_back(std::ranges::range_value_t<underlying_container_type> const value)
{
data_values.push_back(value);
++data_delimiters.back();
}

/*!\brief Appends the given elements to the end of the underlying container (increases size of last element by n).
* \tparam rng_type The type of range to be inserted; must satisfy \ref is_compatible_with_value_type.
* \param value The value to append.
*
* This conceptionally performs an `insert()` on the `back()` of this container,
* i.e. the last inner container grows by value.size().
*
* If the new concat_size() is greater than concat_capacity(), all iterators and references (including the
* past-the-end iterator) are invalidated. Otherwise, only the past-the-end iterator is invalidated.
*
* ### Complexity
*
* Amortised linear in the size of value. Wort-case linear in concat_size().
*
* ### Exceptions
*
* Basic exception guarantee, i.e. guaranteed not to leak. However, the container may contain invalid data after
* an exception is thrown.
*
* \experimentalapi{Experimental since version 3.1.}
*/
template <std::ranges::forward_range rng_type>
void last_append(rng_type && value)
//!\cond
requires is_compatible_with_value_type<rng_type>
//!\endcond
{
data_values.insert(data_values.end(), std::ranges::begin(value), std::ranges::end(value));
data_delimiters.back() += std::ranges::size(value);
}

/*!\brief Removes the last element of the container.
*
* Calling pop_back on an empty container is undefined. In debug mode an assertion will be thrown.
Expand Down
Loading

1 comment on commit 9ec41b7

@vercel
Copy link

@vercel vercel bot commented on 9ec41b7 Mar 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sign in to comment.