Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alphabet_base expects functions instead of look-up tables. #2427

Merged
merged 28 commits into from
Mar 24, 2021

Conversation

marehr
Copy link
Member

@marehr marehr commented Mar 3, 2021

This is part of seqan/product_backlog#295

This changes rank_to_char and char_to_rank to static member functions.

Blocked by #2430.

@vercel
Copy link

vercel bot commented Mar 3, 2021

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/seqan/seqan3/FVGwq8hgw9v9qLeKd7pTPTYdQpJs
✅ Preview: https://seqan3-git-fork-marehr-alphabetbasefunctions-seqan.vercel.app

@marehr marehr requested review from a team and simonsasse and removed request for a team March 4, 2021 00:03
@marehr marehr force-pushed the alphabet_base_functions branch from afbbc26 to 2537f10 Compare March 5, 2021 14:05
@codecov
Copy link

codecov bot commented Mar 5, 2021

Codecov Report

Merging #2427 (a9ba4d7) into master (a1dd8cd) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2427      +/-   ##
==========================================
+ Coverage   98.29%   98.31%   +0.01%     
==========================================
  Files         268      269       +1     
  Lines       11055    11134      +79     
==========================================
+ Hits        10867    10946      +79     
  Misses        188      188              
Impacted Files Coverage Δ
...clude/seqan3/alphabet/aminoacid/aminoacid_base.hpp 100.00% <ø> (ø)
...ude/seqan3/alphabet/nucleotide/nucleotide_base.hpp 100.00% <ø> (ø)
include/seqan3/alphabet/alphabet_base.hpp 100.00% <100.00%> (ø)
include/seqan3/alphabet/aminoacid/aa10li.hpp 100.00% <100.00%> (ø)
include/seqan3/alphabet/aminoacid/aa10murphy.hpp 100.00% <100.00%> (ø)
include/seqan3/alphabet/aminoacid/aa20.hpp 100.00% <100.00%> (ø)
include/seqan3/alphabet/aminoacid/aa27.hpp 100.00% <100.00%> (ø)
...alphabet/cigar/exposition_only/cigar_operation.hpp 100.00% <100.00%> (ø)
...ude/seqan3/alphabet/composite/alphabet_variant.hpp 100.00% <100.00%> (ø)
include/seqan3/alphabet/gap/gap.hpp 100.00% <100.00%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1dd8cd...a9ba4d7. Read the comment docs.

@marehr marehr force-pushed the alphabet_base_functions branch 2 times, most recently from de9c83e to 1cd0f8e Compare March 8, 2021 21:50
Copy link
Member

@simonsasse simonsasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found just one typo :)

include/seqan3/alphabet/composite/alphabet_variant.hpp Outdated Show resolved Hide resolved
@marehr marehr requested a review from simonsasse March 10, 2021 12:56
@marehr marehr force-pushed the alphabet_base_functions branch from 2aafb9d to d270fe0 Compare March 11, 2021 12:01
Copy link
Member

@simonsasse simonsasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Only thing is that when I see chr, I always think of chromosome and not character. Maybe its worth changing the chr to char. Probably just personal preference.

@eseiler
Copy link
Member

eseiler commented Mar 15, 2021

Looks good. Only thing is that when I see chr, I always think of chromosome and not character. Maybe its worth changing the chr to char. Probably just personal preference.

@simonsasse, can you approve and assign core for review?

@marehr marehr requested a review from simonsasse March 15, 2021 09:38
Copy link
Member

@simonsasse simonsasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eseiler eseiler self-requested a review March 15, 2021 09:41
Copy link
Member

@eseiler eseiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, just some documentation remarks and general questions.


This point does not need / should not be addressed in this PR:

I am missing a bit some documentation on what the advantage of this is. If there is a changelog entry, it should shortly describe why this is good?
I think doc/howto/write_an_alphabet/dna2_derive_from_base.cpp can also shortly mention that there is an alternative to tables (it does show it, but doesn't explicitly mention it).

doc/cookbook/custom_dna4.cpp Outdated Show resolved Hide resolved
include/seqan3/alphabet/alphabet_base.hpp Outdated Show resolved Hide resolved
include/seqan3/alphabet/alphabet_base.hpp Outdated Show resolved Hide resolved
Comment on lines 223 to 225
using index_t = std::make_unsigned_t<char_type>;
index_t const id = static_cast<index_t>(chr);
return derived_type::char_to_rank[id];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these lines different?
From what I've read, the bug only refers to deprecation warnings being emitted even when the code is not called.
So this block should also be

        using index_t = std::make_unsigned_t<char_type>;
        return derived_type::char_to_rank[static_cast<index_t>(chr)];

?

include/seqan3/alphabet/aminoacid/aa10li.hpp Show resolved Hide resolved
include/seqan3/alphabet/gap/gap.hpp Outdated Show resolved Hide resolved
include/seqan3/alphabet/nucleotide/nucleotide_base.hpp Outdated Show resolved Hide resolved
@@ -43,10 +43,16 @@ class gap : public alphabet_base<gap, 1, char>
friend base_t;

//!\brief Value to char conversion table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not true anymore :)

include/seqan3/alphabet/quality/quality_base.hpp Outdated Show resolved Hide resolved
@marehr marehr force-pushed the alphabet_base_functions branch 2 times, most recently from 91f8071 to 7b001db Compare March 24, 2021 00:37
@marehr marehr requested a review from eseiler March 24, 2021 00:37
@marehr
Copy link
Member Author

marehr commented Mar 24, 2021

I added / changed documentation, I tried to clarify the snippets.

Otherwise, I used your suggestion not to use the term lookup table, but just a general description.

Copy link
Member

@eseiler eseiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't force-push during reviews, I have no idea what you changed. Looks like you addressed the points though.

Can you check the code coverage? Maybe you need to rebase on master

marehr added 6 commits March 24, 2021 20:08
I tried to define the lookup tables within the function itself, but we
can't declare the lookup table as `static constexpr ...`, because the
function is declared constexpr and does not allow any linkage within it.

So we would need to declare the lookup table as constexpr only, like so

```patch
-    static constexpr char_type rank_to_char[alphabet_size]
+    static constexpr char_type rank_to_char(rank_type const rank)
+    {
+        constexpr char_type lookup_table[] =
         {
             'A',
             'C',
             'G',
             'T'
         };

+        return lookup_table[rank];
+    }
```

The problem is that gcc does not optimise (clang does!) this case with
lookup tables, but moves the memory-in every time we call this function.

See https://godbolt.org/z/K8fcqe.

The micro benchmark `test/performance/alphabet/alphabet_assign_char_benchmark`
showed that `assign_char<seqan3::dna4>` took `81.6 ns` before that
patch, but `2903 ns` after that change.

So I reverted to use the normal in-class lookup-table declaration that is now
used within the static function itself.

I would have preferred to remove that class-visible definition.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99320
marehr added 22 commits March 24, 2021 20:08
@marehr marehr force-pushed the alphabet_base_functions branch from 7b001db to a9ba4d7 Compare March 24, 2021 19:09
@marehr marehr requested a review from eseiler March 24, 2021 21:27
@eseiler eseiler merged commit f4391dc into seqan:master Mar 24, 2021
@marehr marehr deleted the alphabet_base_functions branch March 24, 2021 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants