-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/seq file expose valid formats #863
Feature/seq file expose valid formats #863
Conversation
Codecov Report
@@ Coverage Diff @@
## master #863 +/- ##
==========================================
+ Coverage 96.86% 96.86% +<.01%
==========================================
Files 211 212 +1
Lines 8249 8274 +25
==========================================
+ Hits 7990 8015 +25
Misses 259 259
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add cards for structure file and alignment file to also add extesions?
template <IStream2 stream_type, | ||
SequenceFileInputFormat file_format> | ||
sequence_file_input(stream_type && stream, | ||
file_format const &) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file_format const &) | |
file_format const &) |
💅 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not resolved
41de320
to
bf8e1e0
Compare
What's the status here? |
my request is still not resolved ☝️ |
🦏 ... |
bf8e1e0
to
e8f6873
Compare
Somehow I have overwritten the submitted changes from your review and by that undo the changes. Please check if now everything is as expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry one more thing. Otherwise LGTM now
//!\returns std::vector over std::string with all valid file extensions specified by `valid_formats`. | ||
static std::vector<std::string> valid_file_extensions() | ||
{ | ||
std::vector<std::string> extensions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will need this code in each of the files (6 in total currently) so to reduce code it would be great if this can be outsourced into a function in io/detail/misc
🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already proposed a small mixin class that adds this feature. But @h-2 was proposing to keep it simple. I am in general in favour for DRYing this out and don't see any points against it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think we should sit down and think about what else we can DRY about the files and do it in one step altogether; but if you really want to factor it out now, I would prefer not to do a CRTP-mixin, but rather just a free function in misc that is called by the member function.
CRTP-mixins have the drawback that they are difficult to document and I would rather hide the implementation details from the user, the class is complicated enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a CRTP-mixin? I was aiming all along for a free function that takes the vallid_formats and returns a list of extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to do a CRTP-mixin, but rather just a free function in misc that is called by the member function.
The interface still needs to be repeated in each file and still there can go a lot of things wrong (e.g. diverging documentation and probably other subtle errors). And even if the interface is very simple not much can go wrong it suggests a wrong example that could be copied in other places for more complex problems.
CRTP-mixins have the drawback that they are difficult to document and I would rather hide the implementation details from the user, the class is complicated enough.
I think we don't need a full CRTP-mixin. Something that just get's the valid_formats type would be sufficient. I think this can be cleanly documented and it does not need to be detail stuff. Regarding further refactorings. We should discuss it, but given the huge amount of tasks and the few weeks we have left, I wouldn't go for something big now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also not in favour of discussing this too long, I would propose the following free functions in misc
template <typename ... format_ts>
std::vector<std::string> valid_file_format_extensions()
{
// impl
}
inline std::vector<std::string> valid_file_compression_extensions()
{
// impl based on macros
#ifdef SEQAN3_HAS_GRZIP
}
And then just use the free functions in the argParse and display something like
--input A file with the input sequences. Valid format extensions are [fasta, fa, fastq, fq], possibly combined with [gz, bz2, bgzf].
e8f6873
to
b87f19e
Compare
I have now revised everything to use a global detail function and also added a compression_formats type to determine the compression. |
b87f19e
to
51ab9b9
Compare
51ab9b9
to
8d85f58
Compare
{ | ||
{"bgzf"} | ||
}; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are at it, why not move the magic header variable into the tag? I think extensions and magic header are similarly "connected" to the header.
Did you think of maybe making the extensions not const
so users can overwrite them, like we are doing for the formats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I am not yet convinced of how that could benefit. Like I can add my own format and say this is also gzipped? Not sure that we want that for compressions. I would at least put that on ice if this is really ever gonna requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are at it, why not move the magic header variable into the tag? I think extensions and magic header are similarly "connected" to the header.
That's a good point. And it looks nicer.
std::tuple{zstd_compression{}}, | ||
#endif // SEQAN3_HAS_BZIP2 | ||
std::tuple{} | ||
)), type_list>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, so much hackery, just because trailing comma? This is causes a lot of template instantiation for something very simple... we really need our own meta-stuff 😥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I was really trying to get something working but in the end if you have this macro stuff it is not trivial to figure out if the previous macro was enabled or not. This allows me to specify any order and always get rid of the was there a comma or not issue.
Beautiful right 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ugly, but oh well. Can you add a note that this should be replaced once the proper mechanics are available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to give it 10 minutes more love and see if I make some meta programming interface for this. But for now I would not spend to much time on it.
include/seqan3/io/detail/misc.hpp
Outdated
*/ | ||
template <typename formats_t> | ||
//!\cond | ||
requires (all_formats_have_file_extensions(formats_t{})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this? Aren't the formats already required by concept to have the extensions static member?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a concept for the formats? I thought not but I'll have a second look or you point me there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, now I understand what you mean. Well, I thought since it is now a free function which could be potentially used otherwise it is good to check if the stuff added is what we want. But since the PR is quite old I did not checked this via a static assert. But I would keep it for this reasons but make it a static assert instead
template <IStream2 stream_type, | ||
SequenceFileInputFormat file_format> | ||
sequence_file_input(stream_type && stream, | ||
file_format const &) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not resolved
8d85f58
to
28a4de1
Compare
*/ | ||
template <typename file_t = void> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you make the class a template and not just a constructor? The state of the class does not depend on the template argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because a constructor cannot have an explicit template argument. Thus you would need to pass in an instance of a file. But since it is not default constructible you need to feed it with a stream or file. Yet, it is a weird circular dependency as you need to parse the arguments before you know the filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, that these changes have no implication on the current version, due to type deduction guides. That means, you can provide the file type as a template argument or you call the constructors like before, without any template argument.
|
||
// Check extension. | ||
validate_filename(file); | ||
this->validate_filename(file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because, the base class is now a template class. 😒
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can not make the base class a template and just the derived classes? That would simplify the inheritance, too...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can do that but than have to copy the constructors for the output file. This kind of diminish the use of a base class to DRY some code. Just because I need to use this to call the correct functions?
std::tuple{zstd_compression{}}, | ||
#endif // SEQAN3_HAS_BZIP2 | ||
std::tuple{} | ||
)), type_list>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ugly, but oh well. Can you add a note that this should be replaced once the proper mechanics are available?
28a4de1
to
661007e
Compare
|
||
// Check extension. | ||
validate_filename(file); | ||
this->validate_filename(file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can not make the base class a template and just the derived classes? That would simplify the inheritance, too...
template <typename _file_t = file_t> | ||
requires std::Same<_file_t, void> | ||
//!\endcond | ||
file_validator_base() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just write requires std::Same<file_t, void>
after the signature, no need to make it a function template.
|
||
//!\cond | ||
template <typename _file_t = file_t> | ||
requires requires { typename _file_t::valid_formats; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enforce this constraint via static_assert in the top of the struct
requires requires { typename _file_t::valid_formats; } | ||
|
||
file_validator_base() : extensions{detail::valid_file_extensions<typename file_t::valid_formats>()} | ||
{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe just make a single default constructor with if constexpr
, that's much easier to read.
//!\cond | ||
template <typename _file_t = file_t> | ||
requires std::Same<_file_t, void> | ||
//!\endcond | ||
explicit file_validator_base(std::vector<std::string> extensions) : extensions{std::move(extensions)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, put requires after the signature
seqan3::input_file_validator validator2{std::vector{std::string{"exe"}, std::string{"fasta"}}}; | ||
seqan3::debug_stream << validator2.get_help_page_message() << "\n"; | ||
|
||
// Give as a template argument the seqan3 file type to get all valid extensions for this file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Give as a template argument the seqan3 file type to get all valid extensions for this file. | |
// Give the seqan3 file type as a template argument to get all valid extensions for this file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in other places.
|
||
int main(int argc, const char ** argv) | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove newline
EXPECT_EQ(validator.get_help_page_message(), "Valid output file formats: [fa, fasta, sam, bam]"); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double newline
…extensions from a list of formats.
d27b096
to
7e97653
Compare
@h-2 I think I addressed all issues now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one minor 💅
otherwise LGTM
BUT: can you add notes to the CHANGELOG? Making the type a template is in fact API breakage, even if it's not visible in all contexts. Also moving the extensions into the type (unless that's detail). The feature is also something noteworthy, I think.
@@ -13,83 +13,95 @@ | |||
#pragma once | |||
|
|||
#include <array> | |||
#include <tuple> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still there 💅
Oh yes, good point with the changelog. 👍 Sent with GitHawk |
7e97653
to
1d0d3a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may sound like 💅 but I would prefer to keep the current structure of the CHANGELOG...
CHANGELOG.md
Outdated
@@ -25,9 +25,15 @@ If possible, provide tooling that performs the changes, e.g. a shell-script. | |||
|
|||
## API changes | |||
|
|||
### The `type_list` header has moved | |||
### Argument parser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to group things by module? I don't mind, but we should then do it for features, bugs and API changes, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets very nested then altogether. Also, some changes are not module specific, like renaming of Concepts that is going to happen 🤔
Maybe we can just mention the module in the name of the entry and sort by module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the information should be grouped. The most natural grouping would be by module. Then people can review by module whether or not the changes affect their code.
CHANGELOG.md
Outdated
|
||
If you included `<seqan3/core/type_list.hpp>` you need to change the path to `<seqan3/core/type_list/type_list.hpp>`. | ||
* The signature of input file and output file validator have been modified to template classes. | ||
It is now possible to get the list of file extensions from the type of a seqan input file, respectively output file. Old code will likely not be affected unless the type was used explicitly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mixing up the description of a feature and an API Change. Also Old code will likely not be affected
does not help people in actually fixing it, if it is affected. I think the API Changes section should provide at most minimal context and specify exactly what to do to make code compile.
I would propose to write:
## New features
### Argument Parser: file validators accept files as template arguments
Passing a file as template argument to a file validator, e.g. `seqan3::input_file_validator<seqan3::sequence_file_input<>> validator{}` will automatically set all supported extensions.
[...]
## API Changes
### Argument Parser: file validators are now class templates
Most code should be unaffected; places that are affected need to explicitly provide empty template arguments, e.g. `seqan3::input_file_validator<>` instead of `seqan3::input_file_validator`.
### Core | ||
|
||
* The `type_list` header has moved | ||
If you included `<seqan3/core/type_list.hpp>` you need to change the path to `<seqan3/core/type_list/type_list.hpp>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be joined into one line with the previous line in our doxygen documentation. I would prefer to keep the ###
per item so that we have enough space to describe a feature or API workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But no one is looking at our changelog through doxygen. I propose highlighting the begin of the point with bold and adding a colon. This will be rendered properly in markdown and doxygen too.
1d0d3a4
to
06eba98
Compare
06eba98
to
c9e8995
Compare
@h-2 ping |
fixes #892