Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/seq file expose valid formats #863

Merged
merged 6 commits into from
Aug 19, 2019

Conversation

rrahn
Copy link
Contributor

@rrahn rrahn commented Mar 29, 2019

fixes #892

@rrahn rrahn requested a review from smehringer March 29, 2019 21:37
@codecov
Copy link

codecov bot commented Mar 29, 2019

Codecov Report

Merging #863 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #863      +/-   ##
==========================================
+ Coverage   96.86%   96.86%   +<.01%     
==========================================
  Files         211      212       +1     
  Lines        8249     8274      +25     
==========================================
+ Hits         7990     8015      +25     
  Misses        259      259
Impacted Files Coverage Δ
include/seqan3/io/sequence_file/output.hpp 100% <ø> (ø) ⬆️
include/seqan3/io/sequence_file/input.hpp 100% <ø> (ø) ⬆️
include/seqan3/io/detail/misc_input.hpp 96.22% <100%> (+0.48%) ⬆️
include/seqan3/argument_parser/validators.hpp 88.41% <100%> (+0.36%) ⬆️
include/seqan3/io/detail/misc.hpp 100% <100%> (ø) ⬆️
include/seqan3/io/detail/magic_header.hpp 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9034c3b...c9e8995. Read the comment docs.

Copy link
Member

@smehringer smehringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add cards for structure file and alignment file to also add extesions?

include/seqan3/io/sequence_file/input.hpp Outdated Show resolved Hide resolved
include/seqan3/io/sequence_file/input.hpp Outdated Show resolved Hide resolved
include/seqan3/io/sequence_file/input.hpp Outdated Show resolved Hide resolved
include/seqan3/io/sequence_file/input.hpp Outdated Show resolved Hide resolved
template <IStream2 stream_type,
SequenceFileInputFormat file_format>
sequence_file_input(stream_type && stream,
file_format const &)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
file_format const &)
file_format const &)

💅 😁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not resolved

test/unit/io/sequence_file/sequence_file_input_test.cpp Outdated Show resolved Hide resolved
test/unit/io/sequence_file/sequence_file_output_test.cpp Outdated Show resolved Hide resolved
test/unit/io/sequence_file/sequence_file_output_test.cpp Outdated Show resolved Hide resolved
test/unit/io/sequence_file/sequence_file_input_test.cpp Outdated Show resolved Hide resolved
@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch 2 times, most recently from 41de320 to bf8e1e0 Compare March 30, 2019 13:47
@rrahn rrahn requested a review from smehringer March 30, 2019 13:48
@h-2
Copy link
Member

h-2 commented Apr 5, 2019

What's the status here?

@smehringer
Copy link
Member

my request is still not resolved ☝️

@rrahn
Copy link
Contributor Author

rrahn commented Apr 9, 2019

🦏 ...

@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from bf8e1e0 to e8f6873 Compare April 9, 2019 11:50
@rrahn rrahn requested a review from smehringer April 9, 2019 11:50
@rrahn
Copy link
Contributor Author

rrahn commented Apr 9, 2019

Somehow I have overwritten the submitted changes from your review and by that undo the changes. Please check if now everything is as expected.

Copy link
Member

@smehringer smehringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry one more thing. Otherwise LGTM now

//!\returns std::vector over std::string with all valid file extensions specified by `valid_formats`.
static std::vector<std::string> valid_file_extensions()
{
std::vector<std::string> extensions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will need this code in each of the files (6 in total currently) so to reduce code it would be great if this can be outsourced into a function in io/detail/misc 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already proposed a small mixin class that adds this feature. But @h-2 was proposing to keep it simple. I am in general in favour for DRYing this out and don't see any points against it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think we should sit down and think about what else we can DRY about the files and do it in one step altogether; but if you really want to factor it out now, I would prefer not to do a CRTP-mixin, but rather just a free function in misc that is called by the member function.
CRTP-mixins have the drawback that they are difficult to document and I would rather hide the implementation details from the user, the class is complicated enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a CRTP-mixin? I was aiming all along for a free function that takes the vallid_formats and returns a list of extensions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to do a CRTP-mixin, but rather just a free function in misc that is called by the member function.

The interface still needs to be repeated in each file and still there can go a lot of things wrong (e.g. diverging documentation and probably other subtle errors). And even if the interface is very simple not much can go wrong it suggests a wrong example that could be copied in other places for more complex problems.

CRTP-mixins have the drawback that they are difficult to document and I would rather hide the implementation details from the user, the class is complicated enough.

I think we don't need a full CRTP-mixin. Something that just get's the valid_formats type would be sufficient. I think this can be cleanly documented and it does not need to be detail stuff. Regarding further refactorings. We should discuss it, but given the huge amount of tasks and the few weeks we have left, I wouldn't go for something big now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not in favour of discussing this too long, I would propose the following free functions in misc

template <typename ... format_ts>
std::vector<std::string> valid_file_format_extensions()
{
    // impl
}

inline std::vector<std::string> valid_file_compression_extensions()
{
   // impl based on macros
#ifdef SEQAN3_HAS_GRZIP 
}

And then just use the free functions in the argParse and display something like

--input   A file with the input sequences. Valid format extensions are [fasta, fa, fastq, fq], possibly combined with [gz, bz2, bgzf].

@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from e8f6873 to b87f19e Compare May 21, 2019 12:50
@rrahn rrahn requested a review from smehringer May 21, 2019 12:51
@rrahn
Copy link
Contributor Author

rrahn commented May 21, 2019

I have now revised everything to use a global detail function and also added a compression_formats type to determine the compression.

@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from b87f19e to 51ab9b9 Compare May 21, 2019 20:04
@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from 51ab9b9 to 8d85f58 Compare July 11, 2019 15:52
@rrahn rrahn requested a review from h-2 July 11, 2019 15:54
{
{"bgzf"}
};
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are at it, why not move the magic header variable into the tag? I think extensions and magic header are similarly "connected" to the header.

Did you think of maybe making the extensions not const so users can overwrite them, like we are doing for the formats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I am not yet convinced of how that could benefit. Like I can add my own format and say this is also gzipped? Not sure that we want that for compressions. I would at least put that on ice if this is really ever gonna requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are at it, why not move the magic header variable into the tag? I think extensions and magic header are similarly "connected" to the header.

That's a good point. And it looks nicer.

include/seqan3/argument_parser/validators.hpp Show resolved Hide resolved
std::tuple{zstd_compression{}},
#endif // SEQAN3_HAS_BZIP2
std::tuple{}
)), type_list>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, so much hackery, just because trailing comma? This is causes a lot of template instantiation for something very simple... we really need our own meta-stuff 😥

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I was really trying to get something working but in the end if you have this macro stuff it is not trivial to figure out if the previous macro was enabled or not. This allows me to specify any order and always get rid of the was there a comma or not issue.

Beautiful right 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ugly, but oh well. Can you add a note that this should be replaced once the proper mechanics are available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to give it 10 minutes more love and see if I make some meta programming interface for this. But for now I would not spend to much time on it.

*/
template <typename formats_t>
//!\cond
requires (all_formats_have_file_extensions(formats_t{}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this? Aren't the formats already required by concept to have the extensions static member?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a concept for the formats? I thought not but I'll have a second look or you point me there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, now I understand what you mean. Well, I thought since it is now a free function which could be potentially used otherwise it is good to check if the stuff added is what we want. But since the PR is quite old I did not checked this via a static assert. But I would keep it for this reasons but make it a static assert instead

template <IStream2 stream_type,
SequenceFileInputFormat file_format>
sequence_file_input(stream_type && stream,
file_format const &)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not resolved

include/seqan3/io/sequence_file/input.hpp Show resolved Hide resolved
@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from 8d85f58 to 28a4de1 Compare July 18, 2019 07:53
@rrahn rrahn requested a review from h-2 July 18, 2019 07:53
*/
template <typename file_t = void>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you make the class a template and not just a constructor? The state of the class does not depend on the template argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because a constructor cannot have an explicit template argument. Thus you would need to pass in an instance of a file. But since it is not default constructible you need to feed it with a stream or file. Yet, it is a weird circular dependency as you need to parse the arguments before you know the filename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, that these changes have no implication on the current version, due to type deduction guides. That means, you can provide the file type as a template argument or you call the constructors like before, without any template argument.


// Check extension.
validate_filename(file);
this->validate_filename(file);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because, the base class is now a template class. 😒

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can not make the base class a template and just the derived classes? That would simplify the inheritance, too...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can do that but than have to copy the constructors for the output file. This kind of diminish the use of a base class to DRY some code. Just because I need to use this to call the correct functions?

include/seqan3/io/detail/magic_header.hpp Outdated Show resolved Hide resolved
include/seqan3/io/detail/magic_header.hpp Outdated Show resolved Hide resolved
include/seqan3/io/detail/magic_header.hpp Show resolved Hide resolved
std::tuple{zstd_compression{}},
#endif // SEQAN3_HAS_BZIP2
std::tuple{}
)), type_list>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ugly, but oh well. Can you add a note that this should be replaced once the proper mechanics are available?

include/seqan3/io/detail/misc.hpp Show resolved Hide resolved
include/seqan3/io/detail/misc_input.hpp Show resolved Hide resolved
include/seqan3/io/detail/misc_input.hpp Outdated Show resolved Hide resolved
include/seqan3/io/detail/misc_input.hpp Outdated Show resolved Hide resolved
@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from 28a4de1 to 661007e Compare August 14, 2019 15:12
@rrahn rrahn requested a review from h-2 August 14, 2019 15:16

// Check extension.
validate_filename(file);
this->validate_filename(file);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can not make the base class a template and just the derived classes? That would simplify the inheritance, too...

template <typename _file_t = file_t>
requires std::Same<_file_t, void>
//!\endcond
file_validator_base()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just write requires std::Same<file_t, void> after the signature, no need to make it a function template.


//!\cond
template <typename _file_t = file_t>
requires requires { typename _file_t::valid_formats; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enforce this constraint via static_assert in the top of the struct

requires requires { typename _file_t::valid_formats; }

file_validator_base() : extensions{detail::valid_file_extensions<typename file_t::valid_formats>()}
{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe just make a single default constructor with if constexpr, that's much easier to read.

//!\cond
template <typename _file_t = file_t>
requires std::Same<_file_t, void>
//!\endcond
explicit file_validator_base(std::vector<std::string> extensions) : extensions{std::move(extensions)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, put requires after the signature

include/seqan3/io/detail/magic_header.hpp Show resolved Hide resolved
include/seqan3/io/detail/misc.hpp Show resolved Hide resolved
seqan3::input_file_validator validator2{std::vector{std::string{"exe"}, std::string{"fasta"}}};
seqan3::debug_stream << validator2.get_help_page_message() << "\n";

// Give as a template argument the seqan3 file type to get all valid extensions for this file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Give as a template argument the seqan3 file type to get all valid extensions for this file.
// Give the seqan3 file type as a template argument to get all valid extensions for this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in other places.


int main(int argc, const char ** argv)
{

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove newline

EXPECT_EQ(validator.get_help_page_message(), "Valid output file formats: [fa, fasta, sam, bam]");
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double newline

@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch 2 times, most recently from d27b096 to 7e97653 Compare August 15, 2019 17:01
@rrahn rrahn requested a review from h-2 August 15, 2019 17:01
@rrahn
Copy link
Contributor Author

rrahn commented Aug 15, 2019

@h-2 I think I addressed all issues now.

Copy link
Member

@h-2 h-2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor 💅

otherwise LGTM

BUT: can you add notes to the CHANGELOG? Making the type a template is in fact API breakage, even if it's not visible in all contexts. Also moving the extensions into the type (unless that's detail). The feature is also something noteworthy, I think.

@@ -13,83 +13,95 @@
#pragma once

#include <array>
#include <tuple>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still there 💅

@rrahn
Copy link
Contributor Author

rrahn commented Aug 15, 2019

Oh yes, good point with the changelog. 👍

Sent with GitHawk

@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from 7e97653 to 1d0d3a4 Compare August 16, 2019 07:10
@rrahn rrahn requested a review from h-2 August 16, 2019 10:09
Copy link
Member

@h-2 h-2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may sound like 💅 but I would prefer to keep the current structure of the CHANGELOG...

CHANGELOG.md Outdated
@@ -25,9 +25,15 @@ If possible, provide tooling that performs the changes, e.g. a shell-script.

## API changes

### The `type_list` header has moved
### Argument parser
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to group things by module? I don't mind, but we should then do it for features, bugs and API changes, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets very nested then altogether. Also, some changes are not module specific, like renaming of Concepts that is going to happen 🤔

Maybe we can just mention the module in the name of the entry and sort by module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the information should be grouped. The most natural grouping would be by module. Then people can review by module whether or not the changes affect their code.

CHANGELOG.md Outdated

If you included `<seqan3/core/type_list.hpp>` you need to change the path to `<seqan3/core/type_list/type_list.hpp>`.
* The signature of input file and output file validator have been modified to template classes.
It is now possible to get the list of file extensions from the type of a seqan input file, respectively output file. Old code will likely not be affected unless the type was used explicitly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mixing up the description of a feature and an API Change. Also Old code will likely not be affected does not help people in actually fixing it, if it is affected. I think the API Changes section should provide at most minimal context and specify exactly what to do to make code compile.

I would propose to write:

## New features

### Argument Parser: file validators accept files as template arguments

Passing a file as template argument to a file validator, e.g. `seqan3::input_file_validator<seqan3::sequence_file_input<>> validator{}` will automatically set all supported extensions.

[...]

## API Changes

### Argument Parser: file validators are now class templates

Most code should be unaffected; places that are affected need to explicitly provide empty template arguments, e.g. `seqan3::input_file_validator<>` instead of `seqan3::input_file_validator`.

### Core

* The `type_list` header has moved
If you included `<seqan3/core/type_list.hpp>` you need to change the path to `<seqan3/core/type_list/type_list.hpp>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be joined into one line with the previous line in our doxygen documentation. I would prefer to keep the ### per item so that we have enough space to describe a feature or API workaround.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But no one is looking at our changelog through doxygen. I propose highlighting the begin of the point with bold and adding a colon. This will be rendered properly in markdown and doxygen too.

@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from 1d0d3a4 to 06eba98 Compare August 19, 2019 07:41
@rrahn rrahn requested a review from h-2 August 19, 2019 07:41
@rrahn rrahn force-pushed the feature/seq_file_expose_valid_formats branch from 06eba98 to c9e8995 Compare August 19, 2019 12:58
@rrahn
Copy link
Contributor Author

rrahn commented Aug 19, 2019

@h-2 ping

@h-2 h-2 merged commit 5a0354a into seqan:master Aug 19, 2019
@rrahn rrahn deleted the feature/seq_file_expose_valid_formats branch September 3, 2019 07:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add valid_file_extensions function to alignment file and structured files. (see sequence_file)
3 participants