-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Accept user-defined tags #3256
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
72f288d
to
88e8642
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good. I just have some general comments/questions to the general parsing.
Thank you!
{ | ||
consume_unsupported_tag_and_print_warning("HD", raw_tag); | ||
parse_and_append_unhandled_tag_to_string(hdr.user_tags, raw_tag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for me to understand the format here. We store all parsed line field tags tab separated within a string. Or are official field tags from the specification stored in a different format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the header stuff (SQ, HD, RG) have strutcs that store the members defined in the SAM spec. Any user defined tags are stored as string. For the alignment records, tags, including User-defined, are stored in the SAM tag dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but do I see it right that here we store the user defined field tags in a tab seperated string? If so, I feel like it would make more sense to put this into a vector of field tags instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be better. Would be nice to store a struct with the name and value.
Anyway, I would do it as a followup, because it breaks api for the header tags where we already store the user tags in a string (SQ
and RG
).
If we do a vector, we should also throw on invalid formatting (it should be TAG:VALUE
with some constraints on allowed characters) like samtools does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! Would you mind tracking this on the issue board?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do when working on this issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Only updating the documentation of the Options struct which has currently no purpose. Thank you!!!
88e8642
to
32112bd
Compare
697070f
to
d0eabcf
Compare
Documentation preview available at https://docs.seqan.de/preview/seqan/seqan3/3256 |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3256 +/- ##
==========================================
- Coverage 98.14% 98.13% -0.01%
==========================================
Files 271 271
Lines 11955 11951 -4
Branches 104 104
==========================================
- Hits 11733 11728 -5
- Misses 222 223 +1 ☔ View full report in Codecov by Sentry. |
Resolves #3251