ingest: Nextclade merge updates #67
Merged
Preserving the line-breaks makes the command much more readable in Snakemake output¹, which is important since I'm changing this rule right now. The `\n` previously interpreted by Python is now interpreted by `tr`, which is preferable.

¹ <https://docs.nextstrain.org/en/latest/reference/snakemake-style-guide.html#use-triple-quoted-command-definitions>

Ported-from: <nextstrain/measles@762acdb>
Related-to: <nextstrain/measles#52>
Related-to: <#65>
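A minimal illustration of the Python behaviour behind this change, using a hypothetical `tr` command and `input.txt` rather than the actual rule: in a regular (non-raw) triple-quoted string, Python consumes each backslash-newline continuation and turns `\n` into a literal newline before the shell ever sees the command. A raw string (`r"""..."""`) passes both through untouched, so the command keeps its line breaks and `tr` receives `\n` itself.

```python
# Non-raw string: Python removes each "\<newline>" and converts "\n" into a
# real newline character before the text reaches the shell.
cooked = """
    tr '\n' ' ' \
        < input.txt
"""

# Raw string: the backslash-newline continuation and the two-character "\n"
# are kept as-is, so the shell (and `tr`) interpret them instead.
raw = r"""
    tr '\n' ' ' \
        < input.txt
"""

print(repr(cooked))
print(repr(raw))
```

The raw form is what keeps the multi-line command readable when Snakemake echoes it.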
This construction reads much clearer and cleaner. It also moves the Nextclade field map directly and more conveniently into the YAML config instead of referencing a separate TSV file. Putting the field map into a separate file seemed to be only for the sake of the --kv-file (-k) interface provided by `csvtk rename2`, which we're no longer using here.

Backwards compatibility with configs that name a TSV file is not preserved since this pathogen-repo-guide is expected to be used to stamp out new repos, and we don't have any particular process/plan for how to update previously stamped out repos.

Note that `augur curate` commands currently emit CSV-like TSVs that are restricted to being IANA-like¹, so parsing them with tsv-utils is most appropriate; hence the switch from `csvtk cut` to `tsv-select`.

¹ See <nextstrain/augur#1566>.

Ported-from: <nextstrain/measles@faebd64>
Related-to: <nextstrain/measles#52>
Related-to: <#65>
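To make the idea of a field map concrete, here is a small Python sketch of what selecting and renaming Nextclade columns amounts to. It is not the workflow's code (which uses `tsv-select`), and the column names and mapping are hypothetical. It assumes plain IANA-style TSV with no quoting, which is exactly why splitting on tabs is sufficient here.

```python
# Hypothetical field map, shaped like a YAML mapping of
# {Nextclade output column: name to use in the merged metadata}.
FIELD_MAP = {
    "seqName": "seqName",
    "clade": "clade",
    "qc.overallStatus": "QC_overall",
}


def select_and_rename(lines, field_map):
    """Keep only the columns named in field_map (in its order) and rename them.

    Assumes IANA-style TSV: no quoting, so splitting on tabs is exact.
    """
    header, *rows = [line.rstrip("\n").split("\t") for line in lines]
    indices = [header.index(column) for column in field_map]
    output = ["\t".join(field_map.values())]
    for row in rows:
        output.append("\t".join(row[i] for i in indices))
    return output


nextclade_tsv = [
    "seqName\tclade\tcoverage\tqc.overallStatus\n",
    "sample1\tD8\t0.99\tgood\n",
]
print(select_and_rename(nextclade_tsv, FIELD_MAP))
```

Keeping the mapping as a dict (one YAML mapping in the config) means the selected columns and their new names live in one place, with no separate key/value file to keep in sync.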
This construction reads a bit clearer and cleaner. It's also a good example of how to use `augur merge`.

The limitation on non-seekable streams means the workflow now uses additional transient disk space, but it typically shouldn't be an issue. The way Augur's slow startup time impacts `augur merge` also contributes to a longer rule execution time, but it should be negligible in the context of the larger workflow and presumably we'll fix the slow startup eventually.¹

The output is semantically identical but has some syntactic changes re: quoting. It's worth noting that the pre-existing TSV format was _not_ IANA TSV, despite it (still) being treated as such in a few places, but was (and remains) a CSV-like TSV with some quoted fields. We really need to sort out our TSV formats³, but that's for a larger project.

¹ <nextstrain/augur#1628>
² <nextstrain/augur#1565>
³ <nextstrain/augur#1566>

Ported-from: <nextstrain/measles@4d73b7f>
Related-to: <nextstrain/measles#52>
Related-to: <#65>
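For readers unfamiliar with merging metadata tables by ID, the following Python sketch shows the general idea: a full outer join on an ID column, where values from the second table overwrite the first for shared columns. This is an illustration under those assumptions only; it is not `augur merge`'s implementation, its conflict-resolution rules may differ, and the table contents and the `strain` ID column are hypothetical.

```python
def merge_by_id(left, right, id_column):
    """Full outer join of two tables (lists of row dicts) on id_column.

    Rows from both tables are kept; for columns present in both tables,
    values from `right` overwrite values from `left`. Missing values are "".
    """
    columns = []
    merged = {}
    for table in (left, right):
        for row in table:
            # Record each column the first time it is seen, preserving order.
            for column in row:
                if column not in columns:
                    columns.append(column)
            merged.setdefault(row[id_column], {}).update(row)
    return [{column: row.get(column, "") for column in columns} for row in merged.values()]


metadata = [{"strain": "a", "date": "2020"}, {"strain": "b", "date": "2021"}]
nextclade = [{"strain": "b", "clade": "D8"}, {"strain": "c", "clade": "B3"}]
for row in merge_by_id(metadata, nextclade, "strain"):
    print(row)
```

Every ID from either table appears in the output, with empty values filling columns that a table did not provide.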
genehack approved these changes on Oct 3, 2024.
joverlee521 approved these changes on Oct 3, 2024.
Thank you! So much simpler!
I was thinking I might need to update the related tutorial, but it looks like the section on Nextclade as part of ingest doesn't dive into the specifics of the config. So no need for updates there!
See commits.
Resolves #65 and #55.
Checklist