-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ingest: Updates for augur curate
and augur merge
#65
Comments
Since this repo is expected to be used to stamp out new repos, and we don't have any particular process/plan for how to update previously stamped out repos, I'm going to drop this bit of backwards compatibility that was in my measles implementation: diff --git a/ingest/rules/nextclade.smk b/ingest/rules/nextclade.smk
index a70d994..41a2f75 100644
--- a/ingest/rules/nextclade.smk
+++ b/ingest/rules/nextclade.smk
@@ -18,6 +18,8 @@ See Nextclade docs for more details on usage, inputs, and outputs if you would
like to customize the rules:
https://docs.nextstrain.org/projects/nextclade/page/user/nextclade-cli.html
"""
+import sys
+
DATASET_NAME = config["nextclade"]["dataset_name"]
@@ -61,6 +63,14 @@ rule run_nextclade:
"""
+if isinstance(config["nextclade"]["field_map"], str):
+ print(f"Converting config['nextclade']['field_map'] from TSV file ({config['nextclade']['field_map']}) to dictionary; "
+ f"consider putting the field map directly in the config file.", file=sys.stderr)
+
+ with open(config["nextclade"]["field_map"], "r") as f:
+ config["nextclade"]["field_map"] = dict(line.rstrip("\n").split("\t", 1) for line in f if not line.startswith("#"))
+
+
rule join_metadata_and_nextclade:
input:
nextclade="results/nextclade.tsv", |
tsibley
added a commit
that referenced
this issue
Oct 3, 2024
Preserving the line-breaks makes the command much more readable in Snakemake output¹, which is important since I'm changing this rule right now. The \n previously interpreted by Python is now interpreted by `tr`, which is preferable. ¹ <https://docs.nextstrain.org/en/latest/reference/snakemake-style-guide.html#use-triple-quoted-command-definitions> Ported-from: <nextstrain/measles@762acdb> Related-to: <nextstrain/measles#52> Related-to: <#65>
tsibley
added a commit
that referenced
this issue
Oct 3, 2024
This construction reads much clearer and cleaner. Moves the Nextclade field map directly and more conveniently into the YAML config instead of referencing a separate TSV file. Putting the field map into a separate file seemed to be only for the sake of the --kv-file (-k) interface provided by `cvstk rename2`, which we're no longer using here. Backwards compatibility with configs that name a TSV file is not preserved since this pathogen-repo-guide is expected to be used to stamp out new repos, and we don't have any particular process/plan for how to update previously stamped out repos. Note that `augur curate` commands currently emit CSV-like TSVs that are limited to be IANA-like¹ such that parsing them with tsv-utils is most appropriate, hence the switch from `csvtk cut` to `tsv-select`. ¹ See <nextstrain/augur#1566>. Ported-from: <nextstrain/measles@faebd64> Related-to: <nextstrain/measles#52> Related-to: <#65>
tsibley
added a commit
that referenced
this issue
Oct 3, 2024
This construction reads a bit clearer and cleaner. It's also a good example of how to use `augur merge`. The limitation on non-seekable streams means the workflow now uses additional transient disk space, but it typically shouldn't be an issue. The way Augur's slow start up time impacts `augur merge` also contributes to a longer rule execution time, but it should be negligible in the context of the larger workflow and presumably we'll fix the slow start up eventually.¹ The output is semantically identical but has some syntactic changes re: quoting. It's worth noting that the pre-existing TSV format was _not_ IANA TSV, despite it (still) being treated as such in a few places, but was (and remains) a CSV-like TSV with some quoted fields. We really need to sort out our TSV formats³, but that's for a larger project. ¹ <nextstrain/augur#1628> ² <nextstrain/augur#1565> ³ <nextstrain/augur#1566> Ported-from: <nextstrain/measles@4d73b7f> Related-to: <nextstrain/measles#52> Related-to: <#65>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Port changes from nextstrain/measles#52
The text was updated successfully, but these errors were encountered: