Subsampler and augur workflow #483

Merged
merged 52 commits into from
Sep 26, 2023


dpark01
Member

@dpark01 dpark01 commented Sep 8, 2023

This adds a WDL task for the subsampler package, which subsamples genomic data in proportion to externally provided epidemiological case counts. The task is exposed in a new standalone WDL workflow called subsample_by_casecounts, and a second new workflow, augur_from_msa_with_subsampler, glues the case-count-based subsampling step to the standard augur / nextstrain pipeline.
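For readers unfamiliar with the idea, here is a minimal Python sketch of subsampling in proportion to case counts. It is not subsampler's code and does not reflect its actual algorithm or options: the function name `subsample_by_case_counts`, the `(genome_id, stratum)` input shape, and the simple rounding rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def subsample_by_case_counts(genomes, case_counts, total, seed=0):
    """Hypothetical helper: pick about `total` genomes, split across strata
    (e.g. a location or a location/week bin) in proportion to case counts.

    genomes:     list of (genome_id, stratum) tuples
    case_counts: dict mapping stratum -> reported case count
    """
    rng = random.Random(seed)

    # Group the available genomes by stratum.
    by_stratum = defaultdict(list)
    for genome_id, stratum in genomes:
        by_stratum[stratum].append(genome_id)

    # Only strata that actually have genomes contribute to the denominator.
    total_cases = sum(case_counts.get(s, 0) for s in by_stratum)
    if total_cases == 0:
        return []

    selected = []
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        # Quota proportional to this stratum's share of cases (assumed rounding rule),
        # capped at the number of genomes available in the stratum.
        quota = round(total * case_counts.get(stratum, 0) / total_cases)
        quota = min(quota, len(ids))
        selected.extend(rng.sample(ids, quota))
    return selected
```

Because each quota is rounded and capped independently, the result can contain slightly more or fewer than `total` genomes; a real tool would need a policy for redistributing the remainder.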

The new task deconstructs subsampler's provided Snakemake pipeline and calls the four scripts manually. This is partly because the Snakefile doesn't have a good way of passing input parameters at runtime, doesn't seem to honor all of those input parameters within its own rules, treats some parameters as mandatory when they are actually optional, and has a few other bugs that needed sorting out.

This has been tested internally in a few runs on Terra and succeeds at reducing the entirety of the ncov/open dataset to a manageable size for downstream augur steps.

@schaluva schaluva marked this pull request as ready for review September 14, 2023 15:00
@dpark01 dpark01 self-assigned this Sep 21, 2023
@dpark01 dpark01 merged commit 978f10f into master Sep 26, 2023
12 checks passed
@dpark01 dpark01 deleted the subsampler-dp branch September 26, 2023 13:43