maintaining list of sites to exclude if mutation-annotated tree is updated #19

Open
jbloom opened this issue Feb 2, 2023 · 0 comments

jbloom commented Feb 2, 2023

An important part of the pipeline is excluding some mutations that are masked in the pre-built UShER mutation-annotated tree or otherwise problematic.

The plot that informs which mutations to exclude is buried in the synonymous_mut_rates notebook in the pipeline, and the mutations to exclude are then specified manually in config.yaml.

In addition, Angie sometimes masks sites in a clade-specific fashion when she builds the mutation-annotated tree. She does these exclusions in a bash script, which I manually converted to a machine-readable YAML file at some point (https://github.com/jbloomlab/SARS2-mut-fitness/blob/main/data/usher_masked_sites.yaml).

But I am not sure I re-checked whether she had added more mutations to her bash script when I updated the mutation-annotated tree.

So if we want to re-run the pipeline regularly on new mutation-annotated trees, we definitely need to automate this. However, that might be a lot of effort.

Assuming we don't automate, we still need some way to check for new masked or excluded sites whenever the data are updated. Better instructions on how to do this would probably help.
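As a rough sketch of what that check could look like: the snippet below compares two versions of the masked sites and reports anything newly added. It assumes each version has already been loaded into a mapping of clade name to a list of site numbers; the actual layout of usher_masked_sites.yaml may differ, and the function name `new_masked_sites`, the clade labels, and the site numbers are all hypothetical placeholders rather than real data.

```python
def new_masked_sites(old, new):
    """Return {clade: sorted sites} that are masked in `new` but not in `old`.

    Both arguments are assumed to map clade name -> iterable of site numbers
    (a hypothetical layout; adapt to the real YAML structure).
    """
    added = {}
    for clade, sites in new.items():
        # Sites for a clade missing from `old` all count as new.
        extra = set(sites) - set(old.get(clade, ()))
        if extra:
            added[clade] = sorted(extra)
    return added


# Made-up placeholder data, not real clades or masked sites.
old_masked = {"cladeA": [100, 200], "cladeB": [300]}
new_masked = {"cladeA": [100, 200, 250], "cladeB": [300], "cladeC": [400]}

added = new_masked_sites(old_masked, new_masked)
for clade, sites in added.items():
    print(f"{clade}: newly masked sites {sites}")
```

An empty result would mean no new masked sites were added since the last update; anything else would need to be reviewed and copied into the pipeline's exclusion list by hand.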

(This issue recaps issues mentioned by @rneher on our internal Slack when he flagged site 18591 as unusually "mutated" in recent strains.)

See also the following related issues on UShER:
