What mutation rate to use for dating when there are multiple hits #6

Open
hyanwong opened this issue Nov 8, 2020 · 1 comment

hyanwong commented Nov 8, 2020

As we increase the mismatch rates, we increase the number of mutations relative to edges in the tree sequence (TS). When dating, all other things being equal, this will make most node times appear older. We suspect that the human mutation rate we use, 1e-8, was calculated under infinite-sites (IS) assumptions, and that this might be pushing the OOA peak in our inferred and dated tree sequences too far back in time. There are a few ways we could check whether this makes a difference:

  1. Simulate the OOA model with error and test several mismatch rates. I already have a large number of inferred OOA tree sequences on cycloid, so we can just run tsdate on those.

  2. Run tsdate on the TGP tree sequence that was produced without mismatch, and plot the Afr/Afr vs non-Afr/non-Afr tMRCA histograms to see whether they show the same pattern as in the merged data, but shifted in time (we don't need to do this for all individuals, just one or two).

  3. Remove the non-IS sites, reduce the mutation rate correspondingly, and run tsdate again on just the remaining sites, plotting the same histograms. To choose the IS sites to keep, we could either remove sample mutations first (leaving 92% IS sites) or, if we are worried that this will bias the estimates, use only the 40% of sites that have a single mutation. Either way, we will need to decrease the mutation rate passed to tsdate by multiplying it by 0.92 or 0.4 respectively.
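
To make the rate adjustment in option 3 concrete, here is a minimal sketch of the arithmetic. It assumes that sites are dropped roughly uniformly along the genome, so that keeping a fraction `f` of sites scales the expected mutation count on every branch by `f`. The helper names (`scaled_mutation_rate`, `naive_branch_time`) are made up for illustration, and the naive "mutations / (rate × length)" estimate is only a stand-in to show why the scaling keeps times comparable; it is not what tsdate does internally.

```python
import math

def scaled_mutation_rate(mutation_rate, fraction_kept):
    """Per-bp rate to use after keeping only `fraction_kept` of the sites."""
    if not 0.0 < fraction_kept <= 1.0:
        raise ValueError("fraction_kept must be in (0, 1]")
    return mutation_rate * fraction_kept

def naive_branch_time(num_mutations, mutation_rate, sequence_length):
    """Illustrative estimate only: mutations / (rate * length)."""
    return num_mutations / (mutation_rate * sequence_length)

BASE_RATE = 1e-8  # human mutation rate quoted above

# The two filtering choices discussed in option 3.
rate_no_sample_muts = scaled_mutation_rate(BASE_RATE, 0.92)
rate_single_mut_only = scaled_mutation_rate(BASE_RATE, 0.40)

# Hypothetical branch: 100 mutations over 1 Mb before filtering,
# 40 of them left after keeping 40% of sites (assumed uniform thinning).
time_before = naive_branch_time(100, BASE_RATE, 1e6)
time_after = naive_branch_time(40, rate_single_mut_only, 1e6)

print(f"rate after removing sample mutations: {rate_no_sample_muts:.3g}")
print(f"rate using single-mutation sites only: {rate_single_mut_only:.3g}")
print(f"naive time before / after filtering: {time_before:.0f} / {time_after:.0f}")
```

If the filtering is not uniform across branches (for example, if removing sample mutations preferentially thins terminal branches), the two naive times would no longer match, which is the bias concern raised in option 3.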


hyanwong commented Nov 8, 2020

Here's a basic answer to option 1, using OOA simulated trees that were inferred with sequencing and ancestral state error. It looks like we underestimate (rather than overestimate) the OOA time in general, possibly because we are not accounting for the demography? Tsdate was run with mutation_rate=1.29e-08, Ne=10000 (the mutation rate is the one used in the OOA simulations).

The extra mutations don't seem to make much of a difference to the OOA peak: they increase it from 2000 generations to ~3500 generations (in the model it is actually at 5600 generations, where the red line is). Interestingly, whatever the mismatch rate, we put a high peak at 15000 generations in the CEU/CEU plot, whereas in the original data that peak is much lower, about the same height as the OOA peak.
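
As a rough check on how far these peaks are from the simulated truth, the sketch below expresses each inferred peak as a fraction of the model's OOA time. The numbers are the ones quoted above; the dictionary labels and the helper name `fraction_of_true_time` are just illustrative.

```python
MODEL_OOA_GENERATIONS = 5600  # OOA time in the simulated model (the red line)

# Approximate inferred peak positions quoted above, in generations.
inferred_peaks = {
    "fewer extra mutations": 2000,
    "more extra mutations": 3500,
}

def fraction_of_true_time(inferred, true):
    """Ratio of an inferred time to the true time (1.0 means no bias)."""
    return inferred / true

ratios = {
    label: fraction_of_true_time(peak, MODEL_OOA_GENERATIONS)
    for label, peak in inferred_peaks.items()
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.0%} of the model OOA time")
```

Both ratios stay below 1, so even the runs with more recurrent mutations still place the OOA peak more recently than the model does.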

Original tree sequence:
image

No recurrent mutations:
image

Some recurrent mutations (what we have now):
image

Loads of recurrent mutations:
image
