What mutation rate to use for dating when there are multiple hits #6

hyanwong · 2020-11-08T14:15:52Z

As we increase the mismatch rates, the number of mutations in the tree sequence grows relative to the number of edges. When dating, all other things being equal, this will make most node times appear older. We suspect that the human mutation rate we use, 1e-8, was calculated under an infinite-sites (IS) assumption. If so, applying it to tree sequences with multiple hits might be pushing the OOA peak in our inferred+dated tree sequences too far back in time. There are a few ways we could check whether this makes a difference:

1. Simulate the OOA model with error, and test some different mismatch rates. I have a large number of OOA inferred CSs on cycloid, so we can just run tsdate on those.
2. Run tsdate on the TGP tree sequence that was produced without mismatch, and plot the Afr/Afr vs Non-Afr/Non-Afr tMRCA histograms to see if they show the same pattern as in the merged data, but shifted in time (we don't need to do this for all individuals, just one or two).
3. Remove the non-IS sites, reduce the mutation rate to match, and run tsdate again on just the remaining sites, plotting the same histograms. To choose which IS sites to keep, we could either remove sample mutations first (leaving 92% IS sites) or, if we are worried that this will bias the estimates, use only the 40% of sites that have a single mutation. Either way, we will need to decrease the mutation rate passed to tsdate by multiplying it by 0.92 or 0.4 respectively.
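The rate adjustment in the last option is simple arithmetic, and could be sketched as below. This is only an illustration: the helper names `single_hit_fraction` and `scaled_mutation_rate` are made up for this example and are not part of tsdate or tskit, and the per-site mutation counts are toy values, not real data.

```python
from typing import Sequence


def single_hit_fraction(mutations_per_site: Sequence[int]) -> float:
    """Fraction of variable sites that carry exactly one mutation.

    `mutations_per_site` is a hypothetical list of mutation counts,
    one entry per site; sites with zero mutations are ignored.
    """
    variable = [n for n in mutations_per_site if n > 0]
    if not variable:
        raise ValueError("no variable sites supplied")
    return sum(1 for n in variable if n == 1) / len(variable)


def scaled_mutation_rate(base_rate: float, fraction_kept: float) -> float:
    """Scale a per-base mutation rate by the fraction of sites retained."""
    if not 0.0 < fraction_kept <= 1.0:
        raise ValueError("fraction_kept must be in (0, 1]")
    return base_rate * fraction_kept


if __name__ == "__main__":
    base_rate = 1e-8  # the human mutation rate quoted in this issue
    # The two retention fractions quoted in this issue.
    for kept in (0.92, 0.4):
        print(kept, scaled_mutation_rate(base_rate, kept))
    # Toy counts only: three of the five variable sites have a single hit.
    print(single_hit_fraction([1, 1, 2, 3, 1, 0]))
```

The scaled value is what would then be passed to tsdate as the mutation rate when dating the filtered tree sequence.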

hyanwong · 2020-11-08T15:54:54Z

Here's a basic answer to option 1, using the data from OOA simulated trees inferred with sequencing and ancestral state error. It looks like we underestimate (not overestimate) the OOA time even in general, possibly because we are not accounting for the demography? tsdate was run with mutation_rate=1.29e-08, Ne=10000 (the mutation rate is the one used in the OOA simulations).

It looks like the extra mutations don't make much of a difference to the OOA peak, increasing it from 2000 generations to ~3500 generations (in the model it is actually at 5600 generations, where the red line is). Interestingly, whatever the mismatch rate, we put a high peak at 15000 generations in the CEU/CEU plot, whereas in the original data that peak is much lower, about the same height as the OOA peak.
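To put the peak positions quoted above on a common scale, each one can be expressed as a fraction of the model's true OOA time. This is just a rough sketch of that arithmetic: the function name `relative_to_model` and the dictionary labels are made up for illustration, and the numbers are the approximate values read off the plots, not recomputed from the tree sequences.

```python
MODEL_OOA_GENERATIONS = 5600  # true OOA time in the simulated model

# Approximate peak positions quoted in this comment (labels are illustrative).
inferred_peaks = {
    "fewest extra mutations": 2000,
    "most extra mutations": 3500,
}


def relative_to_model(inferred: float, truth: float) -> float:
    """Return the inferred time as a fraction of the true model time."""
    if truth <= 0:
        raise ValueError("truth must be positive")
    return inferred / truth


if __name__ == "__main__":
    for label, peak in inferred_peaks.items():
        frac = relative_to_model(peak, MODEL_OOA_GENERATIONS)
        print(f"{label}: {peak} generations = {frac:.2f} of the model value")
```

Both fractions are below 1, which matches the observation that the OOA time is underestimated in every case.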

Original tree sequence:

No recurrent mutations:

Some recurrent mutations (what we have now)

Loads of recurrent mutations
