Use `ancestral_to` field on segments instead of global lookup table.
#1993
A fundamental part of the work in coalescent simulation is keeping track of the amount of ancestral material remaining for each interval along the genome. In msprime we currently do this using an AVL tree, S (see here), which we update during common ancestor events (e.g., here).
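To make the cost of the current approach concrete, here is a minimal sketch of a global lookup table in the spirit of S. It assumes S maps each breakpoint x to the number of extant lineages carrying ancestral material for the interval from x to the next breakpoint. Two sorted Python lists stand in for the AVL tree. The class `AncestryCounts` and its methods are hypothetical and are not msprime's code.

```python
import bisect


class AncestryCounts:
    """Hypothetical stand-in for the AVL tree S: breakpoint x maps to the
    number of extant lineages ancestral to [x, next breakpoint).
    Sorted parallel lists replace the balanced tree for simplicity."""

    def __init__(self, sequence_length, n):
        # Initially all n sample lineages carry material for the whole genome.
        self.keys = [0.0, sequence_length]
        self.counts = [n, -1]  # -1 is a sentinel marking the end of the genome

    def _split(self, x):
        """Make x a breakpoint, copying the count of the interval containing it."""
        i = bisect.bisect_left(self.keys, x)
        if i < len(self.keys) and self.keys[i] == x:
            return i
        self.keys.insert(i, x)
        self.counts.insert(i, self.counts[i - 1])
        return i

    def coalesce(self, left, right):
        """Record that two lineages overlapping on [left, right) merged.

        Returns sub-intervals now carried by a single lineage, i.e. intervals
        that have reached their most recent common ancestor."""
        i = self._split(left)
        j = self._split(right)  # right > left, so index i is not shifted
        finished = []
        for k in range(i, j):
            self.counts[k] -= 1
            if self.counts[k] == 1:
                finished.append((self.keys[k], self.keys[k + 1]))
        return finished
```

Every common ancestor event has to split and walk this shared structure over each overlap, separately from the segments themselves; a real implementation would also discard the finished intervals.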
It turns out this is unnecessary: we can achieve the same thing by adding an `ancestral_to` field to each segment, which counts the number of samples the segment is ancestral to. Then, when we merge segments at coalescence, we simply add their `ancestral_to` values. See this code for how the simulation works when we track common ancestry by segment.
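A minimal sketch of the per-segment alternative follows. It is not the code referred to above or msprime's implementation. It assumes that a merged segment whose `ancestral_to` equals the sample size n has reached its MRCA and can be dropped. The `Segment` class and the `merge_overlap` helper are hypothetical, and only the overlapping part of two segments is handled; non-overlapping parts would be carried over unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    left: float
    right: float
    ancestral_to: int  # number of samples this segment is ancestral to


def merge_overlap(a: Segment, b: Segment, n: int) -> Optional[Segment]:
    """Merge the overlapping parts of two segments at a common ancestor event.

    Hypothetical helper: returns the merged segment for the overlap, or None
    if the merged material is ancestral to all n samples (its MRCA has been
    reached) and therefore no longer needs to be tracked."""
    left = max(a.left, b.left)
    right = min(a.right, b.right)
    if left >= right:
        raise ValueError("segments do not overlap")
    # No global table to update: simply add the two counts.
    total = a.ancestral_to + b.ancestral_to
    if total == n:
        return None
    return Segment(left, right, total)
```

With n = 3, merging `Segment(0.0, 5.0, 1)` and `Segment(2.0, 8.0, 1)` gives `Segment(2.0, 5.0, 2)`; merging that with a third lineage covering the same interval returns `None`. The count travels with the segment, so the coalescence check becomes a local addition instead of a tree lookup and update.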
This should simplify the main interval overlap routines in msprime, and also result in a significant performance boost. I think a substantial fraction of the current simulation time (say, 20% for a long genome and large sample size) is spent on AVL tree operations to maintain S.