Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ancestral_to field on segments instead of global lookup table. #1993

Closed
jeromekelleher opened this issue Jan 24, 2022 · 3 comments
Closed

Comments

@jeromekelleher
Copy link
Member

A fundamental part of the work in coalescent simulation is to keep track of the amount of ancestral material remaining for each interval along the genome. In msprime we currently do this using an AVL tree (S) (see here), and we update this structure during common ancestor events (e.g., here).

It turns out this is unnecessary and we can achieve the same thing by adding an ancestral_to field to each segment, which counts the number samples it is ancestral to. Then, when we merge segments at coalescence, we simply add the ancestral_to values.

See this code for how the simulation works when we track common ancestry by segment.

This should simplify the main interval overlap routines in msprime, and also result in a significant performance boost. I think a substantial fraction of the current simulation time (say, 20% for a long genome and large sample size) is taken up with AVL tree operations to maintain S.

@jeromekelleher
Copy link
Member Author

I think you demonstrated that this is actually detrimental to performance @GertjanBisschop - can you comment please?

@GertjanBisschop
Copy link
Member

Yep, that is true. The main reason was that a lot of segments cannot be squashed when keeping track of a single ancestral_to value per segment, as squashing can only happen for segments with identical ancestral_to values.

@jeromekelleher
Copy link
Member Author

Closing this as "this was a bad idea" then!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants