Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add left and right arguments to TreeSequence.trees() #24

Open
jeromekelleher opened this issue Mar 19, 2018 · 6 comments
Open

Add left and right arguments to TreeSequence.trees() #24

jeromekelleher opened this issue Mar 19, 2018 · 6 comments
Labels
C API Issue is about the C API enhancement New feature or request Python API Issue is about the Python API
Milestone

Comments

@jeromekelleher
Copy link
Member

These should be seen as analogous to the start and stop arguments to Python's builtin range, and be specified in tree sequence coordinates. We should initially ensure that start < stop, but in future we could allow start > stop, and enable reverse iteration in this way.

It would be best to do this with a low-level sparse_tree_seek() method to position the tree at the correct location.

@petrelharp petrelharp transferred this issue from tskit-dev/msprime Jan 10, 2019
@jeromekelleher
Copy link
Member Author

After discussion on #121, I think the best approach here is to add start and stop arguments in sequence coordinates (as above). The trees in the returned sequence would guarantee that the first tree has left == start and the last tree would have right == stop. To do this, we'd need to 'split' the underlying trees so that we guarantee this property (that is, the end points of the trees we return don't necessarily align with the underlying edges). The reason for doing this is that if we're looking for the trees in a particular genomic interval, we almost certainly only want sites also within that interval. If we don't split the trees, we'll have sites arbitrarily far away which would have to be filtered manually, making client code much uglier and error prone.

This isn't hard to do, but will require implementation at the C level.

@hyanwong
Copy link
Member

Copied from #121:

Note that some of the docs use the words [left,right) rather than [start, stop), e.g. https://tskit.readthedocs.io/en/latest/python-api.html#tskit.TreeSequence.edge_diffs . "left" and "right" have the advantage that it's clearer (to me anyway) that we are talking about a length of genome and not e.g. tree indexes, or node times. But it has the disadvantage that it seems weird to specify e.g. (left=20.5, right=10.5) in order to get an iterator that goes right to left. And there's definitely the advantage that start and stop match the range operator.

@jeromekelleher
Copy link
Member Author

Thinking about how to implement this, I think the right way to do it is to add a tsk_tree_set_interval(left, right) method which can shrink the interval for a tree. This would mainly involve updating the list of sites. The tsk_tree_advance method would need to be updated to take this into account, so that if we call next() on a tree that has had the right coordinate moved leftwards, the next tree state is the remainder of the original tree.

The rest of the functionality can be built on this then.

@benjeffery benjeffery added enhancement New feature or request Python API Issue is about the Python API C API Issue is about the C API labels Sep 29, 2020
@jeromekelleher
Copy link
Member Author

Adding this to the 1.0 milestone for now as something to review and plan ahead for.

@jeromekelleher jeromekelleher added this to the C API 1.0.0 milestone Jul 23, 2021
@hyanwong
Copy link
Member

hyanwong commented May 2, 2023

To match the variants() iterator, we should use left and right here.

@jeromekelleher
Copy link
Member Author

Also we can implement this quiet efficiently since #2661 was merged

@jeromekelleher jeromekelleher changed the title Add start and stop arguments to TreeSequence.trees() Add left and right arguments to TreeSequence.trees() May 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C API Issue is about the C API enhancement New feature or request Python API Issue is about the Python API
Projects
None yet
Development

No branches or pull requests

3 participants