Integer ranges in Python

Python implementation of the IRanges Bioconductor package.

To get started, install the package from PyPI:

pip install iranges

# To install optional dependencies
pip install iranges[optional]

IRanges

An IRanges holds a start position and a width, and is most typically used to represent coordinates along some genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values.

Note: Ends are inclusive.

from iranges import IRanges

starts = [1, 2, 3, 4]
widths = [4, 5, 6, 7]
x = IRanges(starts, widths)

print(x)
 ## output
 IRanges object with 4 ranges and 0 metadata columns
                start              end            width
      <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]                1                4                4
 [1]                2                6                5
 [2]                3                8                6
 [3]                4               10                7
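Because ends are inclusive, each end follows from the start and width as `end = start + width - 1`. The following is a minimal plain-Python sketch of that arithmetic; `inclusive_ends` is a hypothetical helper written for illustration and is not part of the iranges API.

```python
def inclusive_ends(starts, widths):
    """Return the inclusive end of each range: end = start + width - 1."""
    return [s + w - 1 for s, w in zip(starts, widths)]


starts = [1, 2, 3, 4]
widths = [4, 5, 6, 7]
ends = inclusive_ends(starts, widths)
print(ends)  # [4, 6, 8, 10]
```

Under this convention, a range with width 0 has an end one less than its start and covers no positions.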

Interval Operations

IRanges supports most interval-based operations. For example, to compute gaps:

x = IRanges([-2, 6, 9, -4, 1, 0, -6, 10], [5, 0, 6, 1, 4, 3, 2, 3])

gaps = x.gaps()
print(gaps)
 ## output
 IRanges object with 2 ranges and 0 metadata columns
                start              end            width
      <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]               -3               -3                1
 [1]                5                8                4
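To make the result easier to follow, here is a rough plain-Python sketch of one way to compute gaps: merge overlapping or adjacent ranges, then report the positions left uncovered between consecutive merged ranges. The helpers `merge_ranges` and `gaps_between` are hypothetical names used for illustration; this is not how the library implements `gaps()`, and it assumes inclusive ends and that zero-width ranges cover nothing.

```python
def merge_ranges(starts, widths):
    """Merge overlapping or adjacent ranges into sorted (start, end) pairs (inclusive ends)."""
    # Assumption: zero-width ranges cover no positions, so they are dropped.
    spans = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(span) for span in merged]


def gaps_between(starts, widths):
    """Return the uncovered (start, end) stretches between consecutive merged ranges."""
    merged = merge_ranges(starts, widths)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(merged, merged[1:])
    ]


result = gaps_between([-2, 6, 9, -4, 1, 0, -6, 10], [5, 0, 6, 1, 4, 3, 2, 3])
print(result)  # [(-3, -3), (5, 8)]
```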

Or perform interval set operations:

x = IRanges([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = IRanges([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])

intersection = x.intersect(y)
print(intersection)
 ## output
 IRanges object with 3 ranges and 0 metadata columns
                start              end            width
      <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]               -2                2                5
 [1]                6                8                3
 [2]               14               17                4
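The intersection can be reasoned about as follows: reduce each set of ranges to disjoint spans, then walk both lists in order and keep the stretches covered by both. The sketch below does this in plain Python with hypothetical helpers (`merge_ranges`, `intersect_spans`); it assumes inclusive ends and is only an illustration, not the library's implementation of `intersect()`.

```python
def merge_ranges(starts, widths):
    """Merge overlapping or adjacent ranges into sorted (start, end) pairs (inclusive ends)."""
    spans = sorted((s, s + w - 1) for s, w in zip(starts, widths) if w > 0)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(span) for span in merged]


def intersect_spans(a, b):
    """Intersect two sorted lists of disjoint inclusive (start, end) spans."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever span ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


x_spans = merge_ranges([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y_spans = merge_ranges([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])
common = intersect_spans(x_spans, y_spans)
print(common)  # [(-2, 2), (6, 8), (14, 17)]
```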

Overlap operations

IRanges uses nested containment lists under the hood to perform fast overlap and search-based operations. These methods typically return a hits-like BiocFrame.

subject = IRanges([2, 2, 10], [1, 2, 3])
query = IRanges([1, 4, 9], [5, 4, 2])

overlap = subject.find_overlaps(query)
print(overlap)
 ## output
 BiocFrame with 3 rows and 2 columns
           self_hits       query_hits
      <ndarray[int64]> <ndarray[int64]>
 [0]                1                0
 [1]                0                0
 [2]                2                2
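For intuition about what counts as a hit, the following brute-force sketch compares every subject range against every query range and keeps the pairs that share at least one position. It is deliberately naive (quadratic time, no nested containment list), `find_overlaps_naive` is a hypothetical name, and the order of the returned pairs may differ from what the library reports.

```python
def find_overlaps_naive(subject, query):
    """Return (subject_index, query_index) pairs whose ranges share a position.

    Each range is a (start, width) tuple with an inclusive end of start + width - 1.
    """
    hits = []
    for qi, (q_start, q_width) in enumerate(query):
        q_end = q_start + q_width - 1
        for si, (s_start, s_width) in enumerate(subject):
            s_end = s_start + s_width - 1
            # Assumption: zero-width ranges never overlap anything.
            if q_width > 0 and s_width > 0 and s_start <= q_end and q_start <= s_end:
                hits.append((si, qi))
    return hits


subject = list(zip([2, 2, 10], [1, 2, 3]))
query = list(zip([1, 4, 9], [5, 4, 2]))
pairs = find_overlaps_naive(subject, query)
print(pairs)  # [(0, 0), (1, 0), (2, 2)]
```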

Similarly, you can perform search operations such as follow, precede, or nearest:

query = IRanges([1, 3, 9], [2, 5, 2])
subject = IRanges([3, 5, 12], [1, 2, 1])

nearest = subject.nearest(query, select="all")
print(nearest)
 ## output
 BiocFrame with 4 rows and 2 columns
           query_hits        self_hits
      <ndarray[int64]> <ndarray[int64]>
 [0]                0                0
 [1]                1                0
 [2]                1                1
 [3]                2                2
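As a rough illustration of a nearest search that keeps all ties, the sketch below measures the distance between two ranges as the number of positions strictly between them (0 when they overlap or touch) and, for each query, reports every subject at the minimum distance. The distance definition, the tie handling, and the names `gap_between` and `nearest_all` are assumptions made for this example; the library may resolve these cases differently.

```python
def gap_between(a, b):
    """Positions strictly between two inclusive (start, end) spans; 0 if they overlap or touch."""
    (a_start, a_end), (b_start, b_end) = a, b
    return max(0, b_start - a_end - 1, a_start - b_end - 1)


def nearest_all(subject, query):
    """Return (query_index, subject_index) pairs for every subject at minimum distance.

    Assumes `subject` is non-empty.
    """
    hits = []
    for qi, q in enumerate(query):
        dists = [gap_between(q, s) for s in subject]
        best = min(dists)
        hits.extend((qi, si) for si, d in enumerate(dists) if d == best)
    return hits


# Inclusive (start, end) spans built from start + width - 1.
query = [(s, s + w - 1) for s, w in zip([1, 3, 9], [2, 5, 2])]
subject = [(s, s + w - 1) for s, w in zip([3, 5, 12], [1, 2, 1])]
matches = nearest_all(subject, query)
print(matches)  # [(0, 0), (1, 0), (1, 1), (2, 2)]
```

Here query 1 spans positions 3 to 7 and overlaps subjects 0 and 1, so both are returned for it.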

Further Information

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.