Merge branch 'master' into develop
CiaranOMara committed Jun 13, 2020
2 parents c10623f + 99632a3 commit d885bc9
@@ -6,6 +6,16 @@ and this project adheres to [Semantic Versioning](

## [Unreleased]

## [2.0.3] - 2020-06-13

### Added
- Julia LTS Support
- Benchmarks

### Changed
- Documentation.
- Updated CI for General Repository.

## [2.0.2] - 2020-05-21

### Fixed
2 changes: 1 addition & 1 deletion Project.toml
name = "GenomicFeatures"
uuid = "899a7d2d-5c61-547b-bef9-6698a8d05446"
authors = ["Kenta Sato <[email protected]>", "Ben J. Ward <[email protected]>", "Ciarán O’Mara <[email protected]>"]
version = "2.0.2"
version = "2.0.3"

BioGenerics = "47718e42-2ac5-11e9-14af-e5595289c2ea"
@@ -5,6 +5,8 @@ Pkg.instantiate()

using Documenter, GenomicFeatures

DocMeta.setdocmeta!(GenomicFeatures, :DocTestSetup, :(using GenomicFeatures); recursive=true)

format = Documenter.HTML(
edit_link = :commit
@@ -8,7 +8,7 @@ Intervals in `GenomicFeatures` are consistent with ranges in Julia: *1-based and
When data is read from formats with different representations (e.g. 0-based and/or end-exclusive), they are always converted automatically.
Similarly, when writing data, you should not have to reason about off-by-one errors due to format differences while using functionality provided in `GenomicFeatures`.
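
The coordinate conversion described above can be sketched in plain Julia. This is a minimal illustration, not the package's own code: the helper names `to_one_based` and `to_zero_based` are hypothetical, and BED is used only as a familiar example of a 0-based, end-exclusive format.

```julia
# Hypothetical helpers (not part of GenomicFeatures) illustrating the conversion.

# 0-based, end-exclusive (e.g. BED chromStart/chromEnd) -> 1-based, end-inclusive.
to_one_based(start0::Integer, stop0::Integer) = (start0 + 1, stop0)

# 1-based, end-inclusive -> 0-based, end-exclusive.
to_zero_based(first1::Integer, last1::Integer) = (first1 - 1, last1)

# The first 100 bases of a sequence: [0, 100) in BED, 1:100 in Julia.
first1, last1 = to_one_based(0, 100)
@assert (first1, last1) == (1, 100)
@assert length(first1:last1) == 100               # same number of bases either way
@assert to_zero_based(first1, last1) == (0, 100)  # round trip
```

Only the start coordinate shifts; the end coordinate keeps its numeric value because the exclusive end of a 0-based interval equals the inclusive end of the 1-based one.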

The `Interval` type is defined as
The [`Interval`](@ref Interval) type is defined as
struct Interval{T} <: IntervalTrees.AbstractInterval{Int64}
@@ -19,9 +19,9 @@ struct Interval{T} <: IntervalTrees.AbstractInterval{Int64}

The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the `Interval` object.
The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the [`Interval`](@ref Interval) object.
The `seqname` field holds the sequence name associated with the interval.
The `first` and `last` fields are the leftmost and rightmost positions of the interval, which can be accessed with `leftposition` and `rightposition` functions, respectively.
The `first` and `last` fields are the leftmost and rightmost positions of the interval, which can be accessed with [`leftposition`](@ref leftposition) and [`rightposition`](@ref rightposition) functions, respectively.

The `strand` field can take four kinds of values listed in the next table:

@@ -32,32 +32,31 @@ The `strand` field can take four kinds of values listed in the next table:
| `'-'` | `STRAND_NEG` | negative strand |
| `'.'` | `STRAND_BOTH` | non-strand-specific feature |
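
As a rough illustration of how the four strand symbols map to values, here is a self-contained sketch. It assumes a stand-in enum (`DemoStrand`, with hypothetical names `DEMO_NA`, `DEMO_POS`, `DEMO_NEG`, `DEMO_BOTH` and helper `demo_strand`); the package defines its own `Strand` type and `STRAND_*` constants, which this does not reproduce.

```julia
# Hypothetical stand-in for illustration; the package defines its own `Strand` type.
@enum DemoStrand DEMO_NA DEMO_POS DEMO_NEG DEMO_BOTH

# Symbol-to-value mapping for the four strand symbols.
const DEMO_SYMBOLS = Dict(
    '?' => DEMO_NA,    # strand is unknown or inapplicable
    '+' => DEMO_POS,   # positive strand
    '-' => DEMO_NEG,   # negative strand
    '.' => DEMO_BOTH,  # non-strand-specific feature
)

# Parse a strand character, rejecting anything outside the four symbols.
function demo_strand(c::Char)
    haskey(DEMO_SYMBOLS, c) || error("'$(c)' is not a valid strand")
    return DEMO_SYMBOLS[c]
end

@assert demo_strand('+') == DEMO_POS
@assert demo_strand('.') == DEMO_BOTH
```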

`Interval` is parameterized on metadata type, which lets it efficiently and precisely be specialized to represent intervals from a variety of formats.
[`Interval`](@ref Interval) is parameterized on metadata type, which lets it efficiently and precisely be specialized to represent intervals from a variety of formats.

The default strand and metadata value are `STRAND_BOTH` and `nothing`:
```jldoctest; setup = :(using GenomicFeatures)
julia> Interval("chr1", 10000, 20000)
sequence name: chr1
leftmost position: 10000
rightmost position: 20000
strand: .
metadata: nothing
julia> Interval("chr1", 10000, 20000, '+')
sequence name: chr1
leftmost position: 10000
rightmost position: 20000
strand: +
metadata: nothing

The following example shows all accessor functions for the five fields:
```jldoctest; setup = :(using GenomicFeatures)
julia> i = Interval("chr1", 10000, 20000, '+', "some annotation")
sequence name: chr1
leftmost position: 10000
rightmost position: 20000
@@ -78,18 +77,18 @@ STRAND_POS
julia> metadata(i)
"some annotation"

## Collections of Intervals

Collections of intervals are represented using the `IntervalCollection` type, which is a general purpose indexed container for intervals.
Collections of intervals are represented using the [`IntervalCollection`](@ref IntervalCollection) type, which is a general purpose indexed container for intervals.
It supports fast intersection operations as well as insertion, deletion, and sorted iteration.

Interval collections can be initialized by inserting elements one by one using `push!`.
Empty interval collections can be initialized, and intervals can be added to the collection one by one using `push!`.

using GenomicFeatures # hide
# The type parameter (Nothing here) indicates the interval metadata type.
col = IntervalCollection{Nothing}()
@@ -98,18 +97,32 @@ for i in 1:100:10000

Incrementally building an interval collection like this works, but `IntervalCollection` also has a bulk insertion constructor that is able to build the indexed data structure extremely efficiently from an array of intervals.
Incrementally building an interval collection like this works, but [`IntervalCollection`](@ref IntervalCollection) also has a bulk insertion constructor that is able to build the indexed data structure extremely efficiently from a sorted vector of intervals.
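
When the input vector is not already sorted, the bulk constructor's second positional argument may help. The sketch below relies on the `sort::Bool=false` parameter visible in the constructor signature in this commit; that passing `true` makes the constructor sort the input before bulk insertion is an assumption here, so treat this as an illustration rather than documented behaviour.

```julia
using GenomicFeatures

# Deliberately unsorted input.
intervals = [
    Interval("chr1", 201, 300),
    Interval("chr1", 1, 100),
    Interval("chr1", 101, 200),
]

# Second positional argument is the `sort` flag from the constructor signature;
# assumption: `true` asks the constructor to sort before bulk insertion.
col = IntervalCollection(intervals, true)

@assert length(col) == 3
```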

```jldoctest; setup = :(using GenomicFeatures), output = false
col = IntervalCollection([Interval("chr1", i, i + 99) for i in 1:100:10000])
# output
IntervalCollection{Nothing} with 100 intervals:
chr1:1-100 . nothing
chr1:101-200 . nothing
chr1:201-300 . nothing
chr1:301-400 . nothing
chr1:401-500 . nothing
chr1:501-600 . nothing
chr1:601-700 . nothing
chr1:701-800 . nothing

Building `IntervalCollections` in one shot like this should be preferred when it's convenient or speed is an issue.
Building [`IntervalCollection`](@ref IntervalCollection)s in one shot like this should be preferred when it's convenient or speed is an issue.

## Overlap Query

There are number of `eachoverlap` functions in the `GenomicFeatures` module.
There are a number of [`eachoverlap`](@ref eachoverlap) functions in the `GenomicFeatures` module.
They follow two patterns:
- interval versus collection queries which return an iterator over intervals in the collection that overlap the query, and
- collection versus collection queries which iterate over all pairs of overlapping intervals.
@@ -118,7 +131,7 @@ They follow two patterns:

The order of interval pairs is the same as the following nested loop but `eachoverlap` is often much faster:
The order of interval pairs is the same as the following nested loop but [`eachoverlap`](@ref eachoverlap) is often much faster:
for a in intervals_a, b in intervals_b
if isoverlapping(a, b)
This function would return a new set of disjoint intervals with annotated coverage like:
# Example
julia> intervals = [
Interval("chr1", 1, 8),
Interval("chr1", 4, 20),
Interval("chr1", 14, 27)];
julia> coverage(intervals)
IntervalCollection{UInt32} with 5 intervals:
chr1:1-3 . 1
chr1:4-8 . 2
chr1:9-13 . 1
chr1:14-20 . 2
chr1:21-27 . 1
function coverage(stream, seqname_isless::Function=isless)
cov = IntervalCollection{UInt32}()
@@ -7,7 +7,18 @@
# License is MIT:

# Note, just to be clear: this shadows IntervalTrees.Interval
"A genomic interval specifies interval with some associated metadata"
struct Interval{T} <: IntervalTrees.AbstractInterval{Int64}
The first three fields (`seqname`, `first`, and `last`) are mandatory arguments when constructing the [`Interval`](@ref Interval) object.
# Fields
- `seqname::String`: the sequence name associated with the interval.
- `first::Int64`: the leftmost position.
- `last::Int64`: the rightmost position.
- `strand::Strand`: the [`strand`](@ref Strand).
- `metadata::T`
struct Interval{T} <: IntervalTrees.AbstractInterval{Int64}
@@ -39,6 +39,7 @@ const ICTreeIntersection{T} = IntervalTrees.Intersection{Int64
const ICTreeIntersectionIterator{F,S,T} = IntervalTrees.IntersectionIterator{F,Int64,Interval{S},64,Interval{T},64}
const ICTreeIntervalIntersectionIterator{F,T} = IntervalTrees.IntervalIntersectionIterator{F, Int64,Interval{T},64}

"An IntervalCollection is an efficiently stored and indexed set of annotated genomic intervals."
mutable struct IntervalCollection{T}
# Sequence name mapped to IntervalTree, which in turn maps intervals to a list of metadata.
@@ -51,11 +52,12 @@ mutable struct IntervalCollection{T}

"Empty initialization."
function IntervalCollection{T}() where T
return new{T}(Dict{String,ICTree{T}}(), 0, ICTree{T}[], false)

# Bulk insertion.
"Bulk insertion."
function IntervalCollection{T}(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T
if sort
@@ -80,17 +82,26 @@ mutable struct IntervalCollection{T}

# Shorthand constructor.
IntervalCollection(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T
Shorthand constructor.
function IntervalCollection(intervals::AbstractVector{Interval{T}}, sort::Bool=false) where T
return IntervalCollection{T}(intervals, sort)

# Constructor that offers conversion through collection.
IntervalCollection{T}(data, sort::Bool=false) where T
Constructor that offers conversion through collection.
function IntervalCollection{T}(data, sort::Bool=false) where T
return IntervalCollection(collect(Interval{T}, data), sort)

# Constructor that guesses metadatatype, and offers conversion through collection.
IntervalCollection(data, sort::Bool=false)
Constructor that guesses metadatatype, and offers conversion through collection.
function IntervalCollection(data, sort::Bool=false)
return IntervalCollection(collect(Interval{metadatatype(data)}, data), sort)
# This file is a part of BioJulia.
# License is MIT:

# Outer constructors
* [`Strand(strand::Char)`](@ref)
* [`Strand(strand::UInt8)`](@ref)
[`Strand`](@ref) can take four kinds of values listed in the next table:
| Symbol | Constant | Meaning |
| :----- | :-------------------- | :-------------------------------- |
| `'?'` | [`STRAND_NA`](@ref) | strand is unknown or inapplicable |
| `'+'` | [`STRAND_POS`](@ref) | positive strand |
| `'-'` | [`STRAND_NEG`](@ref) | negative strand |
| `'.'` | [`STRAND_BOTH`](@ref) | non-strand-specific feature |
primitive type Strand 8 end

Base.convert(::Type{Strand}, strand::UInt8) = reinterpret(Strand, strand)

Strand(strand::UInt8) = convert(Strand, strand)
Base.convert(::Type{UInt8}, strand::Strand) = reinterpret(UInt8, strand)

@@ -45,6 +63,10 @@ function Base.convert(::Type{Strand}, strand::Char)

error("'$(strand)' is not a valid strand")

Strand(strand::Char) = convert(Strand, strand)

function Base.convert(::Type{Char}, strand::Strand)

