TimeArray: accept duplicated but sorted time index (#455)
```julia
julia> a = TimeArray([Date(2015, 10, 1), Date(2015, 10, 2), Date(2015, 10, 3)], [1,2,3]);

julia> b = TimeArray([Date(2015, 10, 2), Date(2015, 10, 3)], [4, 5]);

julia> [a; b]
5×1 TimeArray{Int64,1,Date,Array{Int64,1}} 2015-10-01 to 2015-10-03
│            │ A     │
├────────────┼───────┤
│ 2015-10-01 │ 1     │
│ 2015-10-02 │ 2     │
│ 2015-10-02 │ 4     │
│ 2015-10-03 │ 3     │
│ 2015-10-03 │ 5     │
```
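The duplicated dates come out as 2, 4 and 3, 5 because `vcat` concatenates the timestamps and values and then reorders them with `sortperm`. The Julia documentation guarantees that `sortperm` returns a stable permutation: indices of equal elements stay in ascending order, which here means input order. The minimal sketch below reproduces only that reordering step with plain vectors from the `Dates` stdlib, so it runs without TimeSeries.jl. The variable names are illustrative.

```julia
using Dates

# Timestamps and values of `a` followed by those of `b`, as vcat concatenates them.
ts  = [Date(2015, 10, 1), Date(2015, 10, 2), Date(2015, 10, 3),  # from a
       Date(2015, 10, 2), Date(2015, 10, 3)]                     # from b
val = [1, 2, 3, 4, 5]

# sortperm is documented to be stable: indices of equal timestamps
# appear in ascending order, so a's rows precede b's rows on ties.
order = sortperm(ts)

sorted_ts  = ts[order]
sorted_val = val[order]   # [1, 2, 4, 3, 5], matching the table above
```

Because the tie-breaking comes from the permutation itself, the result does not depend on which sorting algorithm Julia picks internally.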
iblislin authored Jun 6, 2020
1 parent e66f657 commit ed0cd20
Showing 9 changed files with 62 additions and 59 deletions.
2 changes: 2 additions & 0 deletions Project.toml
@@ -6,12 +6,14 @@ version = "0.18.0"
[deps]
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
DocStringExtensions = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
RecipesBase = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
DocStringExtensions = "0.8"
RecipesBase = "0.5, 0.7, 0.8, 1.0"
Reexport = "0.1, 0.2"
Tables = "1"
1 change: 1 addition & 0 deletions docs/Makefile
@@ -1,2 +1,3 @@
all:
	julia --project=. -e 'using Pkg; Pkg.instantiate(); Pkg.develop(PackageSpec(path=joinpath(pwd(), "..")))'
	julia --project=. --color=yes make.jl
19 changes: 6 additions & 13 deletions docs/src/combine.md
@@ -96,23 +96,16 @@ collapse(cl, month, last, mean)
## `vcat`

The `vcat` method is used to concatenate time series: if you have two
time series with the same columns, but two distinct periods of time,
this function can merge them into a single object. Notably, it can be
used to merge data that is split into multiple files. Its behaviour is
quite different from `merge`, which does not consider that its arguments
are actually the *same* time series.
time series with the same columns, this function can merge them into a single object.
Notably, it can be used to merge data that is split into multiple files.
Its behaviour is quite different from `merge`,
which does not consider that its arguments are actually the *same* time series.

This concatenation is *vertical* (`vcat`) because it does not create
columns, it extends existing ones (which are represented vertically).

For example:

```@repl
using TimeSeries
a = TimeArray([Date(2015, 10, 01), Date(2015, 11, 01)], [15, 16])
b = TimeArray([Date(2015, 12, 01)], [17])
vcat(a, b)
[a; b] # same as vcat(a,b)
```@docs
vcat(tas::Vararg{TimeArray,N} where N)
```

## `map`
6 changes: 4 additions & 2 deletions src/TimeSeries.jl
@@ -1,12 +1,14 @@
module TimeSeries

# stdlib
using Dates
using DelimitedFiles
using Statistics
using Tables

# third-party
using DocStringExtensions: SIGNATURES
using RecipesBase
using Reexport
using Tables

export TimeArray, AbstractTimeSeries,
when, from, to, findwhen, find, timestamp, values, colnames, meta, head, tail,
39 changes: 31 additions & 8 deletions src/combine.jl
@@ -139,29 +139,52 @@ end

# vcat ######################

function vcat(TA::TimeArray...)
"""
$(SIGNATURES)
Concatenate two ``TimeArray`` into single object.
If there are duplicated timestamps, we will keep order as the function input.
```julia-repl
julia> a = TimeArray([Date(2015, 10, 1), Date(2015, 10, 2), Date(2015, 10, 3)], [1, 2, 3]);
julia> b = TimeArray([Date(2015, 10, 2), Date(2015, 10, 3)], [4, 5]);
julia> [a; b]
5×1 TimeArray{Int64,1,Date,Array{Int64,1}} 2015-10-01 to 2015-10-03
│            │ A     │
├────────────┼───────┤
│ 2015-10-01 │ 1     │
│ 2015-10-02 │ 2     │
│ 2015-10-02 │ 4     │
│ 2015-10-03 │ 3     │
│ 2015-10-03 │ 5     │
```
"""
function vcat(tas::TimeArray...)
    # Check all meta fields are identical.
    prev_meta = meta(TA[1])
    for ta in TA
    prev_meta = meta(tas[1])
    for ta in tas
        if meta(ta) != prev_meta
            throw(ArgumentError("metadata doesn't match"))
        end
    end

    # Check column names are identical.
    prev_colnames = colnames(TA[1])
    for ta in TA
    prev_colnames = colnames(tas[1])
    for ta in tas
        if colnames(ta) != prev_colnames
            throw(ArgumentError("column names don't match"))
        end
    end

    # Concatenate the contents.
    ts = vcat([timestamp(ta) for ta in TA]...)
    val = vcat([values(ta) for ta in TA]...)
    ts = vcat([timestamp(ta) for ta in tas]...)
    val = vcat([values(ta) for ta in tas]...)

    order = sortperm(ts)
    if ndims(TA[1]) == 1 # Check for 1D to ensure values remains a 1D vector.
    if ndims(tas[1]) == 1 # Check for 1D to ensure values remains a 1D vector.
        return TimeArray(ts[order], val[order], prev_colnames, prev_meta)
    else
        return TimeArray(ts[order], val[order, :], prev_colnames, prev_meta)
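The `vcat` method above checks that metadata and column names agree, concatenates timestamps and values, and reorders the rows with `sortperm`, indexing with `val[order, :]` when the values are multi-column. A minimal sketch of the same steps for the 2-D case, assuming a hypothetical `MiniSeries` struct in place of `TimeArray` and a hypothetical `vcat_sketch` function in place of `vcat` (the metadata check is left out for brevity):

```julia
using Dates

# Hypothetical stand-in for a TimeArray: timestamps, a values matrix, column names.
struct MiniSeries
    timestamp::Vector{Date}
    values::Matrix{Int}
    colnames::Vector{Symbol}
end

# Sketch of the vcat steps: check column names, stack rows, stable reorder.
function vcat_sketch(parts::MiniSeries...)
    cols = parts[1].colnames
    for p in parts
        p.colnames == cols || throw(ArgumentError("column names don't match"))
    end
    ts  = vcat([p.timestamp for p in parts]...)  # all timestamps, input order
    val = vcat([p.values for p in parts]...)     # all rows, same order
    order = sortperm(ts)                         # stable: ties keep input order
    return MiniSeries(ts[order], val[order, :], cols)
end

a = MiniSeries([Date(2015, 10, 1), Date(2015, 10, 2)], [1 10; 2 20], [:A, :B])
b = MiniSeries([Date(2015, 9, 30), Date(2015, 10, 2)], [0 0; 3 30], [:A, :B])
c = vcat_sketch(a, b)
# c.values == [0 0; 1 10; 2 20; 3 30]
```

Reordering rows through a single permutation keeps each timestamp attached to its row, which is why the 1-D and 2-D branches above differ only in how they index `val`.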
9 changes: 4 additions & 5 deletions src/timearray.jl
@@ -18,7 +18,7 @@ abstract type AbstractTimeSeries{T,N,D} end
TimeArray(data::NamedTuple, timestamp = :datetime, meta)
TimeArray(table; timestamp::Symbol, timeparser::Callable = identity)
The second constructor will yields a new TimeArray with the new given fields.
The second constructor yields a new `TimeArray` with the new given fields.
Note that the unchanged fields will be shared, there aren't any copy for the
underlying arrays.
@@ -27,7 +27,6 @@ The third constructor builds a `TimeArray` from a `NamedTuple`.
# Arguments
- `timestamp::AbstractVector{<:TimeType}`: a vector of sorted timestamps,
Each element in this vector should be unique.
- `timestamp::Symbol`: the column name of the time index from the source table.
The constructor is used for the Tables.jl package integration.
@@ -74,14 +73,14 @@ struct TimeArray{T,N,D<:TimeType,A<:AbstractArray{T,N}} <: AbstractTimeSeries{T,
nrow != length(timestamp) && throw(DimensionMismatch("values must match length of timestamp"))
ncol != length(colnames) && throw(DimensionMismatch("column names must match width of array"))

_issorted_and_unique(timestamp) && return new(
issorted(timestamp) && return new(
timestamp, values, replace_dupes!(colnames), meta)

timestamp_r = reverse(timestamp)
_issorted_and_unique(timestamp_r) && return new(
issorted(timestamp_r) && return new(
timestamp_r, reverse(values, dims = 1), replace_dupes!(colnames), meta)

throw(ArgumentError("timestamps must be strictly monotonic"))
throw(ArgumentError("timestamps must be monotonic"))
end
end

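The constructor change swaps the strict ordering check for `issorted`, which accepts equal neighbours, while still reversing descending input and rejecting anything else. The sketch below contrasts the two rules on plain `Date` vectors. `strictly_increasing` is a hypothetical re-creation of the removed `_issorted_and_unique` helper, and `check_order` is a hypothetical stand-in for the constructor logic; it reverses a vector rather than calling `reverse(values, dims = 1)` on an array.

```julia
using Dates

# Hypothetical re-creation of the removed helper: every step must strictly increase.
function strictly_increasing(x)
    for i in 1:length(x)-1
        x[i] < x[i + 1] || return false
    end
    return true
end

# Hypothetical sketch of the new rule: accept non-decreasing input as is,
# reverse non-increasing input, and reject anything else.
function check_order(ts::AbstractVector, vals::AbstractVector)
    issorted(ts) && return ts, vals
    ts_r = reverse(ts)
    issorted(ts_r) && return ts_r, reverse(vals)
    throw(ArgumentError("timestamps must be monotonic"))
end

dupes = [Date(2015, 10, 1), Date(2015, 10, 2), Date(2015, 10, 2)]
strictly_increasing(dupes)   # false: rejected by the old check
issorted(dupes)              # true: accepted by the new check

# Descending input with a duplicate is flipped into ascending order.
ts, vals = check_order(reverse(dupes), [3, 2, 1])
```

Note that a vector whose timestamps are all equal passes the first `issorted` test, so it is kept in its given order rather than reversed.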
8 changes: 0 additions & 8 deletions src/utilities.jl
@@ -110,14 +110,6 @@ end
    true
end

# helper method for inner constructor
@inline function _issorted_and_unique(x)
    for i in 1:length(x)-1
        @inbounds !(x[i] < x[i + 1]) && return false
    end
    true
end

# helper method for inner constructor
function replace_dupes!(cnames::Vector{Symbol})
    n = 1
15 changes: 10 additions & 5 deletions test/combine.jl
@@ -217,11 +217,16 @@ end
@test_throws ArgumentError vcat(a, b)
end

@testset "rejects when dates overlap" begin
a = TimeArray([Date(2015, 10, 01), Date(2015, 11, 01)], [15, 16])
b = TimeArray([Date(2015, 11, 01)], [17])

@test_throws ArgumentError vcat(a, b)
@testset "duplicated timestamps order" begin
a = TimeArray([Date(2015, 10, 1), Date(2015, 10, 2), Date(2015, 11, 1)], [15, 16, 17])
b = TimeArray([Date(2015, 10, 2), Date(2015, 11, 1)], [18, 19])

ts = [Date(2015, 10, 1),
Date(2015, 10, 2), Date(2015, 10, 2),
Date(2015, 11, 1), Date(2015, 11, 1)]
ta = vcat(a, b)
@test timestamp(ta) == ts
@test values(ta) == [15, 16, 18, 17, 19]
end

@testset "still works when dates are mixed" begin
22 changes: 4 additions & 18 deletions test/timearray.jl
@@ -48,8 +48,7 @@ end


@testset "type constructors enforce invariants" begin
mangled_stamp = vcat(timestamp(cl)[200:end], timestamp(cl)[1:199])
dupe_stamp = vcat(timestamp(cl)[1:499], timestamp(cl)[499])
mangled_stamp = vcat(timestamp(cl)[200:end], timestamp(cl)[1:199])

@testset "unequal length between values and timestamp fails" begin
@test_throws(
@@ -63,12 +62,6 @@ end
TimeArray(timestamp(cl), values(cl), [:Close, :Open]))
end

@testset "duplicate timestamp values fails" begin
@test_throws(
ArgumentError,
TimeArray(dupe_stamp, values(cl), [:Close]))
end

@testset "mangled order of timestamp values fails" begin
@test_throws(
ArgumentError,
@@ -101,16 +94,9 @@ end
end

@testset "and doesn't when unchecked" begin
let
ta = TimeArray(mangled_stamp, values(cl); unchecked = true)
@test values(ta) === values(cl)
@test timestamp(ta) === mangled_stamp
end

let
ta = TimeArray(dupe_stamp, values(cl); unchecked = true)
@test timestamp(ta) === dupe_stamp
end
ta = TimeArray(mangled_stamp, values(cl); unchecked = true)
@test values(ta) === values(cl)
@test timestamp(ta) === mangled_stamp
end
end
