Fix behavior of SARIMAXModel if simple_differencing=True is set #837
Merged
23 commits
1159ff8
Rename determine_num_steps_to_forecast to determine_num_steps
b719892
Rename determine_num_steps_to_forecast to determine_num_steps
fb8443f
Fix SARIMAXModel to work with simple_differencing
2a1fc09
Update lock, add tests on simple_differencing
ea49210
Get lock back to master:
f411a34
Update changelog
75925b1
Add more precise comment
9d74d7f
Clarify test case for simple_differencing
47f75d4
Add seasonal_prediction_with_confidence as external code
48b61ce
Fix import
bd6cc96
Rename file
37a1ba7
Rename file
e094ae8
Rename file, revert pyproject, add ARMAtoMA
7d0ab41
Make more clear statement about copying
fdaa4a5
Fix bug with version
c05c42f
Change import
7c6cb6b
Update lock
9ea9518
Fix module name
31c5dff
Revert "Update lock"
43b78e4
Remove unnecessary doctest
9163dbd
Remove unnecessary test, add test on simple_differencing
1b664c1
Add more comments about usage of determine_num_steps
dd9978f
Merge branch 'master' into issue-836
Mr-Geekman
@@ -0,0 +1 @@
from etna.libs.pmdarima_utils.arima import seasonal_prediction_with_confidence
@@ -0,0 +1,153 @@
"""
MIT License

Copyright (c) 2017 Taylor G Smith

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""
# Note: Copied from pmdarima package (https://github.com/alkaline-ml/pmdarima/blob/v1.8.5/pmdarima/arima/arima.py)

import numpy as np
import numpy.polynomial.polynomial as np_polynomial
from sklearn.utils.validation import check_array
from pmdarima.utils import diff
from pmdarima.utils import diff_inv
from pmdarima.utils import check_endog

def ARMAtoMA(ar, ma, max_deg):
    r"""
    Convert ARMA coefficients to infinite MA coefficients.

    Compute coefficients of MA model equivalent to given ARMA model.
    MA coefficients are cut off at max_deg.
    The same function as ARMAtoMA() in stats library of R

    Parameters
    ----------
    ar : array-like, shape=(n_orders,)
        The array of AR coefficients.
    ma : array-like, shape=(n_orders,)
        The array of MA coefficients.
    max_deg : int
        Coefficients are computed up to the order of max_deg.

    Returns
    -------
    np.ndarray, shape=(max_deg,)
        Equivalent MA coefficients.

    Notes
    -----
    Here is the derivation. Suppose ARMA model is defined as

    .. math::
        x_t - ar_1*x_{t-1} - ar_2*x_{t-2} - ... - ar_p*x_{t-p}\\
        = e_t + ma_1*e_{t-1} + ma_2*e_{t-2} + ... + ma_q*e_{t-q}

    namely

    .. math::
        (1 - \sum_{i=1}^p[ar_i*B^i]) x_t = (1 + \sum_{i=1}^q[ma_i*B^i]) e_t

    where :math:`B` is the backshift operator.
    Equivalent MA model is

    .. math::
        x_t = (1 - \sum_{i=1}^p[ar_i*B^i])^{-1}\\
        * (1 + \sum_{i=1}^q[ma_i*B^i]) e_t\\
        = (1 + \sum_{i=1}^{\infty}[ema_i*B^i]) e_t

    where :math:`ema_i` is a coefficient of equivalent MA model.
    The :math:`ema_i` satisfies

    .. math::
        (1 - \sum_{i=1}^p[ar_i*B^i]) * (1 + \sum_{i=1}^{\infty}[ema_i*B^i]) \\
        = 1 + \sum_{i=1}^q[ma_i*B^i]

    thus

    .. math::
        \sum_{i=1}^{\infty}[ema_i*B^i] = \sum_{i=1}^p[ar_i*B^i] \\
        + \sum_{i=1}^p[ar_i*B^i] * \sum_{j=1}^{\infty}[ema_j*B^j] \\
        + \sum_{i=1}^q[ma_i*B^i]

    therefore

    .. math::
        ema_i = ar_i (but 0 if i>p) \\
        + \sum_{j=1}^{\min(i-1,p)}[ar_j*ema_{i-j}] + ma_i(but 0 if i>q) \\
        = \sum_{j=1}^{\min(i,p)}[ar_j*ema_{i-j}(but 1 if j=i)] \\
        + ma_i(but 0 if i>q)
    """
    p = len(ar)
    q = len(ma)
    ema = np.empty(max_deg)
    for i in range(0, max_deg):
        temp = ma[i] if i < q else 0.0
        for j in range(0, min(i + 1, p)):
            temp += ar[j] * (ema[i - j - 1] if i - j - 1 >= 0 else 1.0)
        ema[i] = temp
    return ema

# Note: Originally copied from pmdarima package (https://github.com/alkaline-ml/pmdarima/blob/v1.8.5/pmdarima/arima/arima.py)
def seasonal_prediction_with_confidence(arima_res,
                                        start,
                                        end,
                                        X,
                                        alpha,
                                        **kwargs):
    """Compute the prediction for a SARIMAX and get a conf interval

    Unfortunately, SARIMAX does not really provide a nice way to get the
    confidence intervals out of the box, so we have to perform the
    ``get_prediction`` code here and unpack the confidence intervals manually.
    """
    results = arima_res.get_prediction(
        start=start,
        end=end,
        exog=X,
        **kwargs)

    f = results.predicted_mean
    conf_int = results.conf_int(alpha=alpha)
    if arima_res.specification['simple_differencing']:
        # If simple_differencing == True, statsmodels.get_prediction returns
        # mid and confidence intervals on differenced time series.
        # We have to invert differencing the mid and confidence intervals
        y_org = arima_res.model.orig_endog
        d = arima_res.model.orig_k_diff
        D = arima_res.model.orig_k_seasonal_diff
        period = arima_res.model.seasonal_periods
        # Forecast mid: undifferencing non-seasonal part
        if d > 0:
            y_sdiff = y_org if D == 0 else diff(y_org, period, D)
            f_temp = np.append(y_sdiff[-d:], f)
            f_temp = diff_inv(f_temp, 1, d)
            f = f_temp[(2 * d):]
        # Forecast mid: undifferencing seasonal part
        if D > 0 and period > 1:
            f_temp = np.append(y_org[-(D * period):], f)
            f_temp = diff_inv(f_temp, period, D)
            f = f_temp[(2 * D * period):]
        # confidence interval
        ar_poly = arima_res.polynomial_reduced_ar
        poly_diff = np_polynomial.polypow(np.array([1., -1.]), d)
        sdiff = np.zeros(period + 1)
        sdiff[0] = 1.
        sdiff[-1] = 1.
        poly_sdiff = np_polynomial.polypow(sdiff, D)
        ar = -np.polymul(ar_poly, np.polymul(poly_diff, poly_sdiff))[1:]
        ma = arima_res.polynomial_reduced_ma[1:]
        n_predMinus1 = end - start
        ema = ARMAtoMA(ar, ma, n_predMinus1)
        sigma2 = arima_res._params_variance[0]
        var = np.cumsum(np.append(1., ema * ema)) * sigma2
        q = results.dist.ppf(1. - alpha / 2, *results.dist_args)
        conf_int[:, 0] = f - q * np.sqrt(var)
        conf_int[:, 1] = f + q * np.sqrt(var)

    return check_endog(f, dtype=None, copy=False), \
        check_array(conf_int, copy=False, dtype=None)
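
To make the confidence-interval step easier to check by hand, here is a minimal standalone sketch of the same recursion that ARMAtoMA uses, followed by the cumulative-variance formula from the diff. The function name `arma_to_ma`, the coefficients, and the value of `sigma2` are illustrative assumptions, not part of the PR.

```python
import numpy as np


def arma_to_ma(ar, ma, max_deg):
    """Truncated MA(infinity) weights of an ARMA model.

    Illustrative re-statement of the ARMAtoMA recursion:
    ema_i = ma_i + sum_j ar_j * ema_{i-j}, with ema_0 = 1.
    """
    p, q = len(ar), len(ma)
    ema = np.empty(max_deg)
    for i in range(max_deg):
        temp = ma[i] if i < q else 0.0
        for j in range(min(i + 1, p)):
            # ema[i - j - 1] is the weight at lag (i - j); lag 0 has weight 1
            temp += ar[j] * (ema[i - j - 1] if i - j - 1 >= 0 else 1.0)
        ema[i] = temp
    return ema


# AR(1) with coefficient 0.5: the weights decay geometrically.
psi_ar1 = arma_to_ma([0.5], [], 3)

# Random walk, (1 - B) x_t = e_t: after moving the differencing
# polynomial to the AR side, the "AR" coefficients are [1.0].
psi_rw = arma_to_ma([1.0], [], 4)

# Forecast variance at horizons 1..5, as in the diff:
# sigma2 * cumulative sum of squared weights (with weight 1 at lag 0).
sigma2 = 2.0  # assumed innovation variance, for illustration only
forecast_var = np.cumsum(np.append(1.0, psi_rw ** 2)) * sigma2

print(psi_ar1)       # [0.5   0.25  0.125]
print(forecast_var)  # [ 2.  4.  6.  8. 10.]
```

For the random walk the variance grows linearly with the horizon, which is the expected behaviour once the differencing is folded back into the AR polynomial.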
@@ -1,6 +1,9 @@
import numpy as np
import pytest
from statsmodels.tsa.statespace.sarimax import SARIMAX

from etna.datasets import TSDataset
from etna.datasets import generate_ar_df
from etna.models import SARIMAXModel
from etna.pipeline import Pipeline


@@ -134,3 +137,30 @@ def test_sarimax_forecast_1_point(example_tsds):
    assert len(pred.df) == horizon
    pred_quantiles = model.forecast(future_ts, prediction_interval=True, quantiles=[0.025, 0.8])
    assert len(pred_quantiles.df) == horizon


def test_prediction_simple_differencing():
    """Check that SARIMAX gives similar results with different values of ``simple_differencing``.

    We generate dataset from ``generate_ar_df`` with ``ar_coef=[1]`` and it gives us (0, 1, 1) process.
    """
    horizon = 7
    df = generate_ar_df(periods=100, n_segments=3, start_time="2020-01-01", ar_coef=[1])
    ts = TSDataset(df=TSDataset.to_dataset(df), freq="D")

    # prepare prediction from regular model
    model_regular = SARIMAXModel(order=(0, 1, 1))
    model_regular.fit(ts)
    future_ts = ts.make_future(future_steps=horizon)
    regular_prediction = model_regular.forecast(future_ts)
    regular_prediction = regular_prediction.to_pandas(flatten=True)

    # prepare prediction from model with simple differencing
    model_simplified = SARIMAXModel(order=(0, 1, 1), simple_differencing=True)
    model_simplified.fit(ts)
    future_ts = ts.make_future(future_steps=horizon)
    simplified_prediction = model_simplified.forecast(future_ts)
    simplified_prediction = simplified_prediction.to_pandas(flatten=True)

    correlation = np.corrcoef(regular_prediction["target"], simplified_prediction["target"])[0, 1]
    assert correlation >= 0.95

Review comment on the docstring line "... it gives us (0, 1, 1) process.":
(1,0,1), no?
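
For context on why the two models should agree: with simple_differencing=True the model is fitted on the differenced series, so its raw forecasts live on the differenced scale and have to be integrated back. The sketch below shows that inversion for first-order differencing using plain NumPy rather than pmdarima's `diff_inv`; the series and the differenced-scale forecasts are made-up values.

```python
import numpy as np

# Observed series on the original scale (made-up values).
y = np.array([3.0, 5.0, 4.0, 7.0])

# First-order differencing, as applied before fitting when d = 1.
dy = np.diff(y)

# Hypothetical forecasts on the differenced scale.
diff_forecast = np.array([1.0, 0.5, -2.0])

# Undo the differencing: start from the last observed level and accumulate.
level_forecast = y[-1] + np.cumsum(diff_forecast)

print(dy)              # [ 2. -1.  3.]
print(level_forecast)  # [8.  8.5 6.5]
```

The same accumulation applied to `dy`, starting from `y[0]`, recovers `y[1:]`, which is a quick sanity check that the inversion is consistent.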
Maybe we should add an assert there, something like
end_idx - start_idx == len(df), "Check that the total number of steps to forecast is equal to the total ts length",
and so on. It's not obvious how the number of steps has become an index.
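
A rough sketch of what such a guard could look like. The helper name `check_forecast_span` and its arguments are hypothetical, and the half-open convention (`end_idx` exclusive) is an assumption, since the thread does not say whether the end index is inclusive.

```python
def check_forecast_span(start_idx: int, end_idx: int, n_rows: int) -> None:
    """Hypothetical guard: the index span must cover exactly n_rows rows.

    Assumes a half-open span [start_idx, end_idx); adjust by one if the
    end index is inclusive in the real code.
    """
    assert end_idx - start_idx == n_rows, (
        f"Expected to forecast {n_rows} steps, "
        f"but indices {start_idx}..{end_idx} cover {end_idx - start_idx}"
    )


# 100 training points, 7 rows to forecast: indices 100..107 (exclusive end).
check_forecast_span(start_idx=100, end_idx=107, n_rows=7)
```

The error message spells out both numbers, which addresses the point about the steps-to-index conversion being hard to follow.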