- ML-Quant.com - Automated Research Repository
Easily develop state-of-the-art time series models to forecast univariate data series. Simply load your data and select which models you want to test. This is the largest repository of automated structural and machine learning time series models. Please get in contact if you want to contribute a model. This is a fledgling project, so all advice is appreciated.
```bash
pip install atspy
```
Models available (use these names in `model_list`):
- `ARIMA`: Automated ARIMA Modelling
- `Prophet`: Modeling Multiple Seasonality With Linear or Non-linear Growth
- `HWAAS`: Exponential Smoothing With Additive Trend and Additive Seasonality
- `HWAMS`: Exponential Smoothing With Additive Trend and Multiplicative Seasonality
- `NBEATS`: Neural basis expansion analysis (now fixed at 20 epochs)
- `Gluonts`: RNN-based Model (now fixed at 20 epochs)
- `TATS`: Seasonal and Trend, no Box-Cox
- `TBAT`: Trend and Box-Cox
- `TBATS1`: Trend, Seasonal (one), and Box-Cox
- `TBATP1`: TBATS1, but Seasonal Inference is Hardcoded by Periodicity
- `TBATS2`: TBATS1 With Two Seasonal Periods
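To make the exponential smoothing entries above more concrete, here is a minimal NumPy sketch of the additive-trend, additive-seasonality recursions that `HWAAS` is described as using. It is not AtsPy's code: the package appears to wrap statsmodels (its trained models are `HoltWintersResultsWrapper` objects) and estimates the smoothing parameters itself, whereas this sketch uses fixed, illustrative values for `alpha`, `beta` and `gamma` and a simple two-season initialisation chosen as an assumption. The function name `holt_winters_additive` is hypothetical.

```python
import numpy as np


def holt_winters_additive(y, m, h, alpha=0.3, beta=0.1, gamma=0.2):
    """Forecast h steps with additive trend and additive seasonality (period m).

    Smoothing parameters are fixed here for illustration; a real implementation
    would estimate them, e.g. by minimising in-sample squared error.
    """
    y = np.asarray(y, dtype=float)
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initialise level, trend and seasonal indices from the first two seasons
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)  # season[t] holds the seasonal index for time t

    for t in range(m, len(y)):
        prev_level, prev_trend = level, trend
        s_old = season[t - m]  # seasonal index from one period ago
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season.append(gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_old)

    n = len(y)
    # Step k reuses the most recent seasonal index for its phase in the cycle
    return np.array([level + k * trend + season[n - m + (k - 1) % m] for k in range(1, h + 1)])
```

For a series that repeats exactly, such as `10 + np.tile([3, -1, -4, 2], 3)` with `m=4`, the sketch simply continues the pattern (13, 9, 6, 12, ...). The multiplicative-seasonality variant (`HWAMS`) divides by the seasonal index instead of subtracting it.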
- Implements all your favourite automated time series models in a unified manner by simply running `AutomatedModel(df)`.
- Reduces structural model errors by 30%-50% using LightGBM with TSFresh-infused features.
- Automatically identifies the seasonalities in your data using singular spectrum analysis, periodograms, and peak analysis.
- Identifies and makes accessible the best model for your time series using in-sample validation methods.
- Combines the predictions of all these models in simple (average) and complex (GBM) ensembles for improved performance.
- Where appropriate, models have been developed to use GPU resources to speed up the automation process.
- Easily access all the models by using `am.models_dict_in` for in-sample and `am.models_dict_out` for out-of-sample prediction.
- Univariate forecasting only (single column); only monthly and daily data have been tested for suitability.
- More work lies ahead; all suggestions and criticisms are appreciated, so please use the issues tab.
- Here is a Google Colab to run the package in the cloud and here you can run all the models.
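The seasonality bullet above mentions periodograms and peak analysis. As a rough illustration of that idea only (AtsPy's own detection also uses singular spectrum analysis and is not reproduced here), the sketch below removes a linear trend and returns the period of the strongest periodogram peak. The function name `dominant_period` and its `max_period` cap are hypothetical.

```python
import numpy as np


def dominant_period(y, max_period=None):
    """Return the period (in samples) of the strongest periodogram peak."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    # Remove a linear trend so it does not dominate the low frequencies
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    # Periodogram: squared magnitude of the real FFT at frequencies k / n
    power = np.abs(np.fft.rfft(resid)) ** 2
    freqs = np.fft.rfftfreq(len(resid), d=1.0)
    valid = freqs > 0  # ignore the zero-frequency (mean) term
    if max_period is not None:
        valid &= freqs >= 1.0 / max_period  # optionally ignore very long cycles
    best = np.argmax(np.where(valid, power, -np.inf))
    return 1.0 / freqs[best]
```

For monthly data with a yearly cycle you would expect a value near 12; short or noisy series can produce spurious peaks, which is one reason to combine several detection methods.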
```python
from atspy import AutomatedModel
```

The data requires strict preprocessing: no periods can be skipped and there cannot be any empty values.

```python
import pandas as pd

# Load the monthly Australian beer dataset (values in megaliters)
df = pd.read_csv("https://raw.githubusercontent.com/firmai/random-assets-two/master/ts/monthly-beer-australia.csv")
df["Month"] = pd.to_datetime(df["Month"])  # parse dates so they can form a DatetimeIndex
df = df.set_index("Month")
df.head()
```
| Month | Megaliters |
|---|---|
| 1956-01-01 | 93.2 |
| 1956-02-01 | 96.0 |
| 1956-03-01 | 95.2 |
| 1956-04-01 | 77.1 |
| 1956-05-01 | 70.9 |
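Because `AutomatedModel` expects no skipped periods and no empty values, it can help to check the frame before fitting. The helper below is a hypothetical sketch, not part of AtsPy; it assumes a `DatetimeIndex` and a pandas frequency alias such as `"MS"` (month start) that matches the data, as in the beer series above.

```python
from __future__ import annotations

import pandas as pd


def find_preprocessing_problems(df: pd.DataFrame, freq: str = "MS") -> list[str]:
    """Return a list of issues that would violate the strict preprocessing rules."""
    problems = []
    n_missing = int(df.isna().sum().sum())
    if n_missing:
        problems.append(f"{n_missing} empty value(s)")
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
        return problems
    if df.index.has_duplicates:
        problems.append("duplicate timestamps")
    # Build the complete regular grid the data should cover
    expected = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    skipped = expected.difference(df.index)
    if len(skipped):
        problems.append(f"{len(skipped)} skipped period(s), first at {skipped[0].date()}")
    off_grid = df.index.difference(expected)
    if len(off_grid):
        problems.append(f"{len(off_grid)} timestamp(s) not on the {freq!r} grid")
    return problems
```

`find_preprocessing_problems(df)` should return an empty list for a clean monthly series; anything else needs fixing before modelling.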
- `AutomatedModel`: Returns a class instance.
- `forecast_insample`: Returns an in-sample forecasted dataframe and performance.
- `forecast_outsample`: Returns an out-of-sample forecasted dataframe.
- `ensemble`: Returns the results of three different forms of ensembles.
- `models_dict_in`: Returns a dictionary of the fully trained in-sample models.
- `models_dict_out`: Returns a dictionary of the fully trained out-of-sample models.
```python
from atspy import AutomatedModel

model_list = ["HWAMS", "HWAAS", "TBAT"]
am = AutomatedModel(df=df, model_list=model_list, forecast_len=20)

# Other models to try, add as many as you like; note ARIMA is slow:
# ["ARIMA", "Gluonts", "Prophet", "NBEATS", "TATS", "TBATS1", "TBATP1", "TBATS2"]

forecast_in, performance = am.forecast_insample()
forecast_in
```
| Date | Target | HWAMS | HWAAS | TBAT |
|---|---|---|---|---|
| 1985-10-01 | 181.6 | 161.962148 | 162.391653 | 148.410071 |
| 1985-11-01 | 182.0 | 174.688055 | 173.191756 | 147.999237 |
| 1985-12-01 | 190.0 | 189.728744 | 187.649575 | 147.589541 |
| 1986-01-01 | 161.2 | 155.077205 | 154.817215 | 147.180980 |
| 1986-02-01 | 155.5 | 148.054292 | 147.477692 | 146.773549 |
```python
performance
```

| Metric | Target | HWAMS | HWAAS | TBAT |
|---|---|---|---|---|
| rmse | 0.000000 | 17.599400 | 18.993827 | 36.538009 |
| mse | 0.000000 | 309.738878 | 360.765452 | 1335.026136 |
| mean | 155.293277 | 142.399639 | 140.577496 | 126.590412 |
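The `performance` frame appears to score each column against `Target`: `Target` itself has zero error, and each rmse is the square root of the corresponding mse (for example, 17.5994² ≈ 309.74). Assuming that reading, and assuming `mean` is simply the column average over the in-sample window, a comparable table and the best in-sample model can be computed with pandas. `performance_table` and `best_model` are hypothetical helpers, not AtsPy functions.

```python
import numpy as np
import pandas as pd


def performance_table(forecast_in: pd.DataFrame, target: str = "Target") -> pd.DataFrame:
    """Return rmse, mse and mean for every column, laid out like `performance`."""
    errors = forecast_in.sub(forecast_in[target], axis=0)  # forecast minus actual, per column
    mse = (errors ** 2).mean()
    table = pd.DataFrame({"rmse": np.sqrt(mse), "mse": mse, "mean": forecast_in.mean()})
    return table.T  # metrics as rows, models as columns


def best_model(perf: pd.DataFrame, target: str = "Target", metric: str = "rmse") -> str:
    """Name of the model column with the lowest value of `metric`, excluding the target."""
    return perf.drop(columns=target).loc[metric].idxmin()
```

In the `performance` table above, the lowest rmse belongs to HWAMS, so that is the column such a selection would pick.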
```python
forecast_out = am.forecast_outsample()
forecast_out
```

| Date | HWAMS | HWAAS | TBAT |
|---|---|---|---|
| 1995-09-01 | 137.518755 | 137.133938 | 142.906275 |
| 1995-10-01 | 164.136220 | 165.079612 | 142.865575 |
| 1995-11-01 | 178.671684 | 180.009560 | 142.827110 |
| 1995-12-01 | 184.175954 | 185.715043 | 142.790757 |
| 1996-01-01 | 147.166448 | 147.440026 | 142.756399 |
```python
all_ensemble_in, all_ensemble_out, all_performance = am.ensemble(forecast_in, forecast_out)
all_performance
```

| Model | rmse | mse | mean |
|---|---|---|---|
| ensemble_lgb__X__HWAMS | 9.697588 | 94.043213 | 146.719412 |
| ensemble_lgb__X__HWAMS__X__HWAMS_HWAAS__X__ensemble_ts__X__HWAAS | 9.875212 | 97.519817 | 145.250837 |
| ensemble_lgb__X__HWAMS__X__HWAMS_HWAAS | 11.127326 | 123.817378 | 142.994374 |
| ensemble_lgb | 12.748526 | 162.524907 | 156.487208 |
| ensemble_lgb__X__HWAMS__X__HWAMS_HWAAS__X__ensemble_ts__X__HWAAS__X__HWAMS_HWAAS_TBAT__X__TBAT | 14.589155 | 212.843442 | 138.615567 |
| HWAMS | 15.567905 | 242.359663 | 136.951615 |
| HWAMS_HWAAS | 16.651370 | 277.268110 | 135.544299 |
| ensemble_ts | 17.255107 | 297.738716 | 163.134079 |
| HWAAS | 17.804066 | 316.984751 | 134.136983 |
| HWAMS_HWAAS_TBAT | 23.358758 | 545.631579 | 128.785846 |
| TBAT | 39.003864 | 1521.301380 | 115.268940 |
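Judging by the names in `all_performance`, the plain combinations (`HWAMS_HWAAS`, `HWAMS_HWAAS_TBAT`) look like simple averages of the best models taken in rank order, with HWAMS ranked ahead of HWAAS and TBAT. That reading is an inference from the column names, not documented behaviour, and the LightGBM-based `ensemble_lgb` columns are not reproduced here. Under that assumption, a rank-ordered averaging ensemble can be sketched as follows; `rmse` and `rank_average_ensembles` are hypothetical names.

```python
import numpy as np
import pandas as pd


def rmse(pred: pd.Series, actual: pd.Series) -> float:
    """Root mean squared error between two aligned series."""
    return float(np.sqrt(((pred - actual) ** 2).mean()))


def rank_average_ensembles(forecast: pd.DataFrame, target: str = "Target", sep: str = "_") -> pd.DataFrame:
    """Add one averaged column per top-k subset of models, ranked by in-sample rmse."""
    models = [c for c in forecast.columns if c != target]
    ranked = sorted(models, key=lambda c: rmse(forecast[c], forecast[target]))
    out = forecast.copy()
    for k in range(2, len(ranked) + 1):
        members = ranked[:k]  # the k best models
        out[sep.join(members)] = forecast[members].mean(axis=1)
    return out
```

Averaging helps when members make partly offsetting errors, but it is not guaranteed to: in the table above, neither plain average beats HWAMS on its own.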
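```python
# Plot the target against the best-scoring ensemble and the two Holt-Winters models
all_ensemble_in[["Target", "ensemble_lgb__X__HWAMS", "HWAMS", "HWAAS"]].plot()
all_ensemble_out[["ensemble_lgb__X__HWAMS", "HWAMS", "HWAAS"]].plot()
```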
```python
am.models_dict_in
```

```
{'HWAAS': <statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0x7f42f7822d30>,
 'HWAMS': <statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0x7f42f77fff60>,
 'TBAT': <tbats.tbats.Model.Model at 0x7f42d3aab048>}
```

```python
am.models_dict_out
```

```
{'HWAAS': <statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0x7f9c01309278>,
 'HWAMS': <statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0x7f9c01309cf8>,
 'TBAT': <tbats.tbats.Model.Model at 0x7f9c08f18ba8>}
```
Follow this link if you want to run the package in the cloud.
Future development:
- Additional in-sample validation steps to stop deep learning models from over- and underfitting.
- Extra performance metrics like MAPE and MAE.
- Improved methods to select the window length to use in training and calibrating the model.
- Add the ability to accept dirty data and clean it up (e.g., through interpolation).
- Add a function to resample to a coarser (lower) frequency for big datasets.
- Add the ability to algorithmically select a good-enough chunk of a large dataset to balance performance and training time.
- More internal model optimisation using AIC, BIC and AICc.
- Code annotations for other developers to follow and improve on the work being done.
- Force seasonality stability between in-sample and out-of-sample training models.
- Make AtsPy less dependency-heavy; currently it draws on TensorFlow, PyTorch and MXNet.
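Until dirty-data handling and resampling are built in, both can be approximated with pandas before handing a frame to AtsPy. The helpers below are hypothetical and not part of the package; they assume a `DatetimeIndex` and use the month-start (`"MS"`) and quarter-start (`"QS"`) aliases as examples.

```python
import pandas as pd


def clean_series(df: pd.DataFrame, freq: str = "MS") -> pd.DataFrame:
    """Put the data on a regular grid and fill gaps by time-weighted interpolation."""
    df = df[~df.index.duplicated(keep="first")].sort_index()  # drop repeated timestamps
    regular = df.asfreq(freq)  # skipped periods become NaN rows; off-grid timestamps are dropped
    return regular.interpolate(method="time")  # leading NaNs (before any observation) stay NaN


def downsample(df: pd.DataFrame, freq: str = "QS") -> pd.DataFrame:
    """Aggregate to a coarser frequency by averaging, e.g. monthly to quarterly."""
    return df.resample(freq).mean()
```

Interpolation invents values, so it is best kept to occasional short gaps; averaging into a coarser frequency also smooths away any seasonality shorter than the new period.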
If you use AtsPy in your research, please consider citing it. I have also written a small report that can be found on SSRN.
BibTeX entry:

```bibtex
@software{atspy,
  title = {{AtsPy}: Automated Time Series Models in Python.},
  author = {Snow, Derek},
  url = {https://github.com/firmai/atspy/},
  version = {1.15},
  date = {2020-02-17},
}
```

or, as a `@misc` entry:

```bibtex
@misc{atspy,
  author = {Snow, Derek},
  title = {{AtsPy}: Automated Time Series Models in Python (1.15).},
  year = {2020},
  url = {https://github.com/firmai/atspy/},
}
```