Add the `sp` parameter to ARIMA models #3597

Merged

eccabay merged 9 commits into main from 4285_arima_sp

Jul 8, 2022

Contributor

eccabay commented Jun 30, 2022 •


A bit of information on setting this parameter: http://alkaline-ml.com/pmdarima/tips_and_tricks.html#period

I tried to offer the flexibility for the user to set it if they know it, but also add some quick default values for common time series frequencies.

Perf tests are complete.
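A minimal sketch of the idea described above, not the code merged in this PR: an explicit `sp` is used as given, while `"detect"` maps an inferred frequency to a default seasonal period. The helper name `resolve_sp`, the `FREQ_TO_SP` table, the quarterly value of 4, and the use of `pd.infer_freq` in place of Woodwork's inference are all assumptions made for illustration.

```python
import pandas as pd

# Assumed defaults for illustration only. Weekly data is deliberately left out,
# mirroring the fit-time concerns raised later in this thread.
FREQ_TO_SP = {"D": 7, "M": 12, "MS": 12, "ME": 12, "Q": 4, "QS": 4, "QE": 4}


def resolve_sp(dates, sp=1):
    """Return a seasonal period: the user's value, or a default for the inferred frequency."""
    if sp != "detect":
        return int(sp)
    freq = pd.infer_freq(pd.DatetimeIndex(dates))
    if freq is None:
        return 1
    # Drop anchors such as "-JAN" so that "QS-JAN" is looked up as "QS".
    base = freq.split("-")[0]
    # Anything without a default (for example "2D" or weekly data) falls back to 1.
    return FREQ_TO_SP.get(base, 1)


monthly = pd.date_range("2021-01-01", periods=36, freq="MS")
every_other_day = pd.date_range("2021-01-01", periods=500, freq="2D")

print(resolve_sp(monthly, "detect"))
print(resolve_sp(every_other_day, "detect"))
print(resolve_sp(monthly, 5))
```

Falling back to 1 keeps the model non-seasonal whenever no default applies, which matches the final commit in this PR that returns the default `sp` to 1.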

eccabay added 5 commits

June 30, 2022 16:57


          Move component_obj init to fit, add sp param

1c88cfd


          Current tests passing

b585efa


          New tests for sp parameter

ee439b0


          Merge branch 'main' into 4285_arima_sp

0959b8f


          Update release notes

6d031a1

codecov bot commented Jun 30, 2022 •


Codecov Report

Merging #3597 (cfdf6c1) into main (26896e9) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3597     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        335     335             
  Lines      33387   33456     +69     
=======================================
+ Hits       33258   33327     +69     
  Misses       129     129

Impacted Files	Coverage Δ
...omponents/estimators/regressors/arima_regressor.py	`100.0% <100.0%> (ø)`
...alml/tests/component_tests/test_arima_regressor.py	`100.0% <100.0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

eccabay marked this pull request as ready for review

July 1, 2022 12:27

auto-assign bot assigned eccabay

eccabay requested review from jeremyliweishih, ParthivNaresh, christopherbunn, chukarsten, fjlanasa and MichaelFu512

July 1, 2022 12:27

jeremyliweishih approved these changes

Collaborator

jeremyliweishih left a comment

Great work on this @eccabay! Just some additional testing suggestions, but let's wait to review the perf test results before merging!

ParthivNaresh reviewed

evalml/pipelines/components/estimators/regressors/arima_regressor.py Outdated

+                          return 1
+                      freq_mappings = {
+                          "D": 7,
+                          "W": 52,

Contributor

ParthivNaresh Jul 5, 2022

One of the main reasons I never implemented the mapping of sp was because of how ridiculously long ARIMA would take to fit on sp values above 12 (the R implementation seems to be much faster). I like this approach of mapping to at least daily, monthly, and quarterly data. I'm curious to see the fit time outcomes of the perf tests for any weekly datasets we have.

Contributor Author

eccabay Jul 5, 2022

After taking a look, we don't have any weekly datasets in the perf tests 😭. I'm happy to remove this default for now, since I've noticed similar time issues while testing locally.

chukarsten suggested changes

Contributor

chukarsten left a comment

Looking solid, @eccabay. Just left a few comments for exploration, particularly with respect to the performance implications of re-inferring temporal frequencies and the setting/re-setting of the _component_obj in __init__() and fit().

evalml/pipelines/components/estimators/regressors/arima_regressor.py

+                      time_index = self._parameters.get("time_index", None)
+                      sp = self.arima_parameters["sp"]
+                      if sp == "detect":
+                          inferred_freqs = X.ww.infer_temporal_frequencies()

Contributor

chukarsten Jul 5, 2022

So, dumb question here: if X already has its freqs inferred, will this reinfer them? Inference of data types is expensive, I'd assume freq inference is, too.

Contributor Author

eccabay Jul 6, 2022

Unfortunately, from what I can tell, neither woodwork nor pandas saves the frequency information - you can run inference in either library to get the frequency, but it's not saved anywhere between runs.

Contributor

chukarsten Jul 6, 2022

Hmm, that's weird, because I know that a pandas DatetimeIndex saves its frequency in both the freq and freqstr attributes. If you instantiate the DatetimeIndex like dti = pd.DatetimeIndex( ... , freq="D"), it should carry that freq around with it. I guess this is different because Woodwork doesn't necessarily have a datetime index and just has columns, one of which is the time index column.
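A small pandas-only sketch of the distinction being discussed: an index created with an explicit `freq` carries it, while the same values stored as a DataFrame column do not, so the frequency has to be inferred again. It makes no claim about how Woodwork's `infer_temporal_frequencies()` behaves internally, and the variable names are illustrative.

```python
import pandas as pd

# An index built with an explicit freq keeps that freq on the object.
dti = pd.date_range("2021-01-01", periods=10, freq="D")
index_freq = dti.freqstr

# Once the same values live in a DataFrame column, the freq is no longer stored.
X = pd.DataFrame({"dates": dti})
rebuilt = pd.DatetimeIndex(X["dates"])
stored_freq = rebuilt.freq

# The frequency can still be recovered, but only by running inference on the values.
reinferred = rebuilt.inferred_freq

print(index_freq, stored_freq, reinferred)
```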

evalml/tests/component_tests/test_arima_regressor.py

+                  sp_ = clf_month._get_sp(X)
+                  assert sp_ == 12
+                  X = pd.DataFrame({"dates": pd.date_range("2021-01-01", periods=500, freq="2D")})

Contributor

chukarsten Jul 5, 2022

I think, going back to one of my earlier comments, if X already knows the frequency, we might not want to have the freq re-inferred. Perhaps we want to add a test counting the number of times infer_temporal_frequencies() is called in the case of a known frequency provided and limit it to 1?
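One way such a call-counting test could look, sketched with `unittest.mock`. `ToyRegressor` and its `_infer_freq` method are hypothetical stand-ins for the ARIMA component and the Woodwork inference call, not evalml code; only the `_get_sp` name and the `"detect"` value come from this PR.

```python
from unittest.mock import patch

import pandas as pd


class ToyRegressor:
    """Hypothetical stand-in for the ARIMA component, for illustration only."""

    def __init__(self, sp="detect"):
        self.sp = sp

    def _infer_freq(self, X):
        # Stand-in for the (potentially expensive) frequency inference.
        return pd.infer_freq(pd.DatetimeIndex(X["dates"]))

    def _get_sp(self, X):
        if self.sp != "detect":
            return self.sp  # known value: no inference needed
        return {"D": 7, "MS": 12}.get(self._infer_freq(X), 1)


X = pd.DataFrame({"dates": pd.date_range("2021-01-01", periods=30, freq="D")})

# autospec=True makes the mock receive `self`, so the original method can run as the side effect.
with patch.object(
    ToyRegressor, "_infer_freq", autospec=True, side_effect=ToyRegressor._infer_freq
) as spy:
    known_sp = ToyRegressor(sp=12)._get_sp(X)
    calls_with_known_sp = spy.call_count

    detected_sp = ToyRegressor(sp="detect")._get_sp(X)
    calls_after_detect = spy.call_count

print(known_sp, calls_with_known_sp, detected_sp, calls_after_detect)
```

In a real test the same pattern would patch the actual inference call and assert on `call_count` after `fit`.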

eccabay added 2 commits

July 6, 2022 09:42


          PR comments

107b005


          Merge branch 'main' into 4285_arima_sp

86f31b2

eccabay requested a review from chukarsten

July 6, 2022 14:18


          Return default sp to 1

e5422aa

chukarsten approved these changes

Contributor

chukarsten left a comment

Thanks!


          Merge branch 'main' into 4285_arima_sp

cfdf6c1

eccabay enabled auto-merge (squash)

July 8, 2022 15:08

eccabay merged commit a303470 into main

eccabay deleted the 4285_arima_sp branch

July 8, 2022 15:23

chukarsten mentioned this pull request

Release v0.55.0 #3625

Merged

Reviewers

ParthivNaresh left review comments

jeremyliweishih approved these changes

chukarsten approved these changes

christopherbunn: awaiting requested review

fjlanasa: awaiting requested review

MichaelFu512: awaiting requested review
