Change calculation of ARIMA confidence intervals #4248

Nyrio · 2021-09-29T16:38:18Z

The formula that we have been using since I added support for confidence intervals in ARIMA is slightly different from the one used in statsmodels. The difference is particularly pronounced when datasets have missing observations, which led me to raise the tolerance of the interval unit tests when I added test cases in the recent PR #4058.

In this PR, I change our calculation to match statsmodels, and decrease the corresponding test tolerance, as we now have a strict match with statsmodels.

Previous formula:

lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))

New formula:

lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t)
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t)
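
As a rough illustration of the two formulas above, here is a minimal plain-Python sketch. It is not the cuML implementation. It assumes that `fc_t` is the point forecast, `F_t` the forecast error variance, and `v_i` / `F_i` the in-sample innovations and their variances; these readings, the function names, and the sample values are all assumptions made for the example. It also relies on the identity `sqrt(2) * erfinv(level) == NormalDist().inv_cdf((1 + level) / 2)` so that only the standard library is needed.

```python
import math
from statistics import NormalDist, fmean


def z_value(level):
    """Two-sided standard normal quantile, equal to sqrt(2) * erfinv(level)."""
    return NormalDist().inv_cdf(0.5 * (1.0 + level))


def intervals_new(fc, F, level):
    """New formula: the margin depends only on the forecast variance F_t."""
    z = z_value(level)
    return [(f - z * math.sqrt(Ft), f + z * math.sqrt(Ft)) for f, Ft in zip(fc, F)]


def intervals_previous(fc, F, v_hist, F_hist, level):
    """Previous formula: F_t is additionally scaled by mean(v_i**2 / F_i)."""
    z = z_value(level)
    scale = fmean(v * v / Fi for v, Fi in zip(v_hist, F_hist))
    return [
        (f - z * math.sqrt(Ft * scale), f + z * math.sqrt(Ft * scale))
        for f, Ft in zip(fc, F)
    ]


# Illustrative values only (not taken from the PR or its tests).
fc = [10.0, 11.0]      # point forecasts fc_t
F = [4.0, 9.0]         # assumed forecast error variances F_t
v_hist = [1.0, -3.0]   # assumed in-sample innovations v_i
F_hist = [1.0, 4.0]    # assumed in-sample innovation variances F_i

new = intervals_new(fc, F, level=0.95)
old = intervals_previous(fc, F, v_hist, F_hist, level=0.95)
```

In this sketch the two results differ only by the factor `sqrt(mean(v_i**2 / F_i))` applied to the margin, so they coincide whenever that mean equals 1.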

tfeher

Hi @Nyrio thanks for this fix! The implementation looks good.

I see that the updated formula omits the d_sigma2 value when calculating the margin. Could you add a line to the PR description describing this change in the margin definition? Is there a theoretical reason to prefer the new formula, or is it just to align better with statsmodels?

Nyrio · 2021-09-30T11:30:59Z

@tfeher I have added the previous and new formulas in the description.

As to why we need to change: both statsmodels and R use this formula. I can't remember where/how I found the formula I was using, but I'd rather trust the two most popular implementations than my past self.

dantegd · 2021-09-30T15:29:08Z

@Nyrio I think merging branch-21.10 into the PR would be a good idea to get CI to pass; the errors that occurred should be fixed by now.

ajschmidt8 · 2021-10-04T15:44:47Z

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

Nyrio · 2021-10-04T15:57:09Z

@ajschmidt8 I misclicked branch-0.12 instead of branch-21.12 and it triggered the request for review, sorry for that.

codecov-commenter · 2021-10-04T23:33:38Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@57a6ff7).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.12    #4248   +/-   ##
===============================================
  Coverage                ?   86.06%           
===============================================
  Files                   ?      231           
  Lines                   ?    18691           
  Branches                ?        0           
===============================================
  Hits                    ?    16087           
  Misses                  ?     2604           
  Partials                ?        0

| Flag | Coverage Δ |
|---|---|
| dask | `47.01% <0.00%> (?)` |
| non-dask | `78.75% <0.00%> (?)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 57a6ff7...904e81c.

Nyrio · 2021-10-05T11:16:44Z

@rapidsai/cuml-python-codeowners for required approval

dantegd · 2021-10-11T18:25:33Z

@gpucibot merge

The formula that we have been using since I added support for confidence intervals in ARIMA is slightly different than the one used in statsmodels. The difference is in particular quite pronounced when datasets have missing observations, which pushed me to raise tolerance for the intervals unit tests when I added test cases in the recent PR rapidsai#4058.

In this PR, I change our calculation to match statsmodels, and decrease the corresponding test tolerance, as we now have a strict match with statsmodels.

Previous formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
```

New formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t)
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t)
```

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4248

Exclude term from confidence intervals calculation to align with the formula used in statsmodels

7b83898

Nyrio added the 3 - Ready for Review, CUDA / C++, improvement, and non-breaking labels Sep 29, 2021

Nyrio requested review from a team as code owners September 29, 2021 16:38

github-actions bot added the CUDA/C++ and Cython / Python labels Sep 29, 2021

Clang format

576d9c0

tfeher approved these changes Sep 30, 2021

Nyrio added 2 commits September 30, 2021 08:37

Merge branch 'branch-21.10' into bug-arima-intervals

54d863a

Merge branch 'branch-21.12' into bug-arima-intervals

904e81c

github-actions bot added the CMake and conda labels Oct 4, 2021

Nyrio mentioned this pull request Oct 4, 2021

Add support for exogenous variables to ARIMA #4221

Merged

Nyrio changed the base branch from branch-21.10 to branch-0.12 October 4, 2021 15:28

Nyrio requested review from a team as code owners October 4, 2021 15:28

Nyrio changed the base branch from branch-0.12 to branch-21.12 October 4, 2021 15:28

ajschmidt8 removed the request for review from a team October 4, 2021 15:44

dantegd approved these changes Oct 11, 2021

rapids-bot bot merged commit 0c13f44 into rapidsai:branch-21.12 Oct 11, 2021
