Change calculation of ARIMA confidence intervals #4248
…formula used in statsmodels
Hi @Nyrio thanks for this fix! The implementation looks good.
I see that the updated formula omits the `d_sigma2` value while we calculate the margin. Could you add a line to the PR description describing this change in the margin definition? Is there a theoretical reason to prefer the new formula, or is it just to align better with statsmodels?
@tfeher I have added the previous and new formulas in the description. As to why we need to change: both statsmodels and R use this formula. I can't remember where/how I found the formula I was using, but I'd rather trust the two most popular implementations than my past self.
@Nyrio I think merging branch-21.10 into the PR would be a good idea to get CI to pass, the errors that happened should be fixed by now
Removing
@ajschmidt8 I misclicked
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #4248 +/- ##
===============================================
Coverage ? 86.06%
===============================================
Files ? 231
Lines ? 18691
Branches ? 0
===============================================
Hits ? 16087
Misses ? 2604
Partials ? 0
@rapidsai/cuml-python-codeowners for required approval
@gpucibot merge
The formula that we have been using since I added support for confidence intervals in ARIMA is slightly different than the one used in statsmodels. The difference is in particular quite pronounced when datasets have missing observations, which pushed me to raise tolerance for the intervals unit tests when I added test cases in the recent PR rapidsai#4058.

In this PR, I change our calculation to match statsmodels, and decrease the corresponding test tolerance, as we now have a strict match with statsmodels.

Previous formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
```

New formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t)
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t)
```

Authors:
- Louis Sugy (https://github.com/Nyrio)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4248
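For readers who want to compare the two margin definitions numerically, here is a minimal standalone sketch in plain Python. It is not the cuML implementation. The function names (`z_value`, `interval_new`, `interval_previous`) and the sample numbers are hypothetical, and reading `F_t` as the forecast variance and `v_i`, `F_i` as in-sample innovations and their variances is an assumption, since the PR does not define the symbols. The sketch relies on the identity `sqrt(2) * erfinv(level) == NormalDist().inv_cdf((1 + level) / 2)` so that only the standard library is needed.

```python
import math
from statistics import NormalDist


def z_value(level):
    """Two-sided standard normal quantile, equal to sqrt(2) * erfinv(level)."""
    return NormalDist().inv_cdf((1.0 + level) / 2.0)


def interval_new(fc_t, F_t, level):
    """New formula: the margin depends only on F_t (assumed forecast variance)."""
    margin = z_value(level) * math.sqrt(F_t)
    return fc_t - margin, fc_t + margin


def interval_previous(fc_t, F_t, level, v, F):
    """Previous formula: F_t is additionally scaled by mean(v_i**2 / F_i)."""
    # v and F are assumed to be in-sample innovations and their variances.
    scale = sum(v_i ** 2 / F_i for v_i, F_i in zip(v, F)) / len(v)
    margin = z_value(level) * math.sqrt(F_t * scale)
    return fc_t - margin, fc_t + margin


# Hypothetical numbers, for illustration only.
fc_t, F_t, level = 10.0, 4.0, 0.95
print(interval_new(fc_t, F_t, level))
print(interval_previous(fc_t, F_t, level, v=[2.0, -2.0], F=[1.0, 1.0]))
```

The two functions agree whenever `mean(v_i**2 / F_i)` equals 1 and diverge otherwise, which is where the extra factor in the previous formula changes the interval width.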