-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] VECM #3246
[WIP] VECM #3246
Conversation
Just wanted to let you know that I am going to make an effort to help review this. |
Thank you @ChadFulton! I had an exam this week and will now finish the integration of Josef's code (cointegration test) ASAP. IMO the PR will be ready to be merged once this is done. |
Travis is giving a |
@josef-pkt I think that - apart from one open question - the code is ready to be merged. This question is related to the |
AFAIR, I added it when a user asked for more flexible trend options. I only realized later that the options for the cointegration tests need to be limited to the cases for which we have the p-values or critical values for the test. The estimation of VECM can have more options, but the cointegration test can only allow the restricted set of options with p-values. We might be able to extend those a bit, but some combinations don't have a good (IIRC parameter or data independent) limiting distribution. |
From what I can see in It looks like travis is using |
I don't know why travis doesn't find the csv file One local check is to run |
Thanks for your help @josef-pkt! I really appreciate it! |
@josef-pkt, now the tests are passing in Travis. To me it looks as if the AppVeyor build is failing randomly (this time a Python 2 build job failed, whereas after my last commit it was the Python 3 build job). I think that the PR is now ready to be merged. What are the next steps? Shall I do a rebase using a copy of this branch (to avoid messing up this PR)? |
If the rebase goes well, then force pushing on this branch is fine. (There are two names in the previously rebased commits, but it looks fine in the network. It looks like the initial commits where with a different user name, but that wouldn't cause problems.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the first part of vecm.py, up to beginning of VECM class, with looking mainly at the API and docstrings.
Some names are not very informative but the part I went through is mostly internal, so it's not urgent to come up with better names there.
statsmodels/tsa/vecm/vecm.py
Outdated
seasons_centered=False, exog=None, exog_coint=None): | ||
""" | ||
Use the VECM's deterministic terms to construct an array that is suitable | ||
to convey this information to VAR in form of the exog-argument for VAR's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstrings need to have a one line summary at the beginning (will be used in tables or lists of functions and methods in the generated documentation)
statsmodels/tsa/vecm/vecm.py
Outdated
Parameters | ||
---------- | ||
deterministic : str | ||
See VECM's docstring for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should mention briefly what it is before pointing to the full VECM docstring
statsmodels/tsa/vecm/vecm.py
Outdated
deterministic : str | ||
See VECM's docstring for more information. | ||
seasons : int | ||
Number of seasons. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be more explicit, maybe number of periods in a seasonal cycle.
(I don't remember how this is usually explained in seasonal_decompose (using freq) or SARIMAX.
statsmodels/tsa/vecm/vecm.py
Outdated
return np.column_stack(exogs) if exogs else None | ||
|
||
|
||
def _mat_sqrt(_2darray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_2darray
is not a pretty name
but it's just internal
statsmodels/tsa/vecm/vecm.py
Outdated
""" | ||
u_, s_, v_ = svd(_2darray, full_matrices=False) | ||
s_ = np.sqrt(s_) | ||
return chain_dot(u_, np.diag(s_), v_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just u_.dot(s_ * v_)
or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s_
is a 1D-array, so using *
won't be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s_ should broadcasts. I use this shortcut all the time with diagonal matrices. I just need to check each time whether it needs to be row or column s_ or s_[:, None]
statsmodels/tsa/vecm/vecm.py
Outdated
|
||
Parameters | ||
---------- | ||
endog_tot : array-like (nobs_tot x neqs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just endog
statsmodels/tsa/vecm/vecm.py
Outdated
endog_tot : array-like (nobs_tot x neqs) | ||
2-d endogenous response variable. | ||
exog: ndarray (nobs_tot x neqs) or None | ||
Deterministic terms outside the cointegration relation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add "User provided deterministic terms and weakly exogenous variables ..."
add comment about deterministic
, for example, "deterministic trends specified by the deterministic
keyword will be added to it by the model"
That's what it does, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also season dummies (?) will be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exog
is not affected by deterministic
and seasons
. But it is possible to get all the estimators for the deterministic terms through the det_coef
attribute of a VECMResults
object. The parts of det_coef
are also available via the const
, seasonal
, lin_trend
, and exog_coefs
attributes.
statsmodels/tsa/vecm/vecm.py
Outdated
References | ||
---------- | ||
.. [1] Lutkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a Notes
section with more explanation about the options.
e.g. the exog, deterministic and seasons and their relationship could be explained here.
statsmodels/tsa/vecm/vecm.py
Outdated
See :class:`statsmodels.tsa.base.tsa_model.TimeSeriesModel` for more | ||
information. | ||
freq : str, optional | ||
See :class:`statsmodels.tsa.base.tsa_model.TimeSeriesModel` for more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is freq
used for?
It should be the same as specifying season if it applies.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot freq
and missing
which were part of the scaffold I started from. After the last rebase I get warnings related to freq
. Shall I just use freq
instead of seasons
?
statsmodels/tsa/vecm/vecm.py
Outdated
See :class:`statsmodels.tsa.base.tsa_model.TimeSeriesModel` for more | ||
information. | ||
missing : str, optional | ||
See :class:`statsmodels.base.model.Model` for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really correctly handled for time series? Might need a warning or explanation, that dropping observations might not make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I just remove missing
?
@josef-pkt, I have seen your suggestions right after force pushing. I will go through them now. Thanks for your feedback! |
statsmodels/tsa/vector_ar/irf.py
Outdated
@@ -277,7 +287,7 @@ def errband_mc(self, orth=False, svar=False, repl=1000, | |||
signif=signif, seed=seed, | |||
burn=burn, cum=False) | |||
else: | |||
return model.irf_errband_mc(orth=orth, repl=repl, T=periods, | |||
return model.irf_errband_mc(orth=orth, repl=repl, T=periods, # TODO: irf_errband relies on self.intercept --> solve with exog-refactor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long comments (if beyond 80 character line limit) should go into separate line
statsmodels/tsa/vector_ar/output.py
Outdated
null_hyp = 'H_0: %s do not Granger-cause %s' % (variables, equation) | ||
def causality_summary(results, variables, equation, kind, inst_caus=False): | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one line summary missing (doc rendering will have problems)
@yogabonito Do you have any notebooks with for example the test cases against jmulti? We don't commit the output in notebooks (to avoid increasing repo memory with rendered output and plots), so it's better to post draft versions somewhere else, e.g. in a gist or scratch/working repo on github. |
I will make a notebook after I have gone through your suggestions :) |
…ng todos The code was buggy because VAR() was called without specifying trend and exog so always the default values (i.e. a constant deterministic term) were used.
…improvement * documentation improvements: - Added classes from this PR to `tsa`'s documentation - Made sure LaTeX, enumerations, and a table in docstrings are rendered nicely by Sphinx. - Made links in the documentaion (e.g. from a method to its result-class) - small fixes * API-changes: - for lags in levels: `p` --> `k_ar` - for lags in difference: `diff_lags` --> `k_ar_diff` - added an underscore to a few methods to mark them as internal * coverage improvement: Achieved by adding `# pragma: no cover`-comment to lines which are only meant for debugging purposes.
- change signif to alpha instead of 1-alpha (e.g. 0.05 instead of 0.95) - fix bug in the `summary`-method of `CointRankResults` which caused an `IndexError` if `rank==neqs`.
@josef-pkt I have moved |
@yogabonito Thanks, I will have another quick look, but expect to merge it by tomorrow, assuming all is green. (Costa Brava is calling.:) |
All is green 😀 Have a nice holiday, Josef! |
@yogabonito I'm merging this as is, move and rebase look good. Given the lack of reviewer time, there are still some details left to review and change. Followup issue is #3775 |
Finding out whether two tests are testing the same thing probably isn't an indispensable feature so the |
It's great to see VECM merged! 😃 Thank you @josef-pkt and @bashtage for leading me through the GSoC - it was a great experience! |
Thanks for the great contribution. Of course, future contributions always welcome! |
@bashtage for now I will concentrate on my GSoC 2017 but it's definitely something I will consider! |
.. [1] Lutkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer. | ||
""" | ||
|
||
# todo: rewrite m such that a big (TxT) matrix is avoided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the nobs x nobs problem be avoided by separating out the left-multiplication by delta_x.T
?:
not_m = inv(delta_x.dot(delta_x.T)).dot(delta_x) # shape --> (k_ar_diff*neqs, nobs)
pre_r0 = delta_y_1_T.dot(delta_x.T) # shape --> (neqs, k_ar_diff*neqs)
pre_r1 = y_lag1.dot(delta_x.T) # shape --> (neqs, k_ar_diff*neqs)
r0 = delta_y_1_T - pre_r0.dot(not_m)
r1 = y_lag1 - pre_r1.dot(not_m)
Continuing the work from #3070.