[WIP] VECM #3246

Merged
josef-pkt merged 95 commits into statsmodels:master from yogabonito:gsoc-vecm on Jun 28, 2017

Conversation

yogabonito
Contributor

Continuing the work from #3070.

@coveralls

Coverage decreased (-0.1%) to 87.698% when pulling d878c8f on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@coveralls

Coverage decreased (-0.1%) to 87.716% when pulling b00be6d on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@coveralls

Coverage increased (+0.3%) to 88.134% when pulling 0b932de on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@coveralls

Coverage increased (+0.5%) to 88.342% when pulling 06deff7 on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@coveralls

Coverage increased (+0.5%) to 88.339% when pulling 77ce5da on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@coveralls

Coverage increased (+0.4%) to 88.285% when pulling 99b5ff9 on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@coveralls

Coverage increased (+0.4%) to 88.285% when pulling 6c504a0 on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@coveralls

Coverage increased (+0.5%) to 88.338% when pulling f32ea35 on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@ChadFulton
Member

Just wanted to let you know that I am going to make an effort to help review this.

@yogabonito
Contributor Author

Thank you @ChadFulton! I had an exam this week and will now finish the integration of Josef's code (cointegration test) ASAP. IMO the PR will be ready to be merged once this is done.

@yogabonito
Contributor Author

Travis is giving a FileNotFoundError while in AppVeyor all tests pass. @ChadFulton, may I ask you to have a look at it?

@yogabonito
Contributor Author

@josef-pkt I think that - apart from one open question - the code is ready to be merged. This question is related to the coint_trend argument of the coint_johansen function. This argument is not used in your tests and I haven't seen it in the corresponding Matlab code. How shall this argument be treated?

@josef-pkt
Member

AFAIR, I added it when a user asked for more flexible trend options. I only realized later that the options for the cointegration tests need to be limited to the cases for which we have the p-values or critical values for the test. The estimation of VECM can have more options, but the cointegration test can only allow the restricted set of options with p-values. We might be able to extend those a bit, but some combinations don't have a good (IIRC parameter or data independent) limiting distribution.

@josef-pkt
Member

From what I can see in setup.py, txt and csv files are only included in the distribution if the directory name ends with results. I don't know why appveyor would pick them up anyway.

It looks like travis is using python setup.py install and appveyor uses python setup.py develop. So only travis is testing that the sdist includes all required files.

@josef-pkt
Member

I don't know why travis doesn't find the csv file.

One local check is to run python setup.py sdist and check in the created archive file if the csv/data files are included.
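
The manual check described above could also be scripted. The following is only a sketch: the helper name `data_files_in_sdist` and the default suffixes are made up for illustration and are not part of this PR or of setup.py. It lists the data files found in an archive produced by python setup.py sdist.

```python
import tarfile


def data_files_in_sdist(archive_path, suffixes=(".csv", ".txt")):
    """Return the sorted names of archive members ending with one of `suffixes`.

    Hypothetical helper for checking an sdist by hand; not part of statsmodels.
    """
    # tarfile.open with the default mode detects gzip compression itself.
    with tarfile.open(archive_path) as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(tuple(suffixes))
        )
```

Calling it on the tar.gz file in the dist directory and looking for the expected csv path in the returned list gives the same information as opening the archive by hand.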

@yogabonito
Contributor Author

Thanks for your help @josef-pkt! I really appreciate it!
The tar.gz-archive produced by python3 setup.py sdist contains the csv-file on my machine.

@coveralls

Coverage increased (+2.7%) to 90.561% when pulling 0c5c878 on yogabonito:gsoc-vecm into fc3584f on statsmodels:master.

@yogabonito
Contributor Author

@josef-pkt, now the tests are passing in Travis. To me it looks as if the AppVeyor build is failing randomly (this time a Python 2 build job failed, whereas after my last commit it was the Python 3 build job).

I think that the PR is now ready to be merged. What are the next steps? Shall I do a rebase using a copy of this branch (to avoid messing up this PR)?

@josef-pkt
Member

If the rebase goes well, then force pushing on this branch is fine.
However, given the large number of commits, I would create a new branch for the rebase in case the rebase or the merge conflicts are messy. If there are no conflicts, then force pushing here should be safe, otherwise open a new PR.

(There are two names in the previously rebased commits, but it looks fine in the network. It looks like the initial commits were made with a different user name, but that wouldn't cause problems.)

@josef-pkt left a comment
Member

I went through the first part of vecm.py, up to the beginning of the VECM class, looking mainly at the API and docstrings.

Some names are not very informative but the part I went through is mostly internal, so it's not urgent to come up with better names there.

seasons_centered=False, exog=None, exog_coint=None):
"""
Use the VECM's deterministic terms to construct an array that is suitable
to convey this information to VAR in form of the exog-argument for VAR's
Member

docstrings need to have a one line summary at the beginning (will be used in tables or lists of functions and methods in the generated documentation)
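
As an illustration of the point above, here is a minimal sketch of a numpydoc-style docstring that starts with a one-line summary. Both `scale_rows` and `summary_line` are hypothetical names invented for this example; they are not taken from vecm.py.

```python
import inspect


def scale_rows(rows, factor):
    """Multiply every number in a list of rows by a constant factor.

    The line above is the one-line summary; longer explanations belong in
    this extended description or in a Notes section.

    Parameters
    ----------
    rows : list of list of float
        The values to scale.
    factor : float
        The multiplier applied to each value.

    Returns
    -------
    list of list of float
        A new nested list with the scaled values.
    """
    return [[value * factor for value in row] for row in rows]


def summary_line(obj):
    """Return the first docstring line, the part a summary table would show."""
    doc = inspect.getdoc(obj) or ""
    return doc.split("\n", 1)[0]
```

If the docstring started with a sentence wrapped over several lines, `summary_line` would return only a fragment, which is roughly the problem a generated function table runs into.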

Parameters
----------
deterministic : str
See VECM's docstring for more information.
Member

This should mention briefly what it is before pointing to the full VECM docstring

deterministic : str
See VECM's docstring for more information.
seasons : int
Number of seasons.
Member

could be more explicit, maybe number of periods in a seasonal cycle.
(I don't remember how this is usually explained in seasonal_decompose (using freq) or SARIMAX.)
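
To make "number of periods in a seasonal cycle" concrete, here is a sketch of one common way to build seasonal dummy columns. The function name `seasonal_dummies_sketch`, the choice of the last season as the omitted baseline, and the centering rule are assumptions made for illustration; they are not taken from this PR.

```python
import numpy as np


def seasonal_dummies_sketch(seasons, nobs, centered=False):
    """Build a (nobs x (seasons - 1)) array of seasonal dummy variables.

    Illustrative only: observation t is assumed to fall in season t % seasons.
    """
    if seasons < 2:
        return np.empty((nobs, 0))
    dummies = np.zeros((nobs, seasons - 1))
    for col in range(seasons - 1):
        # Every `seasons`-th row starting at `col` belongs to season `col`.
        # The last season gets no column, so it acts as the baseline.
        dummies[col::seasons, col] = 1.0
    if centered:
        # Subtracting 1/seasons makes each column sum to zero over full cycles.
        dummies -= 1.0 / seasons
    return dummies
```

With quarterly data, for example, seasons would be 4 and the array has three columns.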

return np.column_stack(exogs) if exogs else None


def _mat_sqrt(_2darray):
Member

_2darray is not a pretty name
but it's just internal

"""
u_, s_, v_ = svd(_2darray, full_matrices=False)
s_ = np.sqrt(s_)
return chain_dot(u_, np.diag(s_), v_)
Member

this is just u_.dot(s_ * v_) or something like that

Contributor Author

s_ is a 1D-array, so using * won't be possible.

Member

s_ should broadcast. I use this shortcut all the time with diagonal matrices. I just need to check each time whether it needs to be a row or a column, s_ or s_[:, None].
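
A quick way to check the broadcasting claim is to compare both forms numerically. This sketch uses np.linalg.svd and the @ operator in place of the svd and chain_dot names from the diff, so it mirrors the logic of _mat_sqrt rather than reproducing it; the two function names are made up for the comparison.

```python
import numpy as np


def mat_sqrt_diag(a):
    """SVD-based square root using an explicit diagonal matrix."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u @ np.diag(np.sqrt(s)) @ vt


def mat_sqrt_broadcast(a):
    """Same computation, letting the 1-D array of singular values broadcast."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # u * sqrt(s) scales column j of u by sqrt(s[j]), which is what
    # u @ diag(sqrt(s)) does. The alternative u @ (np.sqrt(s)[:, None] * vt)
    # scales row j of vt instead and gives the same product.
    return (u * np.sqrt(s)) @ vt
```

So a 1-D s_ works with * as long as it lines up with the axis that the diagonal matrix would scale.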


Parameters
----------
endog_tot : array-like (nobs_tot x neqs)
Member

just endog

endog_tot : array-like (nobs_tot x neqs)
2-d endogenous response variable.
exog: ndarray (nobs_tot x neqs) or None
Deterministic terms outside the cointegration relation.
Member

add "User provided deterministic terms and weakly exogenous variables ..."
add comment about deterministic, for example, "deterministic trends specified by the deterministic keyword will be added to it by the model"
That's what it does, isn't it?

Member

also season dummies (?) will be added.

Contributor Author

exog is not affected by deterministic and seasons. But it is possible to get all the estimators for the deterministic terms through the det_coef attribute of a VECMResults object. The parts of det_coef are also available via the const, seasonal, lin_trend, and exog_coefs attributes.

References
----------
.. [1] Lutkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
"""
Member

This needs a Notes section with more explanation about the options.
e.g. the exog, deterministic and seasons and their relationship could be explained here.

See :class:`statsmodels.tsa.base.tsa_model.TimeSeriesModel` for more
information.
freq : str, optional
See :class:`statsmodels.tsa.base.tsa_model.TimeSeriesModel` for more
Member

what is freq used for?
It should be the same as specifying season if it applies.
?

Contributor Author

I forgot freq and missing which were part of the scaffold I started from. After the last rebase I get warnings related to freq. Shall I just use freq instead of seasons?

See :class:`statsmodels.tsa.base.tsa_model.TimeSeriesModel` for more
information.
missing : str, optional
See :class:`statsmodels.base.model.Model` for more information.
Member

Is this really correctly handled for time series? Might need a warning or explanation, that dropping observations might not make sense.

Contributor Author

Shall I just remove missing?

@yogabonito
Contributor Author

@josef-pkt, I have seen your suggestions right after force pushing. I will go through them now. Thanks for your feedback!
The two usernames stem from changing the git config during the development.

@@ -277,7 +287,7 @@ def errband_mc(self, orth=False, svar=False, repl=1000,
signif=signif, seed=seed,
burn=burn, cum=False)
else:
- return model.irf_errband_mc(orth=orth, repl=repl, T=periods,
+ return model.irf_errband_mc(orth=orth, repl=repl, T=periods, # TODO: irf_errband relies on self.intercept --> solve with exog-refactor
Member

long comments (if beyond the 80 character line limit) should go on a separate line

null_hyp = 'H_0: %s do not Granger-cause %s' % (variables, equation)
def causality_summary(results, variables, equation, kind, inst_caus=False):
"""

Member

one line summary missing (doc rendering will have problems)

@josef-pkt
Member

@yogabonito Do you have any notebooks with, for example, the test cases against jmulti?
It would be good to add some to the documentation to illustrate the usage and features of VECM.

We don't commit the output in notebooks (to avoid increasing repo memory with rendered output and plots), so it's better to post draft versions somewhere else, e.g. in a gist or scratch/working repo on github.

@yogabonito
Contributor Author

I will make a notebook after I have gone through your suggestions :)

…ng todos

The code was buggy because VAR() was called without specifying trend and exog, so the default values (i.e. a constant deterministic term) were always used.
…improvement

 * documentation improvements:
   - Added classes from this PR to `tsa`'s documentation
   - Made sure LaTeX, enumerations, and a table in docstrings are rendered nicely
     by Sphinx.
   - Made links in the documentation (e.g. from a method to its result-class)
   - small fixes

 * API-changes:
   - for lags in levels: `p` --> `k_ar`
   - for lags in difference: `diff_lags` --> `k_ar_diff`
   - added an underscore to a few methods to mark them as internal

* coverage improvement:
Achieved by adding `# pragma: no cover`-comment to lines which are only meant
for debugging purposes.
- change signif to alpha instead of 1-alpha (e.g. 0.05 instead of 0.95)
- fix bug in the `summary`-method of `CointRankResults` which caused an `IndexError` if `rank==neqs`.
@coveralls
coveralls commented Jun 27, 2017

Coverage increased (+0.3%) to 90.811% when pulling a0be32c on yogabonito:gsoc-vecm into 1c7f487 on statsmodels:master.

@yogabonito
Contributor Author

@josef-pkt I have moved vecm to vector_ar. I have also rebased again today since there were a few new commits in upstream/master. Now this PR should be ready to get merged.

@josef-pkt
Member

@yogabonito Thanks, I will have another quick look, but expect to merge it by tomorrow, assuming all is green. (Costa Brava is calling.:)

@coveralls

Coverage increased (+0.3%) to 90.804% when pulling bf2d2b0 on yogabonito:gsoc-vecm into 1c7f487 on statsmodels:master.

@coveralls
coveralls commented Jun 27, 2017

Coverage increased (+0.02%) to 90.56% when pulling bf2d2b0 on yogabonito:gsoc-vecm into 1c7f487 on statsmodels:master.

@yogabonito
Contributor Author

All is green 😀 Have a nice holiday, Josef!

@josef-pkt
Member

@yogabonito I'm merging this as is, move and rebase look good.
This is a BIG contribution, and thanks for the GSOC.

Given the lack of reviewer time, there are still some details left to review and change. The follow-up issue is #3775.
One immediate question: Why does HypothesisTestResults and subclasses implement __eq__? What's the intention and purpose for this?

@josef-pkt josef-pkt merged commit 09bd0c5 into statsmodels:master Jun 28, 2017
@yogabonito yogabonito deleted the gsoc-vecm branch June 28, 2017 13:53
@yogabonito
Contributor Author

> Why does HypothesisTestResults and subclasses implement __eq__?

Finding out whether two tests are testing the same thing probably isn't an indispensable feature so the __eq__ method can safely be removed. I just felt like implementing it ;)
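
For readers wondering what such an `__eq__` might look like, here is a hedged sketch. `ToyTestResults` and its attributes are hypothetical and do not reproduce HypothesisTestResults; the rule that two results are equal when they test the same hypothesis at the same level is one possible reading of "testing the same thing".

```python
class ToyTestResults:
    """Hypothetical stand-in for a hypothesis-test result container."""

    def __init__(self, null_hypothesis, signif, test_statistic):
        self.null_hypothesis = null_hypothesis
        self.signif = signif
        self.test_statistic = test_statistic

    def __eq__(self, other):
        # Assumed rule: compare what is being tested, not the computed value.
        if not isinstance(other, ToyTestResults):
            return NotImplemented
        return (self.null_hypothesis, self.signif) == (
            other.null_hypothesis,
            other.signif,
        )
```

One side effect worth knowing when deciding whether to keep such a method: in Python 3, a class that defines `__eq__` without also defining `__hash__` becomes unhashable, so its instances can no longer be used in sets or as dict keys.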

@yogabonito
Contributor Author

It's great to see VECM merged! 😃 Thank you @josef-pkt and @bashtage for leading me through the GSoC - it was a great experience!

@bashtage
Member

Thanks for the great contribution. Of course, future contributions always welcome!

@yogabonito
Contributor Author

> future contributions always welcome!

@bashtage for now I will concentrate on my GSoC 2017 but it's definitely something I will consider!

.. [1] Lutkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. Springer.
"""

# todo: rewrite m such that a big (TxT) matrix is avoided
Contributor

Can the nobs x nobs problem be avoided by separating out the left-multiplication by delta_x.T? For example:

not_m = inv(delta_x.dot(delta_x.T)).dot(delta_x) # shape --> (k_ar_diff*neqs, nobs)
pre_r0 = delta_y_1_T.dot(delta_x.T)              # shape --> (neqs, k_ar_diff*neqs)
pre_r1 = y_lag1.dot(delta_x.T)                   # shape --> (neqs, k_ar_diff*neqs)
r0 = delta_y_1_T - pre_r0.dot(not_m)
r1 = y_lag1 - pre_r1.dot(not_m)
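
The suggestion above can be checked on random data. This sketch assumes that the matrix m from the todo is the usual residual-maker I - delta_x.T (delta_x delta_x.T)^-1 delta_x, which is not shown in this thread. The array shapes follow the comments in the snippet, and np.linalg.solve replaces the explicit inv; the function names are made up for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
neqs, k_ar_diff, nobs = 3, 2, 50
delta_x = rng.standard_normal((k_ar_diff * neqs, nobs))
delta_y_1_T = rng.standard_normal((neqs, nobs))
y_lag1 = rng.standard_normal((neqs, nobs))


def residuals_with_big_matrix(y, x):
    """Residuals of y on x via the explicit (nobs x nobs) matrix m."""
    m = np.eye(x.shape[1]) - x.T @ np.linalg.solve(x @ x.T, x)
    return y @ m


def residuals_without_big_matrix(y, x):
    """Same residuals without ever forming an (nobs x nobs) array."""
    # coefs = y x' (x x')^-1, using the symmetry of x x'.
    coefs = np.linalg.solve(x @ x.T, x @ y.T).T
    return y - coefs @ x


r0 = residuals_without_big_matrix(delta_y_1_T, delta_x)
r1 = residuals_without_big_matrix(y_lag1, delta_x)
```

Under that assumption both routes agree up to floating-point error, and the second one only allocates arrays whose sizes grow linearly with nobs.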

@josef-pkt josef-pkt mentioned this pull request Sep 29, 2017
@josef-pkt josef-pkt added this to the 0.9 milestone Apr 15, 2018
@josef-pkt josef-pkt mentioned this pull request Apr 15, 2018