Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port timeseries distributions to V4 #4642

Closed
twiecki opened this issue Apr 14, 2021 · 24 comments · Fixed by #6227
Closed

Port timeseries distributions to V4 #4642

twiecki opened this issue Apr 14, 2021 · 24 comments · Fixed by #6227
Assignees
Labels
Milestone

Comments

@twiecki
Copy link
Member

twiecki commented Apr 14, 2021

import pymc3 as pm
import numpy as np

returns = pd.read_csv(pm.get_data("SP500.csv"), index_col="Date")
returns["change"] = np.log(returns["Close"]).diff()
returns = returns.dropna().astype('float32')
returns.head()

with pm.Model() as model_pymc3:
    step_size = pm.Exponential("step_size", 10.)
    volatility = pm.GaussianRandomWalk("volatility", sigma=step_size, 
                                       size=returns.shape[0], 
                                       init=pm.Normal.dist(0, step_size))
    nu = pm.Exponential("nu", 0.1)
    obs = pm.StudentT(
        "obs", nu=nu, sigma=np.exp(volatility), observed=returns["change"]
    )

yields:

TypeError                                 Traceback (most recent call last)
<ipython-input-11-3f1576c8cf84> in <module>
      1 with pm.Model(check_bounds=False) as model_pymc3:
      2     step_size = pm.Exponential("step_size", 10.)
----> 3     volatility = pm.GaussianRandomWalk("volatility", sigma=step_size, 
      4                                        size=returns.shape[0],
      5                                        init=pm.Normal.dist(0, step_size))

~/projects/pymc/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
    161         transform = kwargs.pop("transform", UNSET)
    162 
--> 163         rv_out = cls.dist(*args, rng=rng, **kwargs)
    164 
    165         return model.register_rv(rv_out, name, data, total_size, dims=dims, transform=transform)

TypeError: dist() missing 1 required positional argument: 'dist_params'
@twiecki twiecki changed the title TypeError dist_params when using Normal.dist() TypeError dist_params when using pm.GaussianRandomWalk Apr 14, 2021
@twiecki
Copy link
Member Author

twiecki commented Apr 14, 2021

Seems related to pm.GaussianRandomWalk because if I comment out the init it leads to the same error.

@ricardoV94
Copy link
Member

Is this related to check_bounds and/or V4?

@twiecki
Copy link
Member Author

twiecki commented Apr 14, 2021

This is on V4, worked fine on V3.

@ricardoV94
Copy link
Member

I could be wrong but I don't think the timeseries were refactored yet

@twiecki
Copy link
Member Author

twiecki commented Apr 14, 2021

Unrelated to check_bounds. I updated the code to make it runnable.

@twiecki
Copy link
Member Author

twiecki commented Apr 14, 2021

I see.

@twiecki twiecki changed the title TypeError dist_params when using pm.GaussianRandomWalk Port timeseries distributions to V4 Apr 14, 2021
@michaelosthege michaelosthege added this to the vNext (4.0.0) milestone Jun 5, 2021
@ricardoV94 ricardoV94 removed the bug label Jun 28, 2021
@fonnesbeck fonnesbeck self-assigned this Dec 14, 2021
@twiecki
Copy link
Member Author

twiecki commented Dec 28, 2021

This is now possible, and much simpler, using the aesara.scan operator. For example, a GRW looks like this:

with pm4.Model():
    random_walk, _ = aesara.scan(
        fn=lambda Y_tm1: Y_tm1 + pm4.Normal.dist(sigma=.1),
        outputs_info=[{"initial": at.as_tensor([0.0])}],
        n_steps=100,
        name="Y"
    )

@fonnesbeck is this still something you want to take on? Definitely we could have a coordinated effort to rewrite the existing ones one-by-one.

@twiecki
Copy link
Member Author

twiecki commented Dec 28, 2021

Actually there is still some work required to make the above work with PyMC.

@canyon289
Copy link
Member

canyon289 commented Dec 28, 2021

@mjhajharia and i would like to be involved in time series porting efforts. Admittedly I've been slow on this but very much would like to learn (and not get in the way too much) if someone else takes the lead on this.

Also big thanks to @ricardoV94 whos already spent many hours talking this through with us

@mjhajharia
Copy link
Member

+1 at what @canyon289 said, we talked to ricardo and planned on first porting random walk which can probably be done with a cumsum and then AR, followed by ARMA a lot of which is already there in the ARMA branch. I've set some time aside for this and will get started next week (as holidays get over soon), i'll keep updating things here in this project

@mjhajharia
Copy link
Member

@ricardoV94 already made a minimal example to explain aeppl and scan which might be useful for others seeing this issue

@sokol11
Copy link

sokol11 commented Dec 30, 2021

Just encountered this with EulerMaruyama too. I guess most (if not all) ts distributions are affected.

@NowanIlfideme
Copy link

Sorry if this question should've been researched more thoroughly from my side, but... is there anything in PyMC v4 that will make life simpler for folks to want to do out-of-sample predictions on time series?

I've always found the shape changes to be problematic, even with shared variables (pm.Data), because the Timeseries distribution traces essentially save the input series. 🙁

@ricardoV94 ricardoV94 removed this from the v4.0.0b3 milestone Feb 3, 2022
@ricardoV94 ricardoV94 added this to the v4.0.0 milestone Feb 3, 2022
@ricardoV94 ricardoV94 modified the milestones: v4.0.0, v4.1.0 Apr 1, 2022
@AlexAndorra
Copy link
Contributor

Do we have a list of time series distributions already ported to v4 (I'm updating the release notes)

@twiecki
Copy link
Member Author

twiecki commented Jun 1, 2022

GaussianRandomWalk and AR I believe. Missing are GARCH11 and some other weird one I forgot the name of.

@flo-schu
Copy link

I've been working a lot recently with pymc3, just transitioned to pymc 4.0.1 and am very happy with the new version of GaussianRandomWalk. It can handle mutliple dimensions better then previously. Thanks for that :)

I figured out by trial and error and very helpful error messages that the time dimension always has to be specified last. This wasn't clear to me from the documentation, so perhaps I didn't know where to look? But in case it is not mentioned, I think it would be a great hint for inexperienced users like myself.

Are there any plans to transfer MVRandomWalk soon? I would be interested in contributing to that, but don't know where to start.

@canyon289
Copy link
Member

canyon289 commented Jun 23, 2022

hey @flo-schu, my apologies but for this ticket is claimed by a GSOC proect. I should have updated it to reflect that. We would love for you to contribute elsewhere in the codebase though, would it help If I suggested something?

Assigning to myself since @nicospinu handle is not showing up for some reason
cc @junpenglao

@canyon289 canyon289 assigned canyon289 and unassigned fonnesbeck Jun 23, 2022
@flo-schu
Copy link

hey @flo-schu, my apologies but for this ticket is claimed by a GSOC proect. I should have updated it to reflect that. We would love for you to contribute elsewhere in the codebase though, would it help If I suggested something?

No problem. I'm happy to hear that the issue is being worked on. I'd be happy if you could point out some beginner-friendly opportunities for contribution

@sokol11
Copy link

sokol11 commented Jun 23, 2022

I've been working a lot recently with pymc3, just transitioned to pymc 4.0.1 and am very happy with the new version of GaussianRandomWalk. It can handle mutliple dimensions better then previously. Thanks for that :)

I figured out by trial and error and very helpful error messages that the time dimension always has to be specified last. This wasn't clear to me from the documentation, so perhaps I didn't know where to look? But in case it is not mentioned, I think it would be a great hint for inexperienced users like myself.

Are there any plans to transfer MVRandomWalk soon? I would be interested in contributing to that, but don't know where to start.

FWIW, I encountered the same issue in numpyro, which also gives unexpected results with numpyro.plate if the time dimension is not rightmost. I imagine it just trickles up to pymc4. It is also not obvious from numpyro documentation, I believe. A workaround I use in numpyro is to reshape before and after.

@michaelosthege michaelosthege modified the milestones: v4.1.0, v4.2.0 Jul 2, 2022
@ricardoV94
Copy link
Member

With #6072 we should be able to quickly refactor the multivariate random walks. After that I would suggest closing this issue and opening one mentioning the specific distributions that are still missing.

@maedoc
Copy link

maedoc commented Oct 20, 2022

I wrote the EulerMaruyama implementation and interested to update it (and include some improvements for non-centered form & a 2nd order scheme as well). It's not clear to me why it needs porting though, since it's just some tensor expressions (at least for the centered EM method)?

@junpenglao
Copy link
Member

The aim of porting is mostly so that the distribution is compatible under v4 (currently it is broken as we change the initialization signature and dispatching). Functionally, porting to v4 adds better support for forward sampling (otherwise in the previous implementation only pm.sample works but not prior/posterior predictive sampling

@maedoc
Copy link

maedoc commented Oct 21, 2022

@junpenglao thanks for the explanation. Indeed the port is much more complicated than I expected, so thanks for your work on that.

@junpenglao
Copy link
Member

@maedoc You are welcome! I am looking forward to the improvements for non-centered form & 2nd order scheme you mentioned ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.