Refactor AR distribution #5734
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##             main    #5734   +/-   ##
=======================================
  Coverage   88.95%   88.95%
=======================================
  Files          75       75
  Lines       13751    13751
=======================================
  Hits        12232    12232
  Misses       1519     1519
```
Force-pushed from f4fc1b6 to 5d8b2ca
This is pretty intense, I could have never written that. I suppose the complexity is somewhat warranted because the distribution has to handle multiple orders and batch dimensions, so I'd expect there to be some shape complexity. But I suppose my hope was that with … Anyway, that's more of a general statement about what it takes to create a new distribution (which I think @canyon289 also expressed), not a comment on this PR, which I trust could not be simplified.

Is there an issue about this? Also, curious if this would have made things much simpler?

The reason this is complicated is that we are not talking about a single distribution but more like a self-contained model-factory. The extra logic is there to 1) coordinate different distributions and 2) make life easier for users so that they don't need to create the building blocks with the right shapes in advance. Once we have it in place it is also easy to extend. For instance, it is now pretty straightforward to allow for arbitrary noise distributions instead of only Normal, by letting users pass a …

Not really, it would only save us from having to write the logp method ourselves. We still need the logic to generate the random method and do automatic resizing, which is where the new complexity comes from. This was completely missing in V3! For the logp performance question, there is a somewhat related issue here: aesara-devs/aesara#617
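To make the logp point concrete, here is a minimal NumPy sketch (an illustration only, not the actual PyMC implementation) of how an AR(p) log-probability can be computed in one vectorized step from the innovations, with no scan; the names `ar_logp` and `normal_logpdf` are made up for this example:

```python
import numpy as np

def normal_logpdf(x, sigma):
    # log density of a zero-mean Normal with standard deviation sigma
    return -0.5 * (x / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def ar_logp(value, rhos, sigma):
    """Log-probability of an AR(p) series conditional on its first p values.

    Computed directly from the innovations, so no scan is needed. A real
    implementation would also add the init_dist term for value[:p].
    """
    p = len(rhos)
    # lagged values: row i holds value[t - 1 - i] for t = p .. len(value)-1
    lags = np.stack([value[p - 1 - i : len(value) - 1 - i] for i in range(p)])
    eps = value[p:] - np.asarray(rhos) @ lags  # innovations
    return normal_logpdf(eps, sigma).sum()
```

For `rhos=[0.5]`, `sigma=1.0`, and `value=[0.0, 1.0, 2.0]`, the innovations are `[1.0, 1.5]` and the result is the sum of the corresponding Normal log-densities.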
Force-pushed from 19635f7 to 7214bca
pymc/distributions/timeseries.py (Outdated)

```python
ar = ar_op(rhos, sigma, init_dist, steps)
noise_rng.default_update = ar.owner.outputs[0]
```
This will be changed in #5738, or here, depending on what gets merged first
In the following comment I mean this with complete respect: it's easy for you to extend, Ricardo, because you're the genius who knows what's going on. What would help new contributors a lot are comments and type hints in everything, including tests, and even for lines of code that look trivial. I didn't want to bug you too much, but if the above sounds good I could indicate everywhere that I personally could use a comment to understand what's going on. Would that be helpful?

Perhaps take a look first at how Mixtures are implemented; it's basically the same process as in here. Seeing the commonalities might help you understand what is going on: https://github.com/pymc-devs/pymc/blob/main/pymc/distributions/mixture.py

I certainly will! Be prepared that I might start requesting docstrings there as well! Here's where I'm coming from: PyMC is an awesome project and it's going to be around for years, so in addition to building great functionality now, it'll be good to ensure … I'm not saying you haven't done this btw; I haven't done a review yet as I've been awake for 20 mins, just responding to the comments.
@ricardoV94 is it fine if I review on Saturday my time? I really want to do a thorough job, both for my learning and so I can support the Time Series GSOC project. That's my side of things; let me know what your "I want this merged by X date" timeline is and I'll work around that :)

Sure, I am in no rush

They will have to read the code and check the documentation of the functions used, and if that's not enough, set up a debugger and start breaking the code. Taken to the extreme, we can't include a whole Aesara tutorial as comments for future devs. If the same functionality can be achieved by simpler, more maintainable means, that's a different question.

Agree we can't take this to the extreme, just exploring if there is an optimum between the two.

Normally I'd agree, but realize this is also more difficult than normal code due to the way Aesara and PyMC are integrated. The code isn't "run directly" but sort of sucked into Aesara, recompiled, and put through a number of transformation steps that make it difficult to trace what exactly is going on. Anyhow, this is stuff that's outside the scope of this PR. Overall really excited to keep building out great functionality, and you're always quite kind and generous in mentoring others ❤️ Looking forward to reviewing this and asking a bunch of questions!
```python
    "of rho or remove constant_term"
)

if init_dist is not None:
```
Is this something we should make a utility function for all time series eventually? No need on this PR, just asking.
Yeah, something like `check_valid_dist(dist, allowed_ndim_supp=(x, y))`, as this is used in all the distributions that take other distributions as inputs. Want to open an issue/PR for that?
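A rough sketch of what such a helper could look like — the name `check_valid_dist`, the `DistStub` stand-in, and its attributes are hypothetical, not actual PyMC API:

```python
class DistStub:
    """Stand-in for an unnamed distribution variable (illustration only)."""
    def __init__(self, ndim_supp, registered=False):
        self.ndim_supp = ndim_supp    # number of support dimensions
        self.registered = registered  # already registered in a model?

def check_valid_dist(dist, allowed_ndim_supp=(0,)):
    """Hypothetical utility: validate a distribution passed as an input
    to another distribution (e.g. the init_dist of an AR)."""
    if dist.registered:
        raise ValueError("dist must be an unregistered (unnamed) distribution")
    if dist.ndim_supp not in allowed_ndim_supp:
        raise ValueError(
            f"dist must have ndim_supp in {allowed_ndim_supp}, got {dist.ndim_supp}"
        )
    return dist
```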
```python
eps = value[self.p :] - self.rho[0] - x

@classmethod
def ndim_supp(cls, *args):
```
Please add a docstring to explain this as well. I think I get it, though a short docstring would reduce friction for those who don't know, or in case I forget :)
Same
Added a docstring suggestion
```python
innov_like = Normal.dist(mu=0.0, tau=self.tau).logp(eps)
init_like = self.init.logp(value[: self.p])

@classmethod
def change_size(cls, rv, new_size, expand=False):
```
Add a docstring, especially for `expand`
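For readers unfamiliar with `expand`, a shape-only analogy may help (a sketch of the convention, not the PyMC implementation): with `expand=False` the new size replaces the existing batch dimensions, while with `expand=True` it is prepended to the full old shape:

```python
def changed_shape(shape, new_size, ndim_supp, expand=False):
    """Shape-only analogy of change_size (illustration only).

    The last ndim_supp dims are the core (support) dims and are always
    kept. expand=False replaces the batch dims with new_size;
    expand=True prepends new_size to the whole old shape instead.
    """
    core = tuple(shape[len(shape) - ndim_supp:]) if ndim_supp else ()
    if expand:
        return tuple(new_size) + tuple(shape)
    return tuple(new_size) + core
```

E.g. for an AR with `ndim_supp=1` and shape `(4, steps)`, resizing to `(2,)` gives `(2, steps)`, while expanding gives `(2, 4, steps)`.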
```python
@_get_measurable_outputs.register(AutoRegressiveRV)
def _get_measurable_outputs_ar(op, node):
    # This tells Aeppl that the second output is the measurable one
```
```diff
-    # This tells Aeppl that the second output is the measurable one
+    """This tells Aeppl that the second output is the measurable one."""
```
I don't think this should be a docstring... but I can't quite put into words why
```python
@_moment.register(AutoRegressiveRV)
def ar_moment(op, rv, rhos, sigma, init_dist, steps, noise_rng):
    # Use last entry of init_dist moment as the moment for the whole AR
```
```diff
-    # Use last entry of init_dist moment as the moment for the whole AR
+    """Use last entry of init_dist moment as the moment for the whole AR."""
```
Same here
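In NumPy terms, the moment logic under discussion could be sketched like this (an illustration of the comment above, not the actual implementation; `ar_moment_sketch` is a made-up name):

```python
import numpy as np

def ar_moment_sketch(init_moment, steps):
    """Take the last entry (along time) of the init_dist moment and
    repeat it for every step of the AR series, preserving batch dims."""
    last = np.asarray(init_moment)[..., -1]
    return np.broadcast_to(last[..., None], last.shape + (steps,))
```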
```python
np.testing.assert_allclose(ar_like, reg_like)

@pytest.mark.parametrize("constant", (False, True))
def test_batched_size(self, constant):
```
Add a docstring; it's bordering on obvious, but not obvious enough that I think a newcomer would get it.
What do you think a useful docstring would be?
```python
assert y_eval[0].shape == (batch_size, steps)
assert not np.any(np.isclose(y_eval[0], y_eval[1]))

def test_batched_rhos(self):
```
Same comment as above
```python
assert y_eval.shape == (batch_size, steps)
assert np.all(abs(y_eval[1]) < 5)

def test_batched_sigma(self):
```
Same for all batched tests
This is awesome. Most comments are essentially "add more docstrings on internal functions" to aid future developers, or forgetful developers.
```python
    return 1

@classmethod
def ndim_supp(cls, *dist_params):
```
```diff
     def ndim_supp(cls, *dist_params):
+        """Refer to base class SymbolicDistribution for documentation."""
```
```diff
@@ -76,6 +76,14 @@ def dist(cls, dist, lower, upper, **kwargs):
         check_dist_not_registered(dist)
         return super().dist([dist, lower, upper], **kwargs)

+    @classmethod
+    def num_rngs(cls, *args, **kwargs):
```
```diff
     def num_rngs(cls, *args, **kwargs):
+        """Refer to base class SymbolicDistribution for documentation."""
```
Thanks Ricardo, and nicely done. Definitely ready for a merge, and if there's anything I still have questions about, or think we need docstrings for (which I'm clearly a fan of), I can do it outside of this PR.

…of RNGs needed. Reordered methods for consistency.

* Deprecates AR1 distribution
* Implements random and moment methods
* Batching works on the left as with other distributions

@ricardoV94 I saw your comments but won't be able to give you proper replies for a couple of days. There's nothing that should hold up a merge; please feel free to do so, but keep your branch around so I can come back to this PR later and refer to the comments to make proposals in separate issues or PRs 🎉
Thanks for reviewing @twiecki and @canyon289 |
Adds arbitrary noise for the innovations? Not yet.

`tau_e` is specified as it is, and in general it lacked a good docstring and other kwargs. We can always reintroduce it, but maybe it's useful to first deprecate it because of `tau_e`.

Closes #5291
Supersedes #5388

Also considered relying on Aeppl, but the returned Scan-based logp performs extremely slowly: https://gist.github.com/ricardoV94/2167ab7214affef47a86582141205bf5