Add new RandomVariable Op and optimizations #137

Merged

Conversation

@brandonwillard (Member) commented Oct 30, 2020

This PR adds a more optimizable and robust Op for random variables aptly named RandomVariable.

  • Add RandomVariable Op and tests
    • Broadcastable multivariate normal, Dirichlet, categorical, and multinomial Ops (i.e. they support multiple stacked sets of independent parameter arguments, along with support for a size parameter)
  • Remove RandomFunction, its modules, and the functions that depend on it.
  • Create DimShuffle lift optimization for RandomVariables
    • E.g. normal(mean, std).T is replaced with normal(mean.T, std.T)
  • Create *Subtensor lift optimization for RandomVariables
    • E.g. normal(mean, std)[idx] is replaced with normal(mean[idx], std[idx]) (a distributional-equivalence check for these rewrites is sketched just below)
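As a NumPy-only illustration of why both lift rewrites are valid in distribution (this is not the PR's Theano code, and the example arrays are made up), transposing the parameters before sampling agrees, up to Monte Carlo error, with transposing the samples; the same argument applies to indexing:

import numpy as np

rng = np.random.default_rng(0)
mean = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
std = np.full_like(mean, 0.1)

n = 200_000
# "normal(mean, std).T" vs. its lifted form "normal(mean.T, std.T)"
a = rng.normal(mean, std, size=(n,) + mean.shape).transpose(0, 2, 1)
b = rng.normal(mean.T, std.T, size=(n,) + mean.T.shape)

# Per-element moments agree up to Monte Carlo error; exact numeric equality
# for a shared seed is a separate question (see the discussion further down).
assert np.allclose(a.mean(axis=0), b.mean(axis=0), atol=1e-2)
assert np.allclose(a.std(axis=0), b.std(axis=0), atol=1e-2)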

(This is also a replacement for #131 that comes from my new fork)

@codecov (bot) commented Oct 30, 2020

Codecov Report

Merging #137 (0505138) into master (ca215a2) will increase coverage by 0.24%.
The diff coverage is 98.37%.


@@            Coverage Diff             @@
##           master     #137      +/-   ##
==========================================
+ Coverage   71.26%   71.51%   +0.24%     
==========================================
  Files         158      162       +4     
  Lines       54376    54653     +277     
==========================================
+ Hits        38752    39084     +332     
+ Misses      15624    15569      -55     
Impacted Files                   Coverage Δ
theano/gpuarray/rng_mrg.py       36.50% <ø> (ø)
theano/sandbox/rng_mrg.py        90.66% <50.00%> (ø)
theano/tensor/basic.py           89.59% <78.57%> (-0.12%) ⬇️
theano/gof/graph.py              91.14% <93.33%> (+0.39%) ⬆️
theano/tensor/random/opt.py      97.27% <97.27%> (ø)
theano/tensor/random/op.py       99.38% <99.38%> (ø)
theano/compile/profiling.py      78.93% <100.00%> (ø)
theano/tensor/random/basic.py   100.00% <100.00%> (ø)
theano/tensor/random/type.py    100.00% <100.00%> (ø)
theano/tensor/random/utils.py   100.00% <100.00%> (ø)
... and 10 more

@brandonwillard (Member, Author) commented:

While attempting to create a lift optimization for DimShuffles on RandomVariables I came across an issue involving numeric reproducibility.

The problem is neatly summarized by the following NumPy-only example:
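(The example below doesn't show how mean and std were defined; a setup along these lines, inferred from the shape and scale of the printed arrays rather than taken from the PR, is consistent with them.)

>>> import numpy as np
>>> mean = np.arange(1.0, 7.0).reshape(2, 3)
>>> std = np.full((2, 3), 1e-4)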

>>> np.random.RandomState(123).normal(mean, std).T
array([[0.99989144, 3.99984937],
       [2.00009973, 4.99994214],
       [3.0000283 , 6.00016514]])

>>> np.random.RandomState(123).normal(mean.T, std.T)
array([[0.99989144, 4.00009973],
       [2.0000283 , 4.99984937],
       [2.99994214, 6.00016514]])

The first case is the numeric result one would obtain from a DimShuffled RandomVariable graph; the second is the lifted version of the same graph. The two are theoretically equivalent and, ideally, should produce the same numeric result for the same RNG and seed. As we can see, they do not.

Here's an example of how it could be made to work:

>>> (mean + std * np.random.RandomState(123).standard_normal((2, 3))).T
array([[0.99989144, 3.99984937],
       [2.00009973, 4.99994214],
       [3.0000283 , 6.00016514]])

>>> mean.T + std.T * np.random.RandomState(123).standard_normal((2, 3)).T
array([[0.99989144, 3.99984937],
       [2.00009973, 4.99994214],
       [3.0000283 , 6.00016514]])

Simply put, by implementing the affine transform ourselves, rather than relying on the one inside RandomState.normal, we can apply the transpose to the underlying block of standard normals; that ability is exactly what we're missing when we call RandomState.normal directly.
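To make that concrete, here is a hypothetical helper (mine, not NumPy's or this PR's) that performs the affine transform itself so the block of standard normals can be dimshuffled together with the parameters:

import numpy as np

def normal_dimshuffled(mean, std, seed, axes=None):
    # Draw the standard-normal block at the *original* parameter shape, then
    # apply the same transpose to the block and to the parameters.
    z = np.random.RandomState(seed).standard_normal(np.shape(mean))
    if axes is not None:
        mean, std, z = (np.transpose(a, axes) for a in (mean, std, z))
    return np.asarray(mean) + np.asarray(std) * z

# Exact (not merely distributional) equality between the unlifted and lifted
# forms for the same seed:
# np.array_equal(normal_dimshuffled(mean, std, 123).T,
#                normal_dimshuffled(mean, std, 123, axes=(1, 0)))  # -> True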

Since I don't think we want to effectively reimplement all of the samplers in NumPy's RandomState, we can either come up with a good workaround that preserves equality, or accept that the two graphs will produce different results even though they're theoretically equivalent.

@brandonwillard force-pushed the add-randomvariable-op branch 4 times, most recently from 9d168b9 to e877ff8 on November 9, 2020 00:56
@brandonwillard changed the title from "Add new RandomVariable Op and implementations" to "Add new RandomVariable Op and optimizations" on Nov 9, 2020
@brandonwillard force-pushed the add-randomvariable-op branch 2 times, most recently from d9411fb to 7c36c55 on November 10, 2020 05:09
@brandonwillard force-pushed the add-randomvariable-op branch 4 times, most recently from 661824b to 4199a7d on November 24, 2020 04:05
@brandonwillard marked this pull request as draft on November 24, 2020 04:31
@brandonwillard (Member, Author) commented:

A preliminary Subtensor lift optimization was just added; however, it needs one addition in order to work with multivariate distributions—and a lot more tests.

@brandonwillard (Member, Author) commented Nov 27, 2020

OK, it has occurred to me—in another context—that we should address the RNG consistency issue mentioned above if we want to apply these optimizations more often.

Problem

Imagine that we're creating a Theano graph for the NumPy operations that produce z in the following:

import numpy as np

seed = 34893
rng = np.random.RandomState(seed)

x = rng.normal(np.arange(2))

z = x - x[1]
>>>  z
array([-0.7960794,  0.       ])

Just as with NumPy, we would expect a Theano-PyMC graph for z to necessarily have a 0 for the element at index 1. This should also hold for any RNG state.

The naive local_subtensor_rv_lift rewrite rule would effectively substitute x[1] with np.random.RandomState(seed).normal(np.arange(2)[1]), which would only imply that the expectation of z[1] is 0. I.e.

rng = np.random.RandomState(seed)

x = rng.normal(np.arange(2))

rng_2 = np.random.RandomState(seed)
y = rng_2.normal(np.arange(2)[1])

z = x - y
>>>  z
array([-1.       , -0.2039206])

Unfortunately, that's not what the graph actually represents, so this rewrite is inconsistent.

As a simple way to avoid introducing this issue, we should not perform the rewrite if there's another reference to x in the graph; however, that would limit the applicability of the optimization. This restriction can be loosened a bit by allowing references to invariant properties (e.g. the shape of x) and not the values in x themselves.
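Schematically, the guard amounts to requiring that the indexing operation be the RandomVariable output's only consumer; over a generic variable-to-consumers mapping (this is not the actual Theano-PyMC rewrite machinery), it would look something like:

def subtensor_is_sole_client(clients, rv_out, subtensor_node):
    # `clients` maps each variable to the list of nodes that consume it.
    # Lifting `rv_out[idx]` is only safe when nothing else reads `rv_out`;
    # otherwise the rest of the graph would see different draws.
    return clients.get(rv_out, []) == [subtensor_node]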

Potential Solutions

RNG-based

We might be able to solve a larger number of cases using an RNG-based approach. Such an approach might also preserve numeric equality between graphs (i.e. equality of the graphs before and after a rewrite, as described above), but it will require some additional Theano-PyMC functionality.

The idea is that we track the number of elements to skip, which might not be too difficult in most cases, especially since we're already computing all the requisite shape and index information for the rewrites themselves. In other words, the Theano-PyMC RNG objects would carry a set of state "jumps" that determine the evolution of the internal RNG state based on the indexing applied to it.

The most basic way of implementing this could use a seed-based approach (offsets from a seed, really). This would work with all RNGs and samplers, but I'm not sure if it could be efficiently extended to blocks/slices of indices. It seems like we would have to ensure that all values were drawn individually from a flattened version of the array. This isn't difficult to do, and it could be implemented in C/Cython to cut costs.
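As a rough, NumPy-only sketch of that flattened, element-by-element scheme (the helper names are mine, not the PR's), drawing each value individually means that reproducing the k-th element only requires burning the k draws that precede it under the same seed:

import numpy as np

def normal_elemwise(mean, seed):
    # Draw each element one at a time, in flat (C) order.
    rng = np.random.RandomState(seed)
    flat = np.asarray(mean, dtype=float).ravel()
    return np.array([rng.normal(m) for m in flat]).reshape(np.shape(mean))

def normal_elemwise_at(mean, seed, k):
    # Reproduce only the k-th flat element by discarding the first k draws.
    rng = np.random.RandomState(seed)
    flat = np.asarray(mean, dtype=float).ravel()
    for m in flat[:k]:
        rng.normal(m)  # burn the draws that precede position k
    return rng.normal(flat[k])

# normal_elemwise(np.arange(2), 34893)[1] == normal_elemwise_at(np.arange(2), 34893, 1)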

Alternatively, we could—and eventually should—add support for at least one of the two more flexible NumPy BitGenerators: PCG64 and/or Philox. These RNGs implement an .advance method that would allow us to manipulate the state in a manner that preserves consistency between shuffles and subsets of RandomVariable arrays.

Our simple example above can be fixed in this way:

drng = np.random.default_rng(seed)
x = drng.normal(np.arange(2))

drng = np.random.default_rng(seed)
# Move the state forward so that the next sample matches the second entry in
# `x`
drng.bit_generator.advance(1)
y = drng.normal(np.arange(2)[1])

z = x - y
>>> z
array([-2.68521984,  0.        ])

Naturally, this .advance-based approach won't work for certain samplers (e.g. rejection-based ones), but it should work for more than a few of the samplers for basic random variables.

Unfortunately, this approach would end up sampling the same value multiple times throughout a graph if it's implemented without some form of caching.

That issue aside, these RNG-based approaches have a direct correspondence with the actual source of change between rewrites (i.e. the RNG state), which adds to their appeal. In other words, indexing is equivalent to shifting an abstract RNG state: normal(mean, stddev, rng)[index] is converted to normal(mean[index], stddev[index], new_rng).
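A hedged sketch of that correspondence using NumPy's Generator API (the helper is hypothetical, and the one-state-increment-per-element assumption fails for samplers that consume a variable amount of state per draw, as noted above):

import numpy as np

def lifted_indexed_normal(mean, seed, index):
    # Hypothetical "normal(mean[index], new_rng)": advance the bit generator
    # by the flat offset of `index` instead of drawing the whole array.
    mean = np.asarray(mean, dtype=float)
    offset = int(np.ravel_multi_index(index, mean.shape))
    rng = np.random.default_rng(seed)
    rng.bit_generator.advance(offset)  # the "state jump" implied by the indexing
    return rng.normal(mean[index])

# E.g. lifted_indexed_normal(np.arange(2), seed, (1,)) plays the role of `y`
# in the example above.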

Graph-based

We could also attempt to synchronize slices of x throughout the graph by replacing the rewritten RandomVariables with stand-ins that are updated in place. In effect, we would replace indexed random arrays with a kind of sparse, lazy random array: when elements are indexed, values are generated and permanently saved for those index locations.

This is a nice solution because it would work for any RNG and sampling method. It would also avoid the RNG-based issue of producing duplicate samples, since it's effectively an extreme type of the caching needed to reduce duplicates in that approach.

Unfortunately, it would incur most of the same overhead that sparse arrays do, though some of that could be mitigated by a simple, low-level C implementation, at least for certain key steps. It also doesn't address the simpler issue of numerical consistency between the pre- and post-rewrite graphs.
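A minimal sketch of such a stand-in (plain Python/NumPy; the class is hypothetical and ignores performance): values are drawn on first access and cached by flat position, so repeated or overlapping indexing of the same array always agrees, even though the values depend on access order rather than matching an eager draw of the full array:

import numpy as np

class LazyNormal:
    # Lazy, cached stand-in for `normal(mean)` that supports consistent indexing.

    def __init__(self, mean, rng):
        self.mean = np.asarray(mean, dtype=float)
        self.rng = rng
        self._cache = {}  # flat position -> drawn value

    def _draw(self, k):
        if k not in self._cache:
            self._cache[k] = self.rng.normal(self.mean.flat[k])
        return self._cache[k]

    def __getitem__(self, idx):
        # Map `idx` to the flat positions it selects and draw/look up each one.
        flat = np.arange(self.mean.size).reshape(self.mean.shape)[idx]
        vals = [self._draw(int(k)) for k in np.atleast_1d(flat).ravel()]
        return np.array(vals).reshape(np.shape(flat))

# x = LazyNormal(np.arange(2), np.random.default_rng(34893))
# (x[:] - x[1])[1] == 0.0 exactly, since x[1] reuses the cached draw.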

@brandonwillard force-pushed the add-randomvariable-op branch 3 times, most recently from 74f8b62 to a518bce on November 29, 2020 00:02
@brandonwillard (Member, Author) commented:

To keep things moving, we should probably disable the automatic use of these rewrites until a good RNG/rewrite-consistency solution is worked out. I'll create a separate issue for that.

@brandonwillard marked this pull request as ready for review on December 5, 2020 23:34
@brandonwillard force-pushed the add-randomvariable-op branch 5 times, most recently from 52495db to b807eea on December 7, 2020 19:08
This optimization does *not* preserve equality between the numeric
results of the untransformed and transformed graphs when the RNGs and seeds are
equal.  The reason is that the underlying sampler methods themselves are not
implemented in Theano, so we cannot apply the requisite DimShuffle-like
operations to the intermediate samples used to generate multiple replications
and/or independent variates.

For example, sampling a normal of size (3, 2) requires a draw of size (3, 2)
from a standard normal and we can't transpose that (3, 2) array.  If we could,
then we would be able to maintain numerical equality between graphs.
@brandonwillard merged commit 50c60b6 into aesara-devs:master on Dec 14, 2020
@brandonwillard deleted the add-randomvariable-op branch on December 14, 2020 00:46
@brandonwillard linked an issue on Dec 15, 2020 that may be closed by this pull request
@kyleabeauchamp (Contributor) commented:

Is there a branch of pymc3 that is compatible with this change? I'm running into an ImportError: cannot import name 'MRG_RandomStreams' exception when I try to use master pymc3 against master theano-pymc.

Development

Successfully merging this pull request may close these issues.

Merge RandomVariable Ops