
Custom Likelihood Op Fails v3.8 and v3.9 #4057

Closed
tvwenger opened this issue Aug 17, 2020 · 6 comments

@tvwenger (Contributor)

Description of your problem

Sampling a model with a custom likelihood implemented as a theano.tensor.Op fails in v3.8 when chains > 1, and always fails in v3.9. See the following example:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
import theano.tests.unittest_tools as utt

class Loglike(tt.Op):
    itypes = [tt.dvector]
    otypes = [tt.dscalar]

    def __init__(self, data):
        self.data = data
        self.loglike_grad = LoglikeGrad(self.data)

    def perform(self, node, inputs, outputs):
        theta, = inputs
        mu, sigma = theta
        logp = -np.log(np.sqrt(2.0*np.pi)*sigma)
        logp += -np.sum((self.data - mu)**2.0)/(2.0*sigma**2.0)
        outputs[0][0] = np.array(logp)

    def grad(self, inputs, grad_outputs):
        theta, = inputs
        grads = self.loglike_grad(theta)
        return [grad_outputs[0]*grads]

class LoglikeGrad(tt.Op):
    itypes = [tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, data):
        self.data = data

    def perform(self, node, inputs, outputs):
        theta, = inputs
        mu, sigma = theta
        dmu = np.sum(self.data - mu)/sigma**2.0
        dsigma = np.sum((self.data - mu)**2.0)/sigma**3.0 - 1.0/sigma
        outputs[0][0] = np.array([dmu, dsigma])

def main():
    mu_true = 5.0
    sigma_true = 1.0
    data = np.random.normal(loc=mu_true, scale=sigma_true, size=1000)
    loglike = Loglike(data)
    utt.verify_grad(loglike, [np.array([3.0, 2.0])])
    # verify_grad passes with no errors
    with pm.Model() as model:
        mu = pm.Normal('mu', mu=4.0, sigma=2.0, testval=4.0)
        sigma = pm.HalfNormal('sigma', sigma=5.0, testval=5.0)
        theta = tt.as_tensor_variable([mu, sigma])
        like = pm.DensityDist('like', lambda v: loglike(v), observed={'v': theta})
    with model:
        trace = pm.sample(cores=2, chains=2)

if __name__ == "__main__":
    print("pymc3 version: ", pm.__version__)
    main()
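For reference, the math implemented above: the perform method computes

\[
\log p(\theta) = -\log\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^{2}}\sum_{i}\left(d_{i}-\mu\right)^{2},
\]

and LoglikeGrad returns its exact partial derivatives,

\[
\frac{\partial \log p}{\partial \mu} = \frac{1}{\sigma^{2}}\sum_{i}\left(d_{i}-\mu\right),
\qquad
\frac{\partial \log p}{\partial \sigma} = \frac{1}{\sigma^{3}}\sum_{i}\left(d_{i}-\mu\right)^{2} - \frac{1}{\sigma},
\]

which is why verify_grad passes. (Note that this snippet omits the usual factor of N = len(data) on the normalisation term; the later example in this thread includes it.)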

This runs fine with PyMC3 v3.8 as long as I run only one chain. With chains > 1 on v3.8, or with any number of chains on v3.9.3, I get the following error after sampling finishes:

MissingInputError: Input 0 of the graph (indices start from 0), used to compute Elemwise{exp,no_inplace}(sigma_log__), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

Here is the output on v3.8 with chains=1

pymc3 version:  3.8
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [sigma, mu]
Sampling chain 0, 0 divergences: 100%|████████████████████████████████████████████████████████████████████████| 1000/1000 [00:01<00:00, 869.02it/s]
Only one chain was sampled, this makes it impossible to run some convergence checks

Here is the full 3.8 traceback with chains = 2:

pymc3 version:  3.8
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, mu]
Sampling 2 chains, 0 divergences: 100%|███████████████████████████████████████████████████████████████████| 2000/2000 [00:00<00:00, 2108.88draws/s]
Traceback (most recent call last):
  File "test.py", line 57, in <module>
    main()
  File "test.py", line 53, in main
    trace = pm.sample(cores=2, chains=2)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py", line 498, in sample
    trace.report._run_convergence_checks(trace, model)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/pymc3/backends/report.py", line 84, in _run_convergence_checks
    self._ess = ess = ess(trace, var_names=varnames)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/pymc3/stats/__init__.py", line 24, in wrapped
    return func(*args, **kwargs)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/stats/diagnostics.py", line 191, in ess
    dataset = convert_to_dataset(data, group="posterior")
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/data/converters.py", line 177, in convert_to_dataset
    inference_data = convert_to_inference_data(obj, group=group, coords=coords, dims=dims)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/data/converters.py", line 91, in convert_to_inference_data
    return from_pymc3(trace=kwargs.pop(group), **kwargs)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/data/io_pymc3.py", line 531, in from_pymc3
    save_warmup=save_warmup,
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/data/io_pymc3.py", line 159, in __init__
    self.observations, self.multi_observations = self.find_observations()
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/data/io_pymc3.py", line 172, in find_observations
    multi_observations[key] = val.eval() if hasattr(val, "eval") else val
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/gof/graph.py", line 522, in eval
    self._fn_cache[inputs] = theano.function(inputs, self)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 1839, in orig_function
    name=name)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 1487, in __init__
    accept_inplace)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 181, in std_fgraph
    update_mapping=update_mapping)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/gof/fg.py", line 175, in __init__
    self.__import_r__(output, reason="init")
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/gof/fg.py", line 346, in __import_r__
    self.__import__(variable.owner, reason=reason)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/gof/fg.py", line 391, in __import__
    raise MissingInputError(error_msg, variable=r)
theano.gof.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute Elemwise{exp,no_inplace}(sigma_log__), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

And here is the full v3.9.3 traceback with chains=anything:

pymc3 version:  3.9.3
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [sigma, mu]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 2 seconds.
Traceback (most recent call last):
  File "test.py", line 57, in <module>
    main()
  File "test.py", line 53, in main
    trace = pm.sample(cores=2, chains=2)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py", line 624, in sample
    idata = arviz.from_pymc3(trace, **ikwargs)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/data/io_pymc3.py", line 531, in from_pymc3
    save_warmup=save_warmup,
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/data/io_pymc3.py", line 159, in __init__
    self.observations, self.multi_observations = self.find_observations()
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/arviz/data/io_pymc3.py", line 172, in find_observations
    multi_observations[key] = val.eval() if hasattr(val, "eval") else val
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/gof/graph.py", line 522, in eval
    self._fn_cache[inputs] = theano.function(inputs, self)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 1839, in orig_function
    name=name)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 1487, in __init__
    accept_inplace)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 181, in std_fgraph
    update_mapping=update_mapping)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/gof/fg.py", line 175, in __init__
    self.__import_r__(output, reason="init")
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/gof/fg.py", line 346, in __import_r__
    self.__import__(variable.owner, reason=reason)
  File "/home/twenger/anaconda3/lib/python3.7/site-packages/theano/gof/fg.py", line 391, in __import__
    raise MissingInputError(error_msg, variable=r)
theano.gof.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute Elemwise{exp,no_inplace}(sigma_log__), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

Versions and main components

  • PyMC3 Version: 3.8 or 3.9.3
  • Theano Version: 1.0.5
  • Python Version: 3.7.6
  • Operating system: Ubuntu 20.04 on Windows Subsystem for Linux 2
  • How did you install PyMC3: pip
@tvwenger (Contributor, Author)

This worked in v3.7: I've just downgraded and confirmed that it still runs (with chains > 1) under v3.7:

pymc3 version:  3.7
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, mu]
Sampling 4 chains: 100%|██████████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:00<00:00, 4000.38draws/s]

I've also tried it on Ubuntu 16.04 (i.e., not a WSL environment), and confirmed that the issues with v3.9.3 and v3.8 (chains > 1) exist there as well.

@junpenglao (Member)

Thanks for reporting; I can confirm the bug. It seems to be related to IO with ArviZ. @OriolAbril, could you take a look?

@OriolAbril (Member)

This is related to #4002 and arviz-devs/arviz#1279. It is currently an incompatibility between ArviZ and PyMC3, and we have to decide how best to fix it on either side, or on both at once. TL;DR from the other issue: the root of the problem is passing sampled variables through the observed keyword of the DensityDist; theta has a different value for each sample, which breaks ArviZ.

In the meantime I'd recommend using a Potential instead of a DensityDist, or going all the way to define a custom Distribution.

    with pm.Model() as model:
        mu = pm.Normal('mu', mu=4.0, sigma=2.0, testval=4.0)
        sigma = pm.HalfNormal('sigma', sigma=5.0, testval=5.0)
        theta = tt.as_tensor_variable([mu, sigma])
        pm.Potential("like", loglike(theta))
    with model:
        trace = pm.sample(cores=2, chains=2)

@tvwenger
Copy link
Contributor Author

Excellent - thanks for the recommendation. I can confirm that the following works. Are there any downsides to using Potential instead of DensityDist?

import numpy as np
import pymc3 as pm
import theano.tensor as tt
import theano.tests.unittest_tools as utt

class Loglike(tt.Op):
    itypes = [tt.dvector]
    otypes = [tt.dscalar]

    def __init__(self, data):
        self.data = data
        self.loglike_grad = LoglikeGrad(self.data)

    def perform(self, node, inputs, outputs):
        theta, = inputs
        mu, sigma = theta
        logp = -len(self.data)*np.log(np.sqrt(2.0*np.pi)*sigma)
        logp += -np.sum((self.data - mu)**2.0)/(2.0*sigma**2.0)
        outputs[0][0] = np.array(logp)

    def grad(self, inputs, grad_outputs):
        theta, = inputs
        grads = self.loglike_grad(theta)
        return [grad_outputs[0]*grads]

class LoglikeGrad(tt.Op):
    itypes = [tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, data):
        self.data = data

    def perform(self, node, inputs, outputs):
        theta, = inputs
        mu, sigma = theta
        dmu = np.sum(self.data - mu)/sigma**2.0
        dsigma = np.sum((self.data - mu)**2.0)/sigma**3.0 - len(self.data)/sigma
        outputs[0][0] = np.array([dmu, dsigma])

def main():
    mu_true = 5.0
    sigma_true = 1.0
    data = np.random.normal(loc=mu_true, scale=sigma_true, size=10000)
    loglike = Loglike(data)
    utt.verify_grad(loglike, [np.array([3.0, 2.0])])
    # verify_grad passes with no errors
    with pm.Model() as model:
        mu = pm.Normal('mu', mu=4.0, sigma=2.0, testval=4.0)
        sigma = pm.HalfNormal('sigma', sigma=5.0, testval=2.0)
        theta = tt.as_tensor_variable([mu, sigma])
        like = pm.Potential('like', loglike(theta))
    with model:
        trace = pm.sample()
        print(pm.summary(trace).to_string())

if __name__ == "__main__":
    print("pymc3 version: ", pm.__version__)
    main()
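As a sanity check independent of Theano, the closed-form gradients above can be compared against central finite differences of the log-likelihood in plain NumPy. This is a minimal sketch; the helper names loglike_np, loglike_grad_np, and finite_diff are illustrative, not part of the code above.

```python
import numpy as np

# Minimal NumPy sketch (helper names are illustrative): check the
# closed-form gradients of the Normal log-likelihood against central
# finite differences, independently of Theano's verify_grad.

def loglike_np(theta, data):
    # Same expression as the corrected Loglike.perform above.
    mu, sigma = theta
    out = -len(data) * np.log(np.sqrt(2.0 * np.pi) * sigma)
    out += -np.sum((data - mu) ** 2.0) / (2.0 * sigma ** 2.0)
    return out

def loglike_grad_np(theta, data):
    # Same expressions as LoglikeGrad.perform above.
    mu, sigma = theta
    dmu = np.sum(data - mu) / sigma ** 2.0
    dsigma = np.sum((data - mu) ** 2.0) / sigma ** 3.0 - len(data) / sigma
    return np.array([dmu, dsigma])

def finite_diff(theta, data, eps=1.0e-6):
    # Central finite differences, one parameter at a time.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (loglike_np(theta + step, data)
                   - loglike_np(theta - step, data)) / (2.0 * eps)
    return grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(loc=5.0, scale=1.0, size=1000)
    theta = np.array([3.0, 2.0])
    print(np.allclose(loglike_grad_np(theta, data),
                      finite_diff(theta, data), rtol=1.0e-5))
```

If the analytic and numeric gradients disagree here, the bug is in the math itself rather than in the Op wiring.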

johannes-mueller added a commit to boschresearch/pylife that referenced this issue Oct 14, 2020
@tvwenger (Contributor, Author)

Any updates on this?

@OriolAbril (Member)

ArviZ 0.11.0 added the density_dist_obs argument to az.from_pymc3. Set it to False to get the old behaviour of DensityDist using FreeRVs as values passed with the observed kwarg. If the idata_kwargs argument is not yet present in pm.sample, you should do the following:

with ...:
    trace = pm.sample(..., compute_convergence_checks=False)
    idata = az.from_pymc3(trace, density_dist_obs=False)
# use idata for plotting and stats (ess, summary, ...) regardless of whether you
# call the az.* or pm.* function; both use the same ArviZ function under the hood.

As this concerns old PyMC3 versions, stick with ArviZ 0.11.0. The 0.11.1 and 0.11.2 releases were needed to fix some issues with the latest cmdstanpy release at the time; while they fix those issues and work seamlessly with the latest PyMC3 (which is what we test against), they break ArviZ compatibility with old PyMC3 versions.

Things will be completely different in PyMC3 4+: the converter to InferenceData will be part of the PyMC3 codebase, and DensityDist may no longer exist in the same form, so I am closing this issue.
