Rethinking_2 Chp_4: Different pymc3 output between notebook and local run #129

Closed
joannadiong opened this issue Dec 6, 2020 · 3 comments


@joannadiong
Contributor

Hi there. I ran the following code sections in my IDE and the output differs from the output in the Jupyter notebook.

To Reproduce
For example, I ran:

import pandas as pd
import pymc3 as pm
import arviz as az

d = pd.read_csv("Data/Howell1.csv", sep=";", header=0)
d2 = d[d.age >= 18]

# Code 4.27
with pm.Model() as m4_1:
    mu = pm.Normal("mu", mu=178, sd=20)
    sigma = pm.Uniform("sigma", lower=0, upper=50)
    height = pm.Normal("height", mu=mu, sd=sigma, observed=d2.height)
with m4_1:
    trace_4_1 = pm.sample(1000, tune=1000)

# Code 4.29
az.summary(trace_4_1, round_to=2, kind="stats")

I obtained the following output; the decimal values differ from the notebook's output:

# Output: notebook
         mean    sd  hpd_5.5%  hpd_94.5%
mu     154.62  0.41    153.96     155.25
sigma    7.77  0.29      7.30       8.23

# Output: mine
         mean    sd  hdi_3%  hdi_97%
mu     154.60  0.41  153.84   155.38
sigma    7.76  0.31    7.18     8.32

There were also numeric differences in the variance-covariance matrix, the variances, and the correlation matrix:

# Code 4.32 output
# variance-covariance: notebook
             mu     sigma
mu     0.178473 -0.007866
sigma -0.007866  0.087795
# variance-covariance: mine
             mu     sigma
mu     0.171058 -0.002112
sigma -0.002112  0.093520

# Code 4.33 output
# variances: notebook
array([0.17847295, 0.08779492])
# variances: mine
array([0.17105813, 0.09351962])

# Code 4.33 output
# correlations: notebook
             mu     sigma
mu     1.000000 -0.062842
sigma -0.062842  1.000000
# correlations: mine
             mu     sigma
mu     1.000000 -0.016702
sigma -0.016702  1.000000

The differences in the numeric output don't seem to be due to rounding, so I'm not sure what explains them. I would appreciate any pointers.

I'm a Python user but new to Bayesian analysis and the pymc3 package. Thanks very much for porting all the Rethinking R code to Python. The explanations and examples are great!

Python:
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] on linux
pymc3 3.9.3, arviz 0.10.0

@aloctavodia
Member

Hi @joannadiong, thanks for getting in touch. I am very happy that these resources are useful to you.

Regarding your question, I think the differences are within the expected range. The method PyMC3 uses to compute the posterior (NUTS) is a stochastic method, so each time you run it you will get slightly different results. You can fix the random seed used by pm.sample with the random_seed argument. Additionally, az.summary (without the kind="stats" argument) provides an estimate of the MC error, that is, the error introduced by the stochastic method. If you need to reduce the error, you can increase the number of draws (1000 in your example). As a rule of thumb, draws=1000 or 2000 should be fine (especially if you are running at least two chains). I do not remember the chapter number, but the book has a chapter that discusses how to diagnose MCMC chains so you can be sure you are getting trustworthy results. Nevertheless, if you need more info or have more doubts, please do not hesitate to ask questions here (or on the PyMC3 Discourse).
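For example (just a sketch reusing the m4_1 model above; the seed value 42 is arbitrary), fixing the seed and checking the MC error columns would look like:

with m4_1:
    trace_4_1 = pm.sample(2000, tune=1000, random_seed=42)  # random_seed makes the run reproducible

# without kind="stats", the summary table also reports mcse_mean, mcse_sd, ess_* and r_hat
az.summary(trace_4_1, round_to=2)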

@joannadiong joannadiong changed the title Rethinking_2 Chp_2: Different pymc3 output between notebook and local run Rethinking_2 Chp_4: Different pymc3 output between notebook and local run Dec 6, 2020
@joannadiong
Contributor Author

Hi @aloctavodia, thanks for the detailed and helpful explanation. I can manage those fixes and will keep working through the book.

For where I'm at in the book now, there are a couple of minor fixes needed in the notebook:

# Code 4.18
ax.imshow(zi, origin="bottom") # "bottom" would need to be "lower"

Also, pm.trace_to_dataframe will soon be removed, which will break this line:

# Code 4.32
trace_df = pm.trace_to_dataframe(trace_4_1) 

but I think there is an open issue for this.
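In case it helps, a possible replacement that goes through ArviZ (just a sketch, I haven't tested it against the notebook) would be:

# convert the MultiTrace to an ArviZ InferenceData object and work from there
with m4_1:
    idata = az.from_pymc3(trace_4_1)
trace_df = idata.posterior.to_dataframe()[["mu", "sigma"]]  # one row per (chain, draw)
trace_df.cov()  # variance-covariance matrix, as in Code 4.32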

I'm not familiar enough yet with the packages or with building the notebooks to pymc3 standards, but hopefully in time I'll learn enough to contribute more. Happy for you to close this issue if you feel the main things are covered. Many thanks again!

@aloctavodia
Member

Thanks for the report, this is all useful feedback. We should update the notebooks to directly work with ArviZ's InferenceData object.
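Roughly, the direction would be something like this (a sketch of the idea, not what the notebooks do yet):

# sample straight into an InferenceData object instead of a MultiTrace
with m4_1:
    idata = pm.sample(1000, tune=1000, return_inferencedata=True)

az.summary(idata, round_to=2, kind="stats")  # works unchanged on InferenceData
idata.posterior[["mu", "sigma"]].to_dataframe().cov()  # no pm.trace_to_dataframe needed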
Looking forward to more contributions from you. Let us know if you need help or have more questions.
