Rethinking_2 Chp_4: Different pymc3 output between notebook and local run #129

Closed
joannadiong opened this issue Dec 6, 2020 · 3 comments


@joannadiong
Contributor

Hi there. I ran the following code sections in my IDE and the output differs from the output in the Jupyter notebook.

To Reproduce
For example, I ran:

import pandas as pd
import pymc3 as pm
import arviz as az

d = pd.read_csv("Data/Howell1.csv", sep=";", header=0)
d2 = d[d.age >= 18]

# Code 4.27
with pm.Model() as m4_1:
    mu = pm.Normal("mu", mu=178, sd=20)
    sigma = pm.Uniform("sigma", lower=0, upper=50)
    height = pm.Normal("height", mu=mu, sd=sigma, observed=d2.height)
with m4_1:
    trace_4_1 = pm.sample(1000, tune=1000)

# Code 4.29
az.summary(trace_4_1, round_to=2, kind="stats")

I obtained the following output; the decimal values differ from the notebook's output:

# Output: notebook
         mean    sd  hpd_5.5%  hpd_94.5%
mu     154.62  0.41    153.96     155.25
sigma    7.77  0.29      7.30       8.23

# Output: mine
         mean    sd  hdi_3%  hdi_97%
mu     154.60  0.41  153.84   155.38
sigma    7.76  0.31    7.18     8.32

There were also numeric differences in the variance-covariance matrix, the variances, and the correlation matrix:

# Code 4.32 output
# variance-covariance: notebook
             mu     sigma
mu     0.178473 -0.007866
sigma -0.007866  0.087795
# variance-covariance: mine
             mu     sigma
mu     0.171058 -0.002112
sigma -0.002112  0.093520

# Code 4.33 output
# variances: notebook
array([0.17847295, 0.08779492])
# variances: mine
array([0.17105813, 0.09351962])

# Code 4.33 output
# correlations: notebook
             mu     sigma
mu     1.000000 -0.062842
sigma -0.062842  1.000000
# correlations: mine
             mu     sigma
mu     1.000000 -0.016702
sigma -0.016702  1.000000

The differences in the numeric output don't seem to be due to rounding, so I'm not sure what explains them. I would appreciate any pointers.

I'm a Python user but new to Bayesian analysis and the pymc3 package. Thanks very much for porting all the Rethinking R code to Python. The explanations and examples are great!

Python:
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] on linux
pymc3 3.9.3, arviz 0.10.0

@aloctavodia
Member

Hi @joannadiong, thanks for getting in touch. I am very happy that these resources are useful to you.

Regarding your question, I think the differences are within the expected range. The method PyMC3 uses to compute the posterior (NUTS) is a stochastic method, so each time you run it you will get slightly different results. You can fix the random seed used by pm.sample with the random_seed argument. Additionally, az.summary (without the kind="stats" argument) provides an estimate of the MC error, that is, the error introduced by the stochastic method. If you need to reduce the error, you can increase the number of draws (1000 in your example). As a rule of thumb, draws=1000 or 2000 should be fine (especially if you are running at least two chains). I do not remember the chapter number, but the book has a chapter that discusses how to diagnose MCMC chains so you can be sure you are getting trustworthy results. Nevertheless, if you need more info or have more doubts, please do not hesitate to ask questions here (or on the PyMC3 Discourse).
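For example (just a sketch reusing the m4_1 model above; the seed value 42 is arbitrary), fixing the seed and checking the MC error columns would look like:

with m4_1:
    trace_4_1 = pm.sample(2000, tune=1000, random_seed=42)  # random_seed makes the run reproducible

# without kind="stats", the summary table also reports mcse_mean, mcse_sd, ess_* and r_hat
az.summary(trace_4_1, round_to=2)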

@joannadiong joannadiong changed the title Rethinking_2 Chp_2: Different pymc3 output between notebook and local run Rethinking_2 Chp_4: Different pymc3 output between notebook and local run Dec 6, 2020
@joannadiong
Contributor Author

Hi @aloctavodia, thanks for the detailed and helpful explanation. I can manage those fixes and will keep working through the book.

For where I'm at in the book now, there are a couple of minor fixes needed in the notebook:

# Code 4.18
ax.imshow(zi, origin="bottom") # "bottom" would need to be "lower"

Also, pm.trace_to_dataframe will soon be removed, which will break this line:

# Code 4.32
trace_df = pm.trace_to_dataframe(trace_4_1) 

but I think there is an open issue for this.
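In case it helps, a possible replacement that goes through ArviZ (just a sketch, I haven't tested it against the notebook) would be:

# convert the MultiTrace to an ArviZ InferenceData object and work from there
with m4_1:
    idata = az.from_pymc3(trace_4_1)
trace_df = idata.posterior.to_dataframe()[["mu", "sigma"]]  # one row per (chain, draw)
trace_df.cov()  # variance-covariance matrix, as in Code 4.32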

I'm not familiar enough yet with the packages or with building the notebooks to pymc3 standards, but hopefully in time I'll learn enough to contribute more. Happy for you to close this issue if you feel the main things are covered. Many thanks again!

@aloctavodia
Member

Thanks for the report, this is all useful feedback. We should update the notebooks to directly work with ArviZ's InferenceData object.
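Roughly, the direction would be something like this (a sketch of the idea, not what the notebooks do yet):

# sample straight into an InferenceData object instead of a MultiTrace
with m4_1:
    idata = pm.sample(1000, tune=1000, return_inferencedata=True)

az.summary(idata, round_to=2, kind="stats")  # works unchanged on InferenceData
idata.posterior[["mu", "sigma"]].to_dataframe().cov()  # no pm.trace_to_dataframe needed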
Looking forward to more contributions from you. Let us know if you need help or have more questions.
