Blobs example a bit confusing #352
Comments
Thanks for reporting! You're absolutely right that there are several typos in the docs. The first example should read:

    def log_prob(params):
        lp = log_prior(params)
        if not np.isfinite(lp):
            return -np.inf, None
        ll = log_like(params)
        if not np.isfinite(ll):
            return -np.inf, None
        return lp + ll, lp

And the second should read:

    def log_prob(params):
        lp = log_prior(params)
        if not np.isfinite(lp):
            return -np.inf, None, None
        ll = log_like(params)
        if not np.isfinite(ll):
            return -np.inf, None, None
        return lp + ll, lp, np.mean(params)

In summary:
Let me know if this helps your use cases and we can update the docs.
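As a quick check of the corrected first example, here is a minimal sketch that calls log_prob directly, without a sampler. The log_prior and log_like bodies and the three data points are hypothetical stand-ins invented for illustration; only the structure of log_prob comes from the comment above.

```python
import numpy as np

# Hypothetical prior: flat inside 0 < mu < 1 and sig > 0, -inf elsewhere.
def log_prior(params):
    mu, sig = params
    return 0.0 if (0.0 < mu < 1.0 and sig > 0.0) else -np.inf

# Hypothetical Gaussian likelihood on a tiny made-up data set.
def log_like(params):
    mu, sig = params
    data = np.array([0.1, 0.2, 0.3])
    return float(np.sum(-0.5 * ((data - mu) / sig) ** 2 - np.log(sig)))

# Corrected first example: one blob (the log-prior) on every return path.
def log_prob(params):
    lp = log_prior(params)
    if not np.isfinite(lp):
        return -np.inf, None
    ll = log_like(params)
    if not np.isfinite(ll):
        return -np.inf, None
    return lp + ll, lp

inside = log_prob(np.array([0.2, 0.3]))   # prior is finite here
outside = log_prob(np.array([2.0, 0.3]))  # mu outside the prior bounds
```

Both calls return a tuple of length two, and the rejected point has a log-probability of -np.inf with None as the placeholder blob.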
Ah, you're right. Returning
I'm not sure why, exactly, but it's easily avoidable, just by returning
Can you post exactly the code you're running to get this, with a fixed seed and all the relevant imports and data, etc? I wouldn't expect this to ever be hit: it shouldn't inspect the blobs if logprob is
Sure!

    import emcee
    import numpy as np

    def log_prob(p):
        mu, sig = p
        if not 0 < mu < 1:
            ll = -np.inf
            return -np.inf, 0,0,0
        ll = np.sum(-(data - mu)**2 / (2 * sig**2)) - sig
        return ll, mu*sig, mu+sig, (data - mu)/sig

    np.random.seed(1234)
    data = np.random.normal(loc=0.2, scale=0.3, size=25)
    sampler = emcee.EnsembleSampler(
        nwalkers = 100,
        ndim = 2,
        log_prob_fn = log_prob,
    )
    init_pos = np.array([0.2, 0.3]) + 1e-4 * np.random.normal(size=(100, 2))
    sampler.run_mcmc(init_pos, nsteps=1000, progress=True)
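For comparison, the issue description reports that returning -np.inf, -np.inf, -np.inf, np.zeros(25) on the rejected branch works. The following is a sketch of that variant, with the placeholders matching the shapes of the real blobs. It only calls log_prob directly and does not run emcee; the use of np.random.default_rng here is a choice made for this sketch, not something taken from the code above.

```python
import numpy as np

rng = np.random.default_rng(1234)
data = rng.normal(loc=0.2, scale=0.3, size=25)

def log_prob(p):
    mu, sig = p
    if not 0 < mu < 1:
        # Placeholder blobs with the same shapes as the real ones below:
        # two scalars and one array of length len(data).
        return -np.inf, -np.inf, -np.inf, np.zeros(data.size)
    ll = np.sum(-(data - mu) ** 2 / (2 * sig ** 2)) - sig
    return ll, mu * sig, mu + sig, (data - mu) / sig

rejected = log_prob((1.5, 0.3))  # mu outside (0, 1)
accepted = log_prob((0.2, 0.3))

# Every blob has the same shape on both return paths.
shapes_rejected = [np.shape(b) for b in rejected[1:]]
shapes_accepted = [np.shape(b) for b in accepted[1:]]
```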
Hi, I'm very glad I found this issue after also being confused by the blobs documentation. It has a lot of info that could be useful to add in the docs. Here's what I gathered so far (assuming I understand the intended behaviour correctly):
Now regarding 3. I think there might be a bug indeed. Playing with the example from the previous post, it seems that
Some of this is the result of how numpy has changed their handling of dtypes (for the better!) since I originally wrote this forever ago. But either way, I'd say that the best approach in this particular case (whenever you have non-trivial blob types) is to provide the dtype explicitly. And don't use

    blobs_dtype=np.dtype([("a", float), ("b", float), ("c", float, len(data))])

Then your resulting blobs array will be a structured array that will be easy to slice, etc.
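To illustrate what "easy to slice" means for such a structured dtype, here is a numpy-only sketch. The field names a, b, c follow the dtype above; the (nsteps, nwalkers) layout of the array and the values stored in it are assumptions made for illustration, not output from emcee.

```python
import numpy as np

data = np.zeros(25)  # stand-in for the real data, only its length matters here

# Same structured dtype as suggested above: two scalars and one array field.
blobs_dtype = np.dtype([("a", float), ("b", float), ("c", float, len(data))])

# Assumed layout: one record per (step, walker).
nsteps, nwalkers = 4, 3
blobs = np.zeros((nsteps, nwalkers), dtype=blobs_dtype)

# Store one record: (a, b, c) for step 0, walker 0.
blobs[0, 0] = (1.0, 2.0, np.arange(len(data)))

# Slice by field name.
a_chain = blobs["a"]  # shape (nsteps, nwalkers)
c_chain = blobs["c"]  # shape (nsteps, nwalkers, len(data))
```

In the sampler itself, the dtype would be passed through the blobs_dtype keyword of emcee.EnsembleSampler, as in the comment above.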
The blobs docs were found to be a bit confusing by some users (including me). This merge request updates them following the discussion in dfm#352 to clarify how this interface can be used and the intent behind it.
General information:
Problem description:
Expected behavior:
There are actually a few things that are confusing me (based on the docs), and one thing that (given I understand correctly) could potentially be made better in the code. The first confusing thing in the docs is that (on the "Blobs" page) it gives the following example:
This looks like it's returning lp as the log-likelihood when ll is infinite. That doesn't seem right... if ll is infinite, then shouldn't the returned probability be -np.inf? Are the returns backwards?
Now the next example, for "named blobs and custom dtypes" has this log_prob:
Again, this has the same seemingly backwards return when ll is not finite. But it also has a different number of returns when either lp or ll are infinite (2 instead of 3). Is this really allowed? If so, great (though see below), but then I think it should be more clear that this is really allowed and encouraged.
However, in my own application, I have a log_prob that has dynamic blobs (I let the user specify a bunch of keys that they want to pull out of an object to store as blobs). This works perfectly fine, except that when I do a similar if not np.isfinite(ll): return -np.inf, -np.inf, emcee complains:
Minimal Example(s)
Let

    data = np.random.normal(loc=0.2, scale=0.3, size=25)

and

    init_pos = np.array([0.2, 0.3]) + 1e-4 * np.random.normal(size=(100, 2))

Example 1:
This is almost the same as the docs example -- returning two blobs, but only one blob if returning a -np.inf prior. Set up the sampler with
It exits with
Example 2:
New log_prob where one of the blobs is an array:
Exits with
Example 3:
Using the same log_prob as example 2, but with an HDF5 backend, and sampler defined as:
Exits with
If I explicitly return -np.inf, -np.inf, -np.inf, np.zeros(25) when the prior is infinite, then it works fine. However, returning an array of zeros of length 1 but with the correct dtype does not work.
This makes it kind of hard to dynamically return blobs. What I could do is create the relevant blob dtype, then unpack it as a tuple, add the likelihood to the front of the tuple, and return that. In this case, I'd have to write the following:
This works, but is clearly a lot more brittle and complex. In cases like these, we know that the blobs will never actually be saved because the likelihood is -np.inf anyway. I wonder if you couldn't just ignore the blobs in these cases?
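One possible shape for the dtype-based workaround described above is sketched here. This is not the code from the original report: the helper name placeholder_return and the example dtype are hypothetical, and the sketch only builds the return tuple without running emcee.

```python
import numpy as np

# Hypothetical blob dtype: two scalars and one array of length 25.
blobs_dtype = np.dtype([("a", float), ("b", float), ("c", float, 25)])

def placeholder_return(dtype):
    """Build a (-inf, blob_1, ..., blob_n) tuple with zero-filled blobs.

    Each placeholder has the shape its field declares in `dtype`.
    """
    # A single zero-filled record of the requested dtype.
    empty = np.zeros(1, dtype=dtype)[0]
    # Unpack the record field by field and put the log-probability in front.
    return (-np.inf,) + tuple(empty[name] for name in dtype.names)

out = placeholder_return(blobs_dtype)
```

A log_prob with user-defined blob keys could return placeholder_return(blobs_dtype) on its rejected branches, so the placeholder shapes stay in sync with the dtype without being written out by hand.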