
Blobs example a bit confusing #352

Closed
steven-murray opened this issue Aug 4, 2020 · 6 comments · Fixed by #464

Comments

@steven-murray

General information:

  • emcee version: 3.0.2
  • platform: Linux
  • installation method (pip/conda/source/other?): pip

Problem description:

Expected behavior:

There are a few things in the docs that confuse me, and one thing that (if I understand correctly) could be improved in the code. The first confusing thing is this example on the "Blobs" page:

def log_prob(params):
    lp = log_prior(params)
    if not np.isfinite(lp):
        return -np.inf, -np.inf
    ll = log_like(params)
    if not np.isfinite(ll):
        return lp, -np.inf
    return lp + ll, lp

This looks like it's returning lp as the log-likelihood when ll is infinite. That doesn't seem right... if ll is infinite, then shouldn't the returned probability be -np.inf? Are the returns backwards?

Now the next example, for "named blobs and custom dtypes" has this log_prob:

def log_prob(params):
    lp = log_prior(params)
    if not np.isfinite(lp):
        return -np.inf, -np.inf
    ll = log_like(params)
    if not np.isfinite(ll):
        return lp, -np.inf
    return lp + ll, lp, np.mean(params)

Again, this has the same seemingly backwards return when ll is not finite. But it also has a different number of return values when either lp or ll is infinite (2 instead of 3). Is this really allowed? If so, great (though see below), but then I think the docs should make clear that this is really allowed and encouraged.

However, in my own application, I have a log_prob that has dynamic blobs (I let the user specify a bunch of keys that they want to pull out of an object to store as blobs). This works perfectly fine, except that when I do a similar if not np.isfinite(ll): return -np.inf, -np.inf, emcee complains:

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/moves/red_blue.py in propose(self, model, state)
     91 
     92             # Compute the lnprobs of the proposed position.
---> 93             new_log_probs, new_blobs = model.compute_log_prob_fn(q)
     94 
     95             # Loop over the walkers and update them accordingly.

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in compute_log_prob(self, coords)
    443                 except ValueError:
    444                     dt = np.dtype("object")
--> 445             blob = np.array(blob, dtype=dt)
    446 
    447             # Deal with single blobs properly

ValueError: could not assign tuple of length 1 to structure with 3 fields.

Minimal Example(s)

Let data = np.random.normal(loc=0.2, scale=0.3, size=25) and init_pos = np.array([0.2, 0.3]) + 1e-4 * np.random.normal(size=(100, 2))

Example 1:

def log_prob(p, data):
    mu, sig = p

    # "prior"
    if not 0 < mu < 1:
        return -np.inf, -np.inf

    ll = np.sum(-(data - mu)**2 / (2 * sig**2)) - sig

    return ll, mu*sig, mu+sig

This is almost the same as the docs example -- returning two blobs, but only one blob when returning a -np.inf prior. Set up the sampler with

sampler = emcee.EnsembleSampler(
    nwalkers=100,
    ndim=2,
    log_prob_fn=log_prob,
    args=(data,)
)

It exits with

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-c79fbf8d39e5> in <module>
----> 1 sampler.run_mcmc(init_pos, nsteps=1000, progress=True)

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in run_mcmc(self, initial_state, nsteps, **kwargs)
    382 
    383         results = None
--> 384         for results in self.sample(initial_state, iterations=nsteps, **kwargs):
    385             pass
    386 

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in sample(self, initial_state, log_prob0, rstate0, blobs0, iterations, tune, skip_initial_state_check, thin_by, thin, store, progress)
    341 
    342                     # Propose
--> 343                     state, accepted = move.propose(model, state)
    344                     state.random_state = self.random_state
    345 

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/moves/red_blue.py in propose(self, model, state)
     91 
     92             # Compute the lnprobs of the proposed position.
---> 93             new_log_probs, new_blobs = model.compute_log_prob_fn(q)
     94 
     95             # Loop over the walkers and update them accordingly.

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in compute_log_prob(self, coords)
    443                 except ValueError:
    444                     dt = np.dtype("object")
--> 445             blob = np.array(blob, dtype=dt)
    446 
    447             # Deal with single blobs properly

ValueError: setting an array element with a sequence.

Example 2:
New log_prob where one of the blobs is an array:

def log_prob(p, data):
    mu, sig = p

    # "prior"
    if not 0 < mu < 1:
        return -np.inf, -np.inf

    ll = np.sum(-(data - mu)**2 / (2 * sig**2)) - sig

    return ll, mu*sig, mu+sig, (data - mu)/sig

Exits with

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-c79fbf8d39e5> in <module>
----> 1 sampler.run_mcmc(init_pos, nsteps=1000, progress=True)

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in run_mcmc(self, initial_state, nsteps, **kwargs)
    382 
    383         results = None
--> 384         for results in self.sample(initial_state, iterations=nsteps, **kwargs):
    385             pass
    386 

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in sample(self, initial_state, log_prob0, rstate0, blobs0, iterations, tune, skip_initial_state_check, thin_by, thin, store, progress)
    341 
    342                     # Propose
--> 343                     state, accepted = move.propose(model, state)
    344                     state.random_state = self.random_state
    345 

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/moves/red_blue.py in propose(self, model, state)
    102 
    103             new_state = State(q, log_prob=new_log_probs, blobs=new_blobs)
--> 104             state = self.update(state, new_state, accepted, S1)
    105 
    106         return state, accepted

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/moves/move.py in update(self, old_state, new_state, accepted, subset)
     41                     "blobs at that position."
     42                 )
---> 43             old_state.blobs[m1] = new_state.blobs[m2]
     44 
     45         return old_state

ValueError: shape mismatch: value array of shape (41,) could not be broadcast to indexing result of shape (41,3)

Example 3:
Using the same log prob as example 2, but with a HDF5 backend, and sampler defined as:

backend = emcee.backends.HDFBackend("backend.h5")
sampler = emcee.EnsembleSampler(
    nwalkers=100,
    ndim=2,
    log_prob_fn=log_prob,
    args=(data,),
    backend=backend,
    blobs_dtype=[('mu*sig', float), ('mu+sig', float), ('z', (float, 25))]
)

Exits with

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-c79fbf8d39e5> in <module>
----> 1 sampler.run_mcmc(init_pos, nsteps=1000, progress=True)

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in run_mcmc(self, initial_state, nsteps, **kwargs)
    382 
    383         results = None
--> 384         for results in self.sample(initial_state, iterations=nsteps, **kwargs):
    385             pass
    386 

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in sample(self, initial_state, log_prob0, rstate0, blobs0, iterations, tune, skip_initial_state_check, thin_by, thin, store, progress)
    341 
    342                     # Propose
--> 343                     state, accepted = move.propose(model, state)
    344                     state.random_state = self.random_state
    345 

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/moves/red_blue.py in propose(self, model, state)
     91 
     92             # Compute the lnprobs of the proposed position.
---> 93             new_log_probs, new_blobs = model.compute_log_prob_fn(q)
     94 
     95             # Loop over the walkers and update them accordingly.

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in compute_log_prob(self, coords)
    443                 except ValueError:
    444                     dt = np.dtype("object")
--> 445             blob = np.array(blob, dtype=dt)
    446 
    447             # Deal with single blobs properly

ValueError: could not assign tuple of length 2 to structure with 3 fields.

If I explicitly return -np.inf, -np.inf, -np.inf, np.zeros(25) when the prior is infinite, then it works fine. However, returning an array of zeros of length 1 but with the correct dtype does not work.

This makes it kind of hard to dynamically return blobs. What I could do is create the relevant blob dtype, then unpack it as a tuple, add the likelihood to the front of the tuple, and return that. In this case, I'd have to write the following:

def log_prob(p, data, blobs_dtype):
    mu, sig = p

    # "prior"
    if not 0 < mu < 1:
        ll = -np.inf
        blobs = np.zeros(1, dtype=blobs_dtype)
        return (ll,) + tuple(blobs[name] for name in blobs.dtype.names)

    ll = np.sum(-(data - mu)**2 / (2 * sig**2)) - sig

    return ll, mu*sig, mu+sig, (data - mu)/sig

backend = emcee.backends.HDFBackend("backend.h5")
sampler = emcee.EnsembleSampler(
    nwalkers=100,
    ndim=2,
    log_prob_fn=log_prob,
    args=(data, blobs_dtype),
    backend=backend,
    blobs_dtype=blobs_dtype
)

This works, but is clearly a lot more brittle and complex. In cases like these, we know that the blobs will never actually be saved because the likelihood is -np.inf anyway. I wonder if you couldn't just ignore the blobs in these cases?

@dfm
Owner

dfm commented Aug 4, 2020

Thanks for reporting! You're absolutely right that there are several typos in the docs. The first example should read:

def log_prob(params):
    lp = log_prior(params)
    if not np.isfinite(lp):
        return -np.inf, None
    ll = log_like(params)
    if not np.isfinite(ll):
        return -np.inf, None
    return lp + ll, lp

And the second should read:

def log_prob(params):
    lp = log_prior(params)
    if not np.isfinite(lp):
        return -np.inf, None, None
    ll = log_like(params)
    if not np.isfinite(ll):
        return -np.inf, None, None
    return lp + ll, lp, np.mean(params)

In summary:

  1. The number of returned objects should be the same, but if the log prob is -np.inf these returned objects can be anything.
  2. You should definitely return -np.inf when the log likelihood fails - that was a typo in the docs!

Let me know if this helps your use cases and we can update the docs.

@steven-murray
Author

Ah, you're right. Returning None for each of the blobs works both with and without setting blobs_dtype, and that is easy to do (even dynamically). However, there is one strange case: setting each blob to 0 (since you say it doesn't matter what they are) without blobs_dtype does result in an error:

~/miniconda3/envs/halomod/lib/python3.7/site-packages/emcee/ensemble.py in compute_log_prob(self, coords)
    443                 except ValueError:
    444                     dt = np.dtype("object")
--> 445             blob = np.array(blob, dtype=dt)
    446 
    447             # Deal with single blobs properly

ValueError: setting an array element with a sequence.

I'm not sure why, exactly, but it's easily avoided by just returning None. That should be made explicit in the docs though!

@dfm
Owner

dfm commented Aug 4, 2020

Can you post exactly the code you're running to get this, with a fixed seed and all the relevant imports and data? I wouldn't expect this branch to ever be hit: the blobs shouldn't be inspected if the log prob is -np.inf, and an error should be thrown earlier if your initial coordinates are invalid, so this suggests that there might be a bug.

@steven-murray
Author

Sure!

import emcee
import numpy as np

def log_prob(p):
    mu, sig = p

    if not 0 < mu < 1:
        return -np.inf, 0, 0, 0

    ll = np.sum(-(data - mu)**2 / (2 * sig**2)) - sig

    return ll, mu*sig, mu+sig, (data - mu)/sig

np.random.seed(1234)
data = np.random.normal(loc=0.2, scale=0.3, size=25)

sampler = emcee.EnsembleSampler(
    nwalkers=100,
    ndim=2,
    log_prob_fn=log_prob,
)

init_pos = np.array([0.2, 0.3]) + 1e-4 * np.random.normal(size=(100, 2))

sampler.run_mcmc(init_pos, nsteps=1000, progress=True)

@lboogaard

Hi, I'm very glad I found this issue after also being confused by the blobs documentation. This thread has a lot of info that would be useful to add to the docs. Here's what I've gathered so far (assuming I understand the intended behaviour correctly):

  1. If your log probability function returns more than one value, the subsequent values are automatically assumed to be blobs
  2. If you don't provide blobs_dtype, the dtype of the extra values is automatically guessed on the first pass
  3. If your log probability returns -np.inf, the blobs are not inspected and can be anything (but you should return the same number of values)

Now regarding 3., I think there might indeed be a bug. Playing with the example from the previous post, it seems that return -np.inf, 0, 0, 0 fails because the last return value is not the same shape as (data - mu)/sig. This is surprising if it's true that the blobs are not inspected. Here's a MWE:

import emcee
import numpy as np

def log_prob(p):
    mu, sig = p

    if not 0 < mu < 1:
        #return -np.inf, None, None, None          # works
        #return -np.inf, 0, 0, np.zeros_like(data) # works
        return -np.inf, 0, 0, 0                    # fails (unless blobs_dtype='object' is explicitly set)

    ll = np.sum(-(data - mu)**2 / (2 * sig**2)) - sig

    return ll, mu*sig, mu+sig, (data - mu)/sig

np.random.seed(1234)
data = np.random.normal(loc=0.2, scale=0.3, size=25)

sampler = emcee.EnsembleSampler(
    nwalkers=100,
    ndim=2,
    log_prob_fn=log_prob,
    #blobs_dtype='object'
)

init_pos = np.array([0.2, 0.3]) + 1e-4 * np.random.normal(size=(100, 2))

state = sampler.run_mcmc(init_pos, nsteps=100, progress=True)
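For what it's worth, the failure mode can be sketched with plain numpy (an illustrative guess at the mechanism, not emcee's exact code path):

```python
import numpy as np

# Two walkers' blob tuples: one hit the prior bound and returned scalar
# placeholders, the other returned its array-valued blob of length 25
ragged = [(0.0, 0.0, 0.0), (0.06, 0.5, np.zeros(25))]

try:
    # With a plain float dtype the rows have incompatible shapes,
    # so numpy refuses to assemble them into a regular array
    np.array(ragged, dtype=float)
except ValueError as err:
    print("fails:", err)

# A None placeholder pushes the guessed dtype to object, which accepts
# anything -- consistent with the "returning None works" reports above
print(np.atleast_1d(None).dtype)  # object
```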

@dfm
Owner

dfm commented Sep 24, 2021

Some of this is the result of how numpy has changed its handling of dtypes (for the better!) since I originally wrote this forever ago. But either way, I'd say that the best approach in this particular case (whenever you have non-trivial blob types) is to provide the dtype explicitly. And don't use object! Instead, here I would use:

blobs_dtype=np.dtype([("a", float), ("b", float), ("c", float, len(data))])

Then your resulting blobs array will be a structured array that is easy to slice, etc.
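As an illustration of that slicing (the field names and the length 25 follow the examples in this thread; the (nsteps, nwalkers) shape mimics what sampler.get_blobs() would return):

```python
import numpy as np

# Structured dtype as suggested above, with one array-valued field
blobs_dtype = np.dtype([("a", float), ("b", float), ("c", float, 25)])

# Mimic a blobs array for 10 steps of 4 walkers
nsteps, nwalkers = 10, 4
blobs = np.zeros((nsteps, nwalkers), dtype=blobs_dtype)

print(blobs["a"].shape)  # (10, 4): one scalar per step per walker
print(blobs["c"].shape)  # (10, 4, 25): the array-valued field keeps its length
```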

znicholls added a commit to znicholls/emcee that referenced this issue Mar 15, 2023
The blobs docs were found to be a bit confusing by some users (including me).
This merge request updates them following the discussion in dfm#352 to clarify
how this interface can be used and the intent behind it.
@dfm dfm closed this as completed in #464 May 23, 2023
dfm pushed a commit that referenced this issue May 23, 2023
3 participants