Improve documentation on InferenceData.map #1252

Closed
kyleabeauchamp opened this issue Jun 20, 2020 · 12 comments

@kyleabeauchamp
Contributor

kyleabeauchamp commented Jun 20, 2020

  1. Is InferenceData.map always done elementwise or is some sort of reduction applied?
  2. What is the function signature (particularly input shape and output shape) expected for a function passed to map? Currently, the docs just say "fun: callable / Function to be applied to each group."

For example, suppose you want to evaluate a new function of the model parameters (iterated for each draw in each chain). Often, one might use pymc3.Deterministic to add those quantities to the compute graph during model sampling, but I imagine that InferenceData.map might be a nice way to run such calculations without having to build them into the original compute graph at training time. I think a couple of examples and some guardrails on what fun ought to be would improve the usability here.

@kyleabeauchamp
Contributor Author

In particular, I have two concrete examples of what you might want to do with map:

  1. Suppose you have a model that uniformly samples (x, y) points in the unit square. A nice application of map might be to transform the posterior samples into polar coordinates, giving you (r, theta).
  2. Suppose you have a Bayesian regression model. Another application of map might be to apply a different scoring function (e.g., https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error) to the samples in your posterior.

@kyleabeauchamp
Contributor Author

Here's my current working code for what I'm trying to do. Even after reading #1255, I'm still not sure if that falls under map, another arviz function that needs to be written, or possibly just code that belongs in user-space:

def runner(posterior, func):
    """Apply func to every (chain, draw) slice of the posterior Dataset."""
    results = {}
    n_chains = posterior.dims["chain"]
    n_draws = posterior.dims["draw"]
    for chain in range(n_chains):
        results[chain] = {}
        for draw in range(n_draws):
            point = posterior.isel(chain=chain, draw=draw)
            chunk = func(point)
            results[chain][draw] = chunk
    return results

# TODO: infer shape of chunk and create output xarray object
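One way that TODO might be resolved, sketched here as an assumption rather than tested code: if each chunk returned by func is an xarray object with identical dims and coords, the pieces can be stacked back along new "draw" and "chain" dimensions with xr.concat.

import xarray as xr

def assemble(results):
    # Hypothetical helper, not part of the snippet above: stack the per-draw
    # chunks into a single object with explicit "chain" and "draw" dimensions.
    # Assumes every chunk is an xarray Dataset/DataArray with matching dims.
    per_chain = [
        xr.concat(list(draws.values()), dim="draw")
        for draws in results.values()
    ]
    return xr.concat(per_chain, dim="chain")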

@OriolAbril
Member

I have extended the docs a little. map basically loops over groups to apply the function to all of them (the same way that xarray.Dataset.map loops over data variables, which is why there is no xarray.DataArray.map). This means that data from other groups is not available unless manually passed as an argument or kwarg.

I think map would be a perfect fit for 1. if the transformation had to be done to both prior and posterior; if it is a posterior-only transformation it can still be used and may be more convenient, but would be equivalent to the following:

# posterior-only transformation: the next two lines are equivalent
idata.posterior = func(idata.posterior)
idata = idata.map(func, groups="posterior")
# to apply func to both prior and posterior we can use the latent_vars metagroup
idata = idata.map(func, groups="latent_vars")

Metagroups are still quite experimental, so any comments are very welcome.

I think 2. requires access to both observed_data and posterior predictive? I think it diverges from the groupwise case of map. Having a concrete example is super useful though, so I am linking #1066 here; if you have time to create a minimal example of this, it would surely help in developing inter-group operation capabilities.

@ahartikainen
Contributor

I have been thinking about this too lately.

We would basically need to create a function, then have some simple way to combine InferenceData groups + external data, where the function arguments could be inferred or manually defined.

Something similar to the wrap_xarray_ufunc we already have, but more user friendly, with some predetermined options.

And this would pull out the wanted output.
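As an illustration only (none of these names exist in arviz; this is just a sketch of the kind of wrapper being described), something along these lines:

import xarray as xr

def apply_metric(idata, func, group, var_name, external, core_dim):
    # Hypothetical helper, not an arviz API: pull one variable from an
    # InferenceData group, pair it with external data along core_dim, and let
    # xarray vectorize func over the remaining dims (chain, draw).
    # `external` is assumed to be a DataArray carrying the core_dim dimension.
    data = getattr(idata, group)[var_name]
    return xr.apply_ufunc(
        func,
        external,
        data,
        input_core_dims=[[core_dim], [core_dim]],
        vectorize=True,
    )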

@kyleabeauchamp
Contributor Author

kyleabeauchamp commented Jun 20, 2020

OK, I put together a really simple example of (1), which after your suggestions seems to be working well:

import pymc3 as pm
import numpy as np


n_points = 1000
n_samples = 100
chains = 2

with pm.Model():
    x = pm.Uniform("x", 0.0, 1.0, shape=n_points)
    y = pm.Uniform("y", 0.0, 1.0, shape=n_points)

    tr = pm.sample(n_samples, chains=chains, return_inferencedata=True)

# Do some extra calculations after sampling.
distance = tr.map(
    lambda v, x0, y0: ((v.x - x0) ** 2 + (v.y - y0) ** 2) ** 0.5,
    groups="latent_vars",
    x0=0.0,
    y0=0.0,
).posterior
print(np.linalg.norm(tr.posterior.isel(chain=0, draw=0, x_dim_0=0, y_dim_0=0).to_array()))
print(distance.isel(chain=0, draw=0, x_dim_0=0, y_dim_0=0))

I'll try to think about example (2) now.

@kyleabeauchamp
Contributor Author

kyleabeauchamp commented Jun 20, 2020

Here's my code for example (2). I can't get the sklearn <> map combination working because of the need for an implicit broadcast / aggregation across one dimension. I think making that commented-out line work in an easily comprehensible way might be a win for usability.

import pymc3 as pm
import numpy as np
import sklearn.metrics


n_features, n_observations = 10, 500
n_samples = 100
chains = 2

x = np.random.normal(size=(n_features, n_observations))
epsilon = 0.1 * np.random.normal(size=n_observations)
a0 = np.eye(n_features)[3] * 10.0  # Only 4th feature active
b0 = 25.0
y0 = x.T.dot(a0) + b0 + epsilon


with pm.Model():
    a = pm.Uniform("a", -100, 100, shape=n_features)
    b = pm.Uniform("b", -100, 100)
    sigma = pm.Uniform("sigma", 0, 100)

    mu = pm.Deterministic("mu", pm.math.dot(x.T, a) + b)

    y = pm.Normal("y", mu, sigma, shape=n_observations, observed=y0)

    tr = pm.sample(n_samples, chains=chains, return_inferencedata=True)

# After sampling, calculate additional metrics via map operation on InferenceData
# Try two ways: via sklearn and manual
# sklearn currently not working due to ValueError: Found input variables with inconsistent numbers of samples: [500, 2]
#mse0 = tr.map(lambda x, y0: sklearn.metrics.mean_squared_error(y0, x.mu.values), groups="latent_vars", y0=y0).posterior
mse1 = tr.map(
    lambda x, y0: ((y0 - x.mu) ** 2).mean("mu_dim_0"),
    groups="latent_vars",
    y0=y0,
).posterior

# Compare to manual MSE calculated on first chain and draw
#print(mse0.isel(chain=0, draw=0))
print(mse1.isel(chain=0, draw=0))
print(sklearn.metrics.mean_squared_error(y0, tr.posterior.mu.isel(chain=0, draw=0).values))

@kyleabeauchamp
Contributor Author

kyleabeauchamp commented Jun 20, 2020

PS: the reason I'm pushing for this functionality is that I think it would be great to be able to have a clean separation between "model training" and "model evaluation", where we use the InferenceData interface to do model evaluation (using available metrics in, e.g., sklearn) after the model was sampled.

@OriolAbril
Member

map seems like overkill for these two cases in my opinion. There is not a groupwise component to these calculations.

I'd recommend reading the updated radon notebook from pymc-devs/pymc#3963 to see examples of postprocessing calculations; you can access the different variables as idata.posterior.x and do anything you'd do with xarray objects (whose API is more or less the same as numpy's).

In fact, in the second example, using xarray should take care of all the broadcasting; the issue is that y0 is a numpy array, so xarray cannot broadcast automatically. Using tr.observed_data.y instead will solve the issue if the dims are all annotated.
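For what it's worth, a minimal sketch of that suggestion, assuming the model annotates a shared observation dimension (say "obs") for both y and mu via coords/dims, so xarray can align them by name:

# Sketch only: assumes y and mu were declared with dims=("obs",) in the model,
# so observed_data.y and posterior.mu share the "obs" dimension.
y_obs = tr.observed_data.y
mse = ((y_obs - tr.posterior.mu) ** 2).mean("obs")  # one MSE per (chain, draw)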

@kyleabeauchamp
Contributor Author

Thanks, the new features (coords in pm.Model() and pm.Data) seem extremely useful for ensuring good bookkeeping and reproducibility in these situations.

I still don't actually see how to solve the issue in (2). Basically, the issue there is that I specifically want to avoid re-writing the metric and instead leverage a unit-tested, open-source version from sklearn. To me, the only way to use that existing code is through some sort of "coordinate-aware" function like map, apply, agg, or reduce, where the user specifies which axes are iterated over and which ones are "presented to func".

@OriolAbril
Member

Oh, my bad, I assumed the sklearn function would work; xarray preserves coords and broadcasts automatically with most numpy functions. It looks like you'll have to use xarray.apply_ufunc or arviz.wrap_xarray_ufunc. I am afraid both functions need some improvements, but I hope they are clear enough; if not, maybe this SO answer can help, and in some cases you may need to call xr.broadcast before passing the datasets to apply_ufunc, see pydata/xarray#3032.

@AlexAndorra
Contributor

Closing, as the doc improvement has been merged and released in 0.9.0, but feel free to reopen if needed 😉

@kyleabeauchamp
Contributor Author

kyleabeauchamp commented Jul 5, 2020

FYI, I was able to figure out my use case combining arviz + xarray + sklearn metrics. I spent several hours staring at the docs for wrap_xarray_ufunc and eventually was able to identify the following pattern:

xr.apply_ufunc(f, y, yhat, input_core_dims=[[dim], [dim]], vectorize=True)
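To make the pattern concrete, here is a hedged expansion assuming the regression example above (observed y0 and posterior mu with dimension "mu_dim_0"):

import xarray as xr
import sklearn.metrics

# Sketch only: wrap the observed values so they carry the same core dim name,
# then let apply_ufunc vectorize the sklearn metric over (chain, draw).
y_da = xr.DataArray(y0, dims=["mu_dim_0"])
mse = xr.apply_ufunc(
    sklearn.metrics.mean_squared_error,
    y_da,
    tr.posterior.mu,
    input_core_dims=[["mu_dim_0"], ["mu_dim_0"]],
    vectorize=True,
)
# mse now has dims (chain, draw)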
