Imputation does not work in combination with pm.Data #4441

michaelosthege · 2021-01-25T12:08:41Z

Description

Even after #4439 imputations don't work in combination with pm.Data.
This is because pm.Data creates a SharedVariable that currently does not support a np.ma.MaskedArray.
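
For context, a minimal NumPy-only sketch of how NaN entries map onto a `np.ma.MaskedArray`, the representation mentioned above. The variable names are illustrative and nothing here calls PyMC3; it only shows which positions would be treated as missing.

```python
import numpy as np

data = np.array([
    [1, 2, 3],
    [4, 5, float("nan")],
    [7, 8, 9],
])

# Mask every non-finite entry; the mask marks the positions to impute.
masked = np.ma.masked_invalid(data)

# Count the missing entries and list their (row, column) positions.
n_missing = int(masked.mask.sum())
missing_positions = [
    tuple(int(i) for i in idx) for idx in np.argwhere(masked.mask)
]
```

A plain ndarray or SharedVariable carries only the values, not this mask, which is why the missing positions have to be known when the model is defined.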

Almost identical to the example from #4437:

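import numpy
import pymc3

# 3x3 observations with one missing entry (NaN) that should be imputed
data = numpy.array([
    [1, 2, 3],
    [4, 5, float("nan")],
    [7, 8, 9],
])
print(data)
with pymc3.Model():
    pymc3.Normal(
        "L",
        mu=pymc3.Normal("x", shape=data.shape),
        sd=10,
        # wrapping the data in pymc3.Data is what triggers the failure
        observed=pymc3.Data("D", data),
        shape=data.shape,
    )
    pymc3.sample()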

Please provide the full traceback.

SamplingError                             Traceback (most recent call last)
<ipython-input-32-e837139c32a4> in <module>
     13         shape=data.shape
     14     )
---> 15     pymc3.sample()

c:\users\osthege\repos\pymc3-dev\pymc3\sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
    425     model = modelcontext(model)
    426     if start is None:
--> 427         check_start_vals(model.test_point, model)
    428     else:
    429         if isinstance(start, dict):

c:\users\osthege\repos\pymc3-dev\pymc3\util.py in check_start_vals(start, model)
    236                 "Initial evaluation of model at starting point failed!\n"
    237                 "Starting values:\n{}\n\n"
--> 238                 "Initial evaluation results:\n{}".format(elem, str(initial_eval))
    239             )
    240 

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'x': array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])}

Initial evaluation results:
x   -8.27
L     NaN
Name: Log-probability of test_point, dtype: float64

Versions and main components

PyMC3 Version: master (including #4439, "Support imputations with ndarray data")
Theano Version: 1.1.0

ricardoV94 · 2021-01-25T12:27:17Z

Could this become problematic if imputation happens automatically, for example when new Data is set with more or fewer missing values than the initial data?

michaelosthege · 2021-01-25T12:33:19Z

The whole implementation for switching the data in an existing model is broken.
Shape issues are one thing, but @ricardoV94 is right that with imputation it could get even worse. Imputation is realized by automatically changing the model graph. Switching out the data afterwards will almost certainly break it unless the mask is identical.
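
To make the "mask is identical" condition concrete, here is a small sketch of a check that compares the missing-value pattern of the initial data with that of replacement data. The helper name `same_missing_pattern` is hypothetical and not part of PyMC3; it assumes numeric arrays with NaN marking missing entries.

```python
import numpy as np


def same_missing_pattern(old, new):
    """Return True if both arrays have the same shape and the same NaN positions.

    Hypothetical helper for illustration; assumes numeric input.
    """
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    if old.shape != new.shape:
        return False
    return bool(np.array_equal(np.isnan(old), np.isnan(new)))


initial = np.array([[1.0, np.nan], [3.0, 4.0]])
same_mask = np.array([[5.0, np.nan], [7.0, 8.0]])   # NaN in the same place
moved_mask = np.array([[5.0, 6.0], [np.nan, 8.0]])  # NaN moved elsewhere
```

Only replacement data for which such a check passes would leave the imputation-related parts of the graph consistent; anything else changes which entries are observed and which are imputed.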

I see pm.Data primarily as a tool to get the data & coords nicely represented in the model graph and resulting InferenceData.
We should probably separate those two use cases into something like pm.Data and pm.MutableData.

michaelosthege · 2021-01-25T12:37:20Z

I have a hunch that we'll have to revisit the whole imputation feature under the new RandomVariable paradigm. After merging #4439 I'm fine with doing observed=pm.Data(...).container.data as a workaround.

Let's label this "wontfix" and revisit it for PyMC3 >=4.0.

AlexAndorra · 2021-01-25T14:08:43Z

Agreed 👌
Out of curiosity, what do you mean by "doing observed=pm.Data(...).container.data as a workaround" ?

michaelosthege · 2021-01-25T14:23:26Z

> Agreed 👌
> Out of curiosity, what do you mean by "doing observed=pm.Data(...).container.data as a workaround" ?

This way I can use the pm.Data to include my data into the InferenceData, but also use it for imputation. Only the graph I get from pm.model_to_graphviz now has the Data node disconnected, but I can live with that.

ricardoV94 · 2021-12-22T07:33:54Z

Having learned more about imputed variables, I think this feature would require a considerable change in the internals, since all the imputation logic happens during model definition.

michaelosthege · 2021-12-22T08:10:23Z

Yes, we can't have support for the combination of SharedVariable+imputation.
That brings me back to the proposal of distinguishing between pm.ConstantData and pm.MutableData or something like that.

ricardoV94 · 2021-12-22T08:22:54Z

Isn't the default observed ConstantData?

michaelosthege · 2021-12-22T08:27:01Z

Observed are not automatically tracked with dims/coords and don't show up in model_to_graphviz.
Also, these arrays do not always become observed. Sometimes you need a vector of x values as an input to the regression, and so far the only way to get it into the InferenceData is to make it a pm.Data.

ricardoV94 · 2021-12-22T08:54:21Z

I see... should we check for nans in pm.set_data and just prohibit it? Or too much of an edge case to be worth bothering?

michaelosthege · 2021-12-22T09:07:00Z

Let's implement pm.ConstantData and pm.MutableData. Then we can check for NaN, warn, and point users to ConstantData, which creates a TensorConstant and registers it for dims/coords with the model.

That could give us some speed-up too, because with constant data the shape is known.
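
The NaN check proposed here could look roughly like the sketch below. `warn_on_nan` is a hypothetical helper, not an existing PyMC function, and the warning text is only illustrative; it assumes numeric input.

```python
import warnings

import numpy as np


def warn_on_nan(name, values):
    """Warn if `values` contains NaN and return whether it did.

    Hypothetical helper for illustration; assumes numeric input.
    """
    values = np.asarray(values, dtype=float)
    has_nan = bool(np.isnan(values).any())
    if has_nan:
        warnings.warn(
            f"Data variable '{name}' contains NaN values. Imputation is not "
            "supported for mutable data; consider pm.ConstantData instead.",
            UserWarning,
        )
    return has_nan
```

Returning the flag in addition to warning lets a caller decide whether to go further and refuse the data outright, which is the stricter option raised earlier in the thread.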

By passing `pm.Data(mutable=False)` one can create a `TensorConstant` instead of a `SharedVariable`. Data variables with known, fixed shape can enhance performance and compatibility in some situations. `pm.ConstantData` or `pm.MutableData` wrappers are provided as alternative syntax. This is the basis for solving pymc-devs#4441.

michaelosthege · 2022-06-18T10:52:03Z

I think this can be closed since one can now use pm.ConstantData

ricardoV94 · 2022-06-18T11:48:05Z

I think imputation will still fail with ConstantData

michaelosthege · 2022-06-18T11:57:12Z

> I think imputation will still fail with ConstantData

Oh because we're not yet checking NaN in tensors? Yeah sorry, I forgot about that. Good call to open it again 🙏

michaelosthege added the enhancements and theano-related labels Jan 25, 2021

michaelosthege added the wontfix label Jan 25, 2021

michaelosthege mentioned this issue Dec 30, 2021 in "Option to create non-shared pm.Data" #5295 (merged)

michaelosthege closed this as completed Jun 18, 2022

ricardoV94 reopened this Jun 18, 2022

michaelosthege added the bug and pytensor labels and removed the wontfix and theano-related labels Jun 18, 2022

fonnesbeck added the wontfix label Jun 16, 2024

fonnesbeck closed this as completed Jun 16, 2024

fonnesbeck reopened this Jun 16, 2024

fonnesbeck removed the wontfix label Jun 16, 2024
