
Consider making channel scaling part of the model definition #1383

Open
ricardoV94 opened this issue Jan 16, 2025 · 9 comments

ricardoV94 commented Jan 16, 2025

When working on #1357 to make the optimizer model-agnostic, I still had to worry about channel scales, because these are not part of the model. I imagine they are applied as a pre-processing step when the model is defined?

If instead the model were defined with the raw data and the scaling happened symbolically, that extra handling wouldn't be needed. Is there any part of the codebase that requires sometimes applying the scale and other times not?

with pm.Model() as m:
  natural_x = pm.Data("x", ...)
  rescaled_x = natural_x / natural_x.max().eval()  # So it doesn't change when you change `natural_x`.
  ... # Make use of rescaled_x.

If we needed a function that takes `rescaled_x` as input, that would also be easy: wrap the operation in a `Deterministic`, which gives us a handle to it later.
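For illustration, a minimal sketch of that idea (the data and variable names below are invented, not from any PR):

import pymc as pm

with pm.Model() as m:
    natural_x = pm.Data("x", [1.0, 2.0, 4.0])
    scale = natural_x.max().eval()  # frozen at model-definition time
    # Wrapping the scaled data in a Deterministic gives us a named handle
    # that can be retrieved later, e.g. via m["rescaled_x"].
    rescaled_x = pm.Deterministic("rescaled_x", natural_x / scale)
    beta = pm.Normal("beta")
    pm.Normal("y", mu=beta * rescaled_x, sigma=1.0, observed=[0.2, 0.5, 1.0])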

ricardoV94 changed the title from "Make channel scaling part of the model definition" to "Consider making channel scaling part of the model definition" on Jan 16, 2025

wd60622 commented Jan 16, 2025

Totally. Does this work with scaling the regression target though?

ricardoV94 commented Jan 16, 2025

> Totally. Does this work with scaling the regression target though?

I don't know exactly what you're asking :)


wd60622 commented Jan 16, 2025

This doesn't work (a simple version of the model, showing how I am interpreting your suggestion):

import numpy as np
import pymc as pm

seed = sum(map(ord, "Scaling the likelihood depended variables doesn't work in PyMC"))
rng = np.random.default_rng(seed)

true_mu = 100
true_sigma = 30

n_obs = 10
coords = {
    "date": np.arange(n_obs),
}

dist = pm.Normal.dist(mu=true_mu, sigma=true_sigma, shape=n_obs)
data = pm.draw(dist, random_seed=rng)

scaling = data.max()

with pm.Model(coords=coords) as model:
    mu = pm.Normal("mu")
    sigma = pm.HalfNormal("sigma")

    target = pm.Data("target", data, dims="date")
    scaled_target = target / scaling

    pm.Normal("observed", mu=mu, sigma=sigma, observed=scaled_target)

Results in:

TypeError                              Traceback (most recent call last)
Cell In[6], line 30
     27 target = pm.Data("target", data, dims="date")
     28 scaled_target = target / scaling
---> 30 pm.Normal("observed", mu=mu, sigma=sigma, observed=scaled_target)

File ~/micromamba/envs/pymc-marketing-dev/lib/python3.10/site-packages/pymc/distributions/distribution.py:513, in Distribution.__new__(cls, name, rng, dims, initval, observed, total_size, transform, default_transform, *args, **kwargs)
    509         kwargs["shape"] = tuple(observed.shape)
    511 rv_out = cls.dist(*args, **kwargs)
--> 513 rv_out = model.register_rv(
    514     rv_out,
    515     name,
    516     observed=observed,
    517     total_size=total_size,
    518     dims=dims,
    519     transform=transform,
    520     default_transform=default_transform,
    521     initval=initval,
    522 )
    524 # add in pretty-printing support
    525 rv_out.str_repr = types.MethodType(str_for_dist, rv_out)

File ~/micromamba/envs/pymc-marketing-dev/lib/python3.10/site-packages/pymc/model/core.py:1245, in Model.register_rv(self, rv_var, name, observed, total_size, dims, default_transform, transform, initval)
   1243 else:
   1244     if not is_valid_observed(observed):
-> 1245         raise TypeError(
   1246             "Variables that depend on other nodes cannot be used for observed data."
   1247             f"The data variable was: {observed}"
   1248         )
   1250     # `rv_var` is potentially changed by `make_obs_var`,
   1251     # for example into a new graph for imputation of missing data.
   1252     rv_var = self.make_obs_var(
   1253         rv_var, observed, dims, default_transform, transform, total_size
   1254     )

TypeError: Variables that depend on other nodes cannot be used for observed data.The data variable was: True_div.0


wd60622 commented Jan 16, 2025

However, doing this for the covariates in the model works fine, so breaking this into two steps (covariates and target) would be fine.
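To make the two-step idea concrete, here is a rough sketch (variable names and data are invented for illustration): covariates are scaled symbolically inside the model, while the target is scaled eagerly before being passed to observed.

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = rng.normal(loc=100, scale=30, size=10)

target_scale = y.max()

with pm.Model() as model:
    x_data = pm.Data("x", X)
    # Covariate scaling stays in the graph; `.eval()` freezes the scale
    # so it does not change when `x_data` is swapped out later.
    x_scaled = x_data / x_data.max(axis=0).eval()

    beta = pm.Normal("beta", shape=3)
    sigma = pm.HalfNormal("sigma")
    mu = x_scaled @ beta

    # The target must be scaled outside the graph for now, since `observed`
    # cannot depend on other nodes.
    pm.Normal("y", mu=mu, sigma=sigma, observed=y / target_scale)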


ricardoV94 commented Jan 16, 2025

Ah, I see what you mean. We can and should make `observed` less restrictive in PyMC. As long as it is a function of the data that involves no RVs / value_vars, it should be fine.

We already have a bunch of exceptions, e.g. for casting and for Minibatch (which actually involves RVs, but carefully defined ones).
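One workaround that should already work today (a sketch, not an official recommendation) is to evaluate the data-only expression eagerly before handing it to observed, reusing the names from the failing example above:

with pm.Model(coords=coords) as model:
    mu = pm.Normal("mu")
    sigma = pm.HalfNormal("sigma")

    target = pm.Data("target", data, dims="date")
    scaled_target = target / scaling

    # `.eval()` turns the data-only expression into a concrete array,
    # which `observed` accepts; no RVs or value variables are involved.
    pm.Normal("observed", mu=mu, sigma=sigma, observed=scaled_target.eval(), dims="date")

Note that this loses the symbolic link to `target`, so swapping the data later would not re-apply the scaling automatically.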


wd60622 commented Jan 16, 2025

Cool. Seems good to work toward then. Are there open PyMC issues for this?

ricardoV94 commented Jan 16, 2025

> Cool. Seems good to work toward then. Are there open PyMC issues for this?

I know it has been talked about repeatedly, but I can't find an issue.


wd60622 commented Jan 16, 2025

Cool. I know there are some related issues in pymc-marketing.

Related to #154, #407, #299, and others that are linked there.


cetagostini commented Jan 20, 2025

@ricardoV94 @wd60622 The following code works for me:

with pm.Model(
    coords=self.model_coords,
) as self.model:
    _channel_scale = pm.Data(
        "channel_scale",
        self.scalers._channel.values,
        mutable=False,
        dims="channel",
    )
    _target_scale = pm.Data(
        "target_scale",
        self.scalers._target.item(),
        mutable=False,
    )

    # Scale `channel_data` and `target`
    channel_data_ = pm.Data(
        name="channel_data",
        value=(
            self.xarray_dataset._channel.transpose(
                "date", *self.dims, "channel"
            ).values
            / _channel_scale.eval()
        ),
        dims=("date", *self.dims, "channel"),
    )

    target_ = pm.Data(
        name="target",
        value=(
            self.xarray_dataset._target.sum(dim="target")
            .transpose("date", *self.dims)
            .values
        ),
        dims=("date", *self.dims),
    )

    # ...

    mu_var *= _target_scale.eval()

    mu = pm.Deterministic(name="mu", var=mu_var, dims=("date", *self.dims))

    self.model_config["likelihood"].dims = ("date", *self.dims)
    self.model_config["likelihood"].create_likelihood_variable(
        name=self.output_var,
        mu=mu,
        observed=target_,
    )

You can see the full implementation in PR #1036.
