Consider making channel scaling part of the model definition #1383
Comments
Totally. Does this work with scaling the regression target though?

I don't know exactly what you're asking :)

This doesn't work ... (simple version of the model and how I am interpreting your suggestion):

import numpy as np
import pymc as pm
seed = sum(map(ord, "Scaling the likelihood depended variables doesn't work in PyMC"))
rng = np.random.default_rng(seed)
true_mu = 100
true_sigma = 30
n_obs = 10
coords = {
"date": np.arange(n_obs),
}
dist = pm.Normal.dist(mu=true_mu, sigma=true_sigma, shape=n_obs)
data = pm.draw(dist, random_seed=rng)
scaling = data.max()
with pm.Model(coords=coords) as model:
    mu = pm.Normal("mu")
    sigma = pm.HalfNormal("sigma")
    target = pm.Data("target", data, dims="date")
    scaled_target = target / scaling
    pm.Normal("observed", mu=mu, sigma=sigma, observed=scaled_target)

Results in:
However, doing this for the covariates in the model works fine, so breaking this into two steps (covariates and target) would work.
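For reference, a minimal sketch of the covariate case that does work (the names here are illustrative, not from the library): the covariate is stored with pm.Data, divided by a scale inside the model, and only used to build mu, so the observed data itself stays raw.

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x_raw = rng.normal(loc=50, scale=10, size=10)    # raw covariate
y_raw = rng.normal(loc=100, scale=30, size=10)   # raw target

with pm.Model() as model:
    x = pm.Data("x", x_raw)
    x_scale = pm.Data("x_scale", x_raw.max())
    x_scaled = x / x_scale  # symbolic scaling of a covariate is fine
    beta = pm.Normal("beta")
    sigma = pm.HalfNormal("sigma")
    pm.Normal("observed", mu=beta * x_scaled, sigma=sigma, observed=y_raw)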
Ah I see what you mean. We can and should make this work. We already have a bunch of exceptions for casting and minibatch (which actually involves RVs, but carefully defined ones).
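For context, a sketch of the minibatch exception mentioned above, assuming current PyMC where pm.Minibatch variables are accepted as observed:

import numpy as np
import pymc as pm

data = np.random.default_rng(0).normal(loc=100, scale=30, size=1_000)

with pm.Model() as model:
    mu = pm.Normal("mu")
    sigma = pm.HalfNormal("sigma")
    # Minibatch is one of the carefully defined variables already allowed as `observed`
    mb = pm.Minibatch(data, batch_size=100)
    pm.Normal("observed", mu=mu, sigma=sigma, observed=mb, total_size=data.shape[0])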
Cool. Seems good to work toward then. Are there open PyMC issues for this?

I know it has been talked about repeatedly, but I can't find any issue.
@ricardoV94 @wd60622

with pm.Model(
    coords=self.model_coords,
) as self.model:
    _channel_scale = pm.Data(
        "channel_scale",
        self.scalers._channel.values,
        mutable=False,
        dims="channel",
    )
    _target_scale = pm.Data(
        "target_scale",
        self.scalers._target.item(),
        mutable=False,
    )
    # Scale `channel_data` and `target`
    channel_data_ = pm.Data(
        name="channel_data",
        value=(
            self.xarray_dataset._channel.transpose(
                "date", *self.dims, "channel"
            ).values
            / _channel_scale.eval()
        ),
        dims=("date", *self.dims, "channel"),
    )
    target_ = pm.Data(
        name="target",
        value=(
            self.xarray_dataset._target.sum(dim="target")
            .transpose("date", *self.dims)
            .values
        ),
        dims=("date", *self.dims),
    )

    # ...

    mu_var *= _target_scale.eval()
    mu = pm.Deterministic(name="mu", var=mu_var, dims=("date", *self.dims))
    self.model_config["likelihood"].dims = ("date", *self.dims)
    self.model_config["likelihood"].create_likelihood_variable(
        name=self.output_var,
        mu=mu,
        observed=target_,
    )

You can see the full implementation in PR #1036.
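For comparison, here is a self-contained toy sketch (not the pymc-marketing API; every name is invented) of the variant this issue asks for: the raw data and the scales all go into pm.Data, and the rescaling happens symbolically inside the graph instead of via .eval() at build time.

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_obs, n_channels = 52, 3
channel_raw = rng.gamma(shape=2.0, scale=100.0, size=(n_obs, n_channels))
target_raw = rng.normal(loc=1_000.0, scale=100.0, size=n_obs)

coords = {"date": np.arange(n_obs), "channel": [f"x{i}" for i in range(n_channels)]}

with pm.Model(coords=coords) as model:
    # Scales live inside the model as data variables
    channel_scale = pm.Data("channel_scale", channel_raw.max(axis=0), dims="channel")
    target_scale = pm.Data("target_scale", target_raw.max())

    # Raw data goes in; scaling happens symbolically in the graph
    channel_data = pm.Data("channel_data", channel_raw, dims=("date", "channel"))
    channel_scaled = channel_data / channel_scale

    beta = pm.HalfNormal("beta", dims="channel")
    sigma = pm.HalfNormal("sigma")

    # Model in scaled space, then map mu back to the raw target scale
    mu_scaled = (channel_scaled * beta).sum(axis=-1)
    mu = pm.Deterministic("mu", mu_scaled * target_scale, dims="date")

    target = pm.Data("target", target_raw, dims="date")
    pm.Normal("y", mu=mu, sigma=sigma * target_scale, observed=target, dims="date")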
When working on #1357 to make the optimizer model-agnostic, I still had to worry about channel scales, because these are not part of the model. I imagine that when defining the model they are applied as a pre-processing step?
If instead the model were defined with the raw data and the scaling happened symbolically, that wouldn't be needed. Is there any part of the codebase that requires sometimes applying the scale and other times not?
If we needed a function that takes rescaled_x as input, that would also be easy: wrap the operation in a Deterministic, which gives us a handle to it later (see the sketch below).
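A sketch of that last point (x_raw and all names here are made up): wrapping the symbolic rescaling in a Deterministic gives it a name, so downstream code can pull it back out of the model.

import numpy as np
import pymc as pm

x_raw = np.arange(1.0, 11.0)

with pm.Model() as model:
    x = pm.Data("x", x_raw)
    x_scale = pm.Data("x_scale", x_raw.max())
    # The named Deterministic is the handle a downstream function could take as input
    rescaled_x = pm.Deterministic("rescaled_x", x / x_scale)

# Retrieve the handle by name later
handle = model["rescaled_x"]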