-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/bayesian ridge #247
Feat/bayesian ridge #247
Changes from 57 commits
ae425f1
ad92978
8cbcabb
67b716d
864a0dd
3be9884
8572534
30952da
0ca6725
cb2a2fa
0ab4741
b7f4a9d
ae8b3b0
afe88ee
421a8ef
9be10e4
02b285d
373eda0
3c508a4
2cf1c98
895dd97
1989c3f
f5ba5b6
115c447
1a3070c
c0ebe68
5533ab5
df1be29
afa417d
a761eab
9cdeecc
d5f2510
0f054c6
7cc4683
7d8d470
e1e62d6
a575c3b
6d5d2d4
fc4e124
c96a6f0
bd75143
37de967
469a9b9
4d2d782
3cf69c2
988243c
889de37
8c969c6
514b167
e4cd5b0
f7b9a57
60004c7
0248749
9f114f4
92e434e
7a53d4d
cd3822c
fdd0324
287eb6a
3694a0d
199e1f2
c409e83
d9628a6
a4c6728
f14f3fa
8e0fb45
9ba1bc4
44d10e6
5d59f87
3ecde56
0d43838
0006e71
de7b972
7762408
5ae61c4
7ce464b
0fe5c1a
e4ed46a
95d673f
5823c97
58f36ce
98c51cc
52b2084
4570fc5
06eea91
ccc2358
8253e0b
4670db6
f0e87a7
db92d39
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,320 @@ | ||||||
defmodule Scholar.Linear.BayesianRidgeRegression do | ||||||
require Nx | ||||||
import Nx.Defn | ||||||
import Scholar.Shared | ||||||
|
||||||
@derive {Nx.Container, | ||||||
containers: [:coefficients, :intercept, :alpha, :lambda, :sigma, :rmse, :iterations, :scores]} | ||||||
defstruct [:coefficients, :intercept, :alpha, :lambda, :sigma, :rmse, :iterations, :scores] | ||||||
|
||||||
opts = [ | ||||||
iterations: [ | ||||||
type: :pos_integer, | ||||||
default: 300, | ||||||
doc: """ | ||||||
Maximum number of iterations before stopping the fitting algorithm. | ||||||
The number of iterations may be lower is parameters converge. | ||||||
""" | ||||||
], | ||||||
sample_weights: [ | ||||||
type: | ||||||
{:or, | ||||||
[ | ||||||
{:custom, Scholar.Options, :non_negative_number, []}, | ||||||
{:list, {:custom, Scholar.Options, :non_negative_number, []}}, | ||||||
{:custom, Scholar.Options, :weights, []} | ||||||
]}, | ||||||
doc: """ | ||||||
The weights for each observation. If not provided, | ||||||
all observations are assigned equal weight. | ||||||
""" | ||||||
], | ||||||
fit_intercept?: [ | ||||||
type: :boolean, | ||||||
default: true, | ||||||
doc: """ | ||||||
If set to `true`, a model will fit the intercept. Otherwise, | ||||||
the intercept is set to `0.0`. The intercept is an independent term | ||||||
in a linear model. Specifically, it is the expected mean value | ||||||
of targets for a zero-vector on input. | ||||||
""" | ||||||
], | ||||||
alpha_init: [ | ||||||
type: {:custom, Scholar.Options, :non_negative_number, []}, | ||||||
doc: ~S""" | ||||||
The initial value for alpha. This parameter influences the precision of the noise. | ||||||
`:alpha` must be a non-negative float i.e. in [0, inf). | ||||||
Defaults to 1/Var(y). | ||||||
""" | ||||||
], | ||||||
lambda_init: [ | ||||||
type: {:custom, Scholar.Options, :non_negative_number, []}, | ||||||
default: 1.0, | ||||||
doc: ~S""" | ||||||
The initial value for lambda. This parameter influences the precision of the weights. | ||||||
`:lambda` must be a non-negative float i.e. in [0, inf). | ||||||
Defaults to 1. | ||||||
""" | ||||||
], | ||||||
alpha_1: [ | ||||||
type: {:custom, Scholar.Options, :non_negative_number, []}, | ||||||
default: 1.0e-6, | ||||||
doc: ~S""" | ||||||
Hyper-parameter : shape parameter for the Gamma distribution prior | ||||||
over the alpha parameter. | ||||||
""" | ||||||
], | ||||||
alpha_2: [ | ||||||
type: {:custom, Scholar.Options, :non_negative_number, []}, | ||||||
default: 1.0e-6, | ||||||
doc: ~S""" | ||||||
Hyper-parameter : inverse scale (rate) parameter for the Gamma distribution prior | ||||||
over the alpha parameter. | ||||||
""" | ||||||
], | ||||||
lambda_1: [ | ||||||
type: {:custom, Scholar.Options, :non_negative_number, []}, | ||||||
default: 1.0e-6, | ||||||
doc: ~S""" | ||||||
Hyper-parameter : shape parameter for the Gamma distribution prior | ||||||
over the lambda parameter. | ||||||
""" | ||||||
], | ||||||
lambda_2: [ | ||||||
type: {:custom, Scholar.Options, :non_negative_number, []}, | ||||||
default: 1.0e-6, | ||||||
doc: ~S""" | ||||||
Hyper-parameter : inverse scale (rate) parameter for the Gamma distribution prior | ||||||
over the lambda parameter. | ||||||
""" | ||||||
], | ||||||
eps: [ | ||||||
type: :float, | ||||||
default: 1.0e-8, | ||||||
doc: | ||||||
"The convergence tolerance. When `Nx.sum(Nx.abs(coef - coef_new)) < :eps`, the algorithm is considered to have converged." | ||||||
] | ||||||
] | ||||||
|
||||||
@opts_schema NimbleOptions.new!(opts) | ||||||
deftransform fit(x, y, opts \\ []) do | ||||||
opts = NimbleOptions.validate!(opts, @opts_schema) | ||||||
|
||||||
opts = | ||||||
[ | ||||||
sample_weights_flag: opts[:sample_weights] != nil | ||||||
] ++ | ||||||
opts | ||||||
|
||||||
{sample_weights, opts} = Keyword.pop(opts, :sample_weights, 1.0) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would set default |
||||||
x_type = to_float_type(x) | ||||||
|
||||||
sample_weights = | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is no need to check if
will give
|
||||||
if Nx.is_tensor(sample_weights), | ||||||
do: Nx.as_type(sample_weights, x_type), | ||||||
else: Nx.tensor(sample_weights, type: x_type) | ||||||
|
||||||
# handle vector types | ||||||
# handle default alpha value, add eps to avoid division by 0 | ||||||
eps = Nx.Constants.smallest_positive_normal(x_type) | ||||||
default_alpha = Nx.divide(1, Nx.add(Nx.variance(x), eps)) | ||||||
JoaquinIglesiasTurina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
alpha = Keyword.get(opts, :alpha_init, default_alpha) | ||||||
alpha = Nx.tensor(alpha, type: x_type) | ||||||
opts = Keyword.put(opts, :alpha_init, alpha) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Tensors should never be passed as options. You should always pass all tensors as arguments to the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My understanding is that model hyper parameters, which are plain scalars, are ok to be passed as opts to Please note that those hyperparameters need to be pased to the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It depends. Everything that is passed as an option to a defn results in a different compilation. So, if you can convert something to a tensor and pass it as a tensor, that's ideally better. But that's not always possible. Sometimes the value is used as a shape, and different shapes always lead to different compilations.
Correct. |
||||||
|
||||||
{lambda, opts} = Keyword.pop!(opts, :lambda_init) | ||||||
lambda = Nx.tensor(lambda, type: x_type) | ||||||
opts = Keyword.put(opts, :lambda_init, lambda) | ||||||
zeros_list = for k <- 0..opts[:iterations], do: 0 | ||||||
JoaquinIglesiasTurina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
scores = Nx.tensor(zeros_list, type: x_type) | ||||||
JoaquinIglesiasTurina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
{coefficients, intercept, alpha, lambda, rmse, iterations, has_converged, scores, sigma} = | ||||||
fit_n(x, y, sample_weights, scores, opts) | ||||||
iterations = Nx.to_number(iterations) | ||||||
scores = scores | ||||||
|> Nx.to_list() | ||||||
|> Enum.take(iterations) | ||||||
|
||||||
if Nx.to_number(has_converged) == 1 do | ||||||
IO.puts("Convergence after #{Nx.to_number(iterations)} iterations") | ||||||
end | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You really can't call this here. You have to assume you never can really read the tensor values, even inside |
||||||
|
||||||
%__MODULE__{ | ||||||
coefficients: coefficients, | ||||||
intercept: intercept, | ||||||
alpha: Nx.to_number(alpha), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here the same as Jose mentioned, you cannot use |
||||||
lambda: Nx.to_number(lambda), | ||||||
sigma: sigma, | ||||||
rmse: Nx.to_number(rmse), | ||||||
iterations: iterations, | ||||||
scores: scores | ||||||
} | ||||||
end | ||||||
|
||||||
defnp fit_n(x, y, sample_weights, scores, opts) do | ||||||
x = to_float(x) | ||||||
y = to_float(y) | ||||||
|
||||||
{x_offset, y_offset} = | ||||||
if opts[:fit_intercept?] do | ||||||
preprocess_data(x, y, sample_weights, opts) | ||||||
else | ||||||
x_offset_shape = Nx.axis_size(x, 1) | ||||||
y_reshaped = if Nx.rank(y) > 1, do: y, else: Nx.reshape(y, {:auto, 1}) | ||||||
y_offset_shape = Nx.axis_size(y_reshaped, 1) | ||||||
|
||||||
{Nx.broadcast(Nx.tensor(0.0, type: Nx.type(x)), {x_offset_shape}), | ||||||
Nx.broadcast(Nx.tensor(0.0, type: Nx.type(y)), {y_offset_shape})} | ||||||
end | ||||||
|
||||||
{x, y} = {x - x_offset, y - y_offset} | ||||||
|
||||||
{x, y} = | ||||||
if opts[:sample_weights_flag] do | ||||||
rescale(x, y, sample_weights) | ||||||
else | ||||||
{x, y} | ||||||
end | ||||||
|
||||||
alpha = opts[:alpha_init] | ||||||
lambda = opts[:lambda_init] | ||||||
|
||||||
alpha_1 = opts[:alpha_1] | ||||||
alpha_2 = opts[:alpha_2] | ||||||
lambda_1 = opts[:lambda_1] | ||||||
lambda_2 = opts[:lambda_2] | ||||||
|
||||||
iterations = opts[:iterations] | ||||||
|
||||||
xt_y = Nx.dot(Nx.transpose(x), y) | ||||||
JoaquinIglesiasTurina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
{u, s, vh} = Nx.LinAlg.svd(x, full_matrices?: false) | ||||||
eigenvals = Nx.pow(s, 2) | ||||||
JoaquinIglesiasTurina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
{n_samples, n_features} = Nx.shape(x) | ||||||
{coef, rmse} = update_coef(x, y, n_samples, n_features, xt_y, u, vh, eigenvals, alpha, lambda) | ||||||
|
||||||
{{coef, alpha, lambda, rmse, iter, has_converged, scores}, _} = | ||||||
while {{coef, rmse, alpha, lambda, iter = 0, has_converged = Nx.u8(0), scores = scores}, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
JoaquinIglesiasTurina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
{x, y, xt_y, u, s, vh, eigenvals, alpha_1, alpha_2, lambda_1, lambda_2, iterations}}, | ||||||
iter <= iterations and not has_converged do | ||||||
new_score = | ||||||
log_marginal_likelihood( | ||||||
coef, | ||||||
rmse, | ||||||
n_samples, | ||||||
n_features, | ||||||
eigenvals, | ||||||
alpha, | ||||||
lambda, | ||||||
alpha_1, | ||||||
alpha_2, | ||||||
lambda_1, | ||||||
lambda_2 | ||||||
) | ||||||
scores = Nx.put_slice(scores, [iter], Nx.new_axis(new_score, -1)) | ||||||
|
||||||
gamma = Nx.sum(alpha * eigenvals / (lambda + alpha * eigenvals)) | ||||||
lambda = (gamma + 2 * lambda_1) / (Nx.sum(coef ** 2) + 2 * lambda_2) | ||||||
alpha = (n_samples - gamma + 2 * alpha_1) / (rmse + 2 * alpha_2) | ||||||
|
||||||
{coef_new, rmse} = | ||||||
update_coef(x, y, n_samples, n_features, xt_y, u, vh, eigenvals, alpha, lambda) | ||||||
|
||||||
has_converged = Nx.sum(Nx.abs(coef - coef_new)) < 1.0e-8 | ||||||
|
||||||
{{coef_new, alpha, lambda, rmse, iter + 1, has_converged, scores}, | ||||||
{x, y, xt_y, u, s, vh, eigenvals, alpha_1, alpha_2, lambda_1, lambda_2, iterations}} | ||||||
end | ||||||
|
||||||
intercept = set_intercept(coef, x_offset, y_offset, opts[:fit_intercept?]) | ||||||
scaled_sigma = Nx.dot(Nx.transpose(vh), vh / Nx.new_axis(eigenvals + lambda / alpha, -1)) | ||||||
JoaquinIglesiasTurina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
sigma = scaled_sigma / alpha | ||||||
{coef, intercept, alpha, lambda, rmse, iter, has_converged, scores, sigma} | ||||||
end | ||||||
|
||||||
defnp update_coef( | ||||||
x, | ||||||
y, | ||||||
n_samples, | ||||||
n_features, | ||||||
xt_y, | ||||||
u, | ||||||
vh, | ||||||
eigenvals, | ||||||
alpha, | ||||||
lambda | ||||||
) do | ||||||
scaled_eigens = eigenvals + lambda / alpha | ||||||
regularization = vh / Nx.new_axis(scaled_eigens, -1) | ||||||
reg_transpose = Nx.dot(regularization, xt_y) | ||||||
coef = Nx.dot(Nx.transpose(vh), reg_transpose) | ||||||
JoaquinIglesiasTurina marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
error = y - Nx.dot(x, coef) | ||||||
squared_error = error ** 2 | ||||||
rmse = Nx.sum(squared_error) | ||||||
|
||||||
{coef, rmse} | ||||||
end | ||||||
|
||||||
defnp log_marginal_likelihood( | ||||||
coef, | ||||||
rmse, | ||||||
n_samples, | ||||||
n_features, | ||||||
eigenvals, | ||||||
alpha, | ||||||
lambda, | ||||||
alpha_1, | ||||||
alpha_2, | ||||||
lambda_1, | ||||||
lambda_2 | ||||||
) do | ||||||
logdet_sigma = -1 * Nx.sum(Nx.log(lambda + alpha * eigenvals)) | ||||||
score_lambda = lambda_1 * Nx.log(lambda) - lambda_2 * lambda | ||||||
score_alpha = alpha_1 * Nx.log(alpha) - alpha_2 * alpha | ||||||
|
||||||
score_parameters = | ||||||
n_features * Nx.log(lambda) + n_samples * Nx.log(alpha) - alpha * rmse - | ||||||
lambda * Nx.sum(coef ** 2) | ||||||
|
||||||
score = | ||||||
0.5 * (score_parameters + logdet_sigma - n_samples * Nx.log(2 * Nx.Constants.pi())) | ||||||
|
||||||
score_alpha + score_lambda + score | ||||||
end | ||||||
|
||||||
deftransform predict(%__MODULE__{coefficients: coeff, intercept: intercept} = _model, x) do | ||||||
predict_n(coeff, intercept, x) | ||||||
end | ||||||
|
||||||
defnp predict_n(coeff, intercept, x), do: Nx.dot(x, [-1], coeff, [-1]) + intercept | ||||||
|
||||||
# Implements sample weighting by rescaling inputs and | ||||||
# targets by sqrt(sample_weight). | ||||||
defnp rescale(x, y, sample_weights) do | ||||||
case Nx.shape(sample_weights) do | ||||||
{} = scalar -> | ||||||
scalar = Nx.sqrt(scalar) | ||||||
{scalar * x, scalar * y} | ||||||
|
||||||
_ -> | ||||||
scale = sample_weights |> Nx.sqrt() |> Nx.make_diagonal() | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This can be simplified by doing
Suggested change
and using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do not believe this change is possible.
As far as I can tell, An alternative would be keeping 2 Please, let me know if you think of a better way. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are correct, but we are able to use
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This works and I find it's a pretty clean solution. Thank you for your comments. |
||||||
{Nx.dot(scale, x), Nx.dot(scale, y)} | ||||||
end | ||||||
end | ||||||
|
||||||
defnp set_intercept(coeff, x_offset, y_offset, fit_intercept?) do | ||||||
if fit_intercept? do | ||||||
y_offset - Nx.dot(x_offset, coeff) | ||||||
else | ||||||
Nx.tensor(0.0, type: Nx.type(coeff)) | ||||||
end | ||||||
end | ||||||
|
||||||
defnp preprocess_data(x, y, sample_weights, opts) do | ||||||
if opts[:sample_weights_flag], | ||||||
do: | ||||||
{Nx.weighted_mean(x, sample_weights, axes: [0]), | ||||||
Nx.weighted_mean(y, sample_weights, axes: [0])}, | ||||||
else: {Nx.mean(x, axes: [0]), Nx.mean(y, axes: [0])} | ||||||
end | ||||||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would allow
sample_weights
to be onlytype: {:custom, Scholar.Options, :weights, []}
.