Felix/regression #3
Conversation
-p = model.predict(data.X_test)  # N_test, K
+p = model.predict(data.X_test)  # [N_test x K] or [samples x N_test x K]

+assert len(p.shape) in {2, 3}  # 3-dim in case of approximate predictions (multiple samples per each X)
Suggested change:
-assert len(p.shape) in {2, 3}  # 3-dim in case of approximate predictions (multiple samples per each X)
+assert p.ndim in {2, 3}  # 3-dim in case of approximate predictions (multiple samples per each X)
res['test_acc'] = np.average(np.array(pred == data.Y_test.flatten()).astype(float))

res['Y_test'] = data.Y_test
why do we need to store res["Y_test"] in the results?
you add Y_test and p to res in both the 2-dim and 3-dim branches. Better to take it out of the if/else.
I just followed Hugh's implementation. I don't know why he added it.
# evaluation metrics
res = {}

logp = multinomial.logpmf(Y_oh, n=1, p=p)
if len(p.shape) == 2:  # keep analysis as in the original code in case of 2-dim predictions
using p.ndim is more concise here
logp = multinomial.logpmf(Y_oh, n=1, p=p)
if len(p.shape) == 2:  # keep analysis as in the original code in case of 2-dim predictions

logp = multinomial.logpmf(Y_oh, n=1, p=p)
adding shapes here would be useful. I believe logp is [N] here?
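For illustration, a quick shape check of the 2-dim case (toy values, not the benchmark's data):

import numpy as np
from scipy.stats import multinomial

N, K = 4, 3
p = np.full((N, K), 1 / K)                # [N x K] predicted class probabilities
Y_oh = np.eye(K)[np.zeros(N, dtype=int)]  # [N x K] one-hot test labels
logp = multinomial.logpmf(Y_oh, n=1, p=p)
print(logp.shape)  # (4,) -- one log-likelihood per test point, i.e. [N]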
pred = np.argmax(p, axis=-1)
# Mixture test likelihood (mean over per data point evaluations)
logp = logsumexp(res['test_loglik'], axis=0) - np.log(p.shape[0])
Personally I would create a helper function meansumexp; you can use it in the other file as well.
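For concreteness, with such a helper (meansumexp is only a proposed name here; a sketch follows further down under utils.py), the line above would collapse to:

logp = meansumexp(res['test_loglik'], axis=0)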
pred = np.argmax(p, axis=-1)
p = np.mean(p, axis=0)
pred = np.argmax(p, axis=-1)

res['test_acc'] = np.average(np.array(pred == data.Y_test.flatten()).astype(float))

res['Y_test'] = data.Y_test
I think it is worth figuring out, and deleting them if we don't use them anywhere. Storing 2 arrays of length N in our database is not a good idea anyway.
bayesian_benchmarks/tasks/utils.py
Outdated
import numpy as np
from scipy.special import logsumexp

def meansumexp(logps: List[np.ndarray]) -> np.ndarray:
Given that this is a helper function, I would make it more general by letting it accept an array (instead of a List of arrays) and an axis along which to reduce. Namely: meansumexp(a: np.array, axis)
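A minimal sketch of that more general form (this is the suggestion, not the code currently in the PR):

import numpy as np
from scipy.special import logsumexp

def meansumexp(a: np.ndarray, axis: int = 0) -> np.ndarray:
    # log-mean-exp: logsumexp over `axis` minus the log of the number of
    # elements reduced, i.e. log((1/S) * sum_s exp(a_s)) with S = a.shape[axis].
    return logsumexp(a, axis=axis) - np.log(a.shape[axis])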
log_eps = np.log(1e-12)  # log probability threshold
log_1_minus_eps = np.log(1.0 - 1e-12)

if len(m.shape) == 2:  # keep analysis as in the original code in case of 2-dim predictions
same with ndim
m, v = model.predict(data.X_test)  # both [data points x output dim] or [samples x data points x output dim]

assert len(m.shape) == len(v.shape)
assert len(m.shape) in {2, 3}  # 3-dim in case of approximate predictions (multiple samples per each X)
ndim is cleaner here
tests/test_tasks.py
Outdated
def test_regression(tuple):
    data, model, correct_result = tuple
    result = run_regression(None, data=data, model=model, is_test=True)
    assert correct_result['test_loglik'] == pytest.approx(result['test_loglik'], 1e-3)
I think most of us use np.testing.assert_almost_equal instead of pytest.approx. np.testing gives nice and clear errors when the assert isn't fulfilled.
tests/test_tasks.py
Outdated
def test_regression(tuple):
    data, model, correct_result = tuple
    result = run_regression(None, data=data, model=model, is_test=True)
    assert correct_result['test_loglik'] == pytest.approx(result['test_loglik'], 1e-3)
Instead of the copying here I would write this as a for loop:

for (ref_key, ref_value), (actual_key, actual_value) in zip(correct_result.items(), result.items()):
    np.testing.assert_equal(ref_key, actual_key)
    np.testing.assert_almost_equal(ref_value, actual_value)
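One caveat with zipping two .items() views: it silently assumes both dicts use the same key order. Iterating the keys of one dict avoids that (a sketch, not the reviewer's exact proposal):

for key in correct_result:
    np.testing.assert_almost_equal(result[key], correct_result[key])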
tests/test_tasks.py
Outdated
    assert correct_result['test_loglik'] == pytest.approx(result['test_loglik'], 1e-3)
    assert correct_result['test_acc'] == pytest.approx(result['test_acc'], 1e-3)
    assert np.allclose(correct_result['Y_test'], result['Y_test'], rtol=0.0, atol=1e-3)
    assert np.allclose(correct_result['p_test'], result['p_test'], rtol=0.0, atol=1e-3)
nitpick: you want a newline at the end of the file.
tests/test_tasks.py
Outdated
    Approximate regression mock.
    """
    def predict(self, X: np.ndarray) -> (np.ndarray, np.ndarray):
        mu = np.array([[[1., 2., 3.], [4., 5., 6.]], [[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]]])
can you explain in the code why these classes return these specific shapes, please?
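Something along these lines, perhaps (the class name and the variance values are made up for illustration; only mu is from the PR):

import numpy as np

class ApproxRegressionMock:
    def predict(self, X: np.ndarray) -> (np.ndarray, np.ndarray):
        # [samples x data points x output dim] = [2 x 2 x 3]: two posterior
        # samples per test point, so run_regression takes the 3-dim
        # (approximate predictions) branch instead of the exact 2-dim one.
        mu = np.array([[[1., 2., 3.], [4., 5., 6.]],
                       [[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]]])
        var = np.ones_like(mu)  # matching [2 x 2 x 3] predictive variances
        return mu, var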
bayesian_benchmarks/tasks/utils.py
Outdated
    :param axis: determines reduction
    :return: avg probability value [1]
    """
    return np.mean(logsumexp(logps, axis=axis) - np.log(len(logps)))
np.log(len(logps)) doesn't work; it should be np.log(logps.shape[axis]).
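i.e. the last line would become something like this (assuming logps is already an ndarray, per the generalization suggested above):

return np.mean(logsumexp(logps, axis=axis) - np.log(logps.shape[axis]))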