-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save predictions to sacc #349
Conversation
Apart from the missing test and mypy errors, this can be reviewed I think @marcpaterno @vitenti |
…y the type of its contents.
@tilmantroester , please, take a look on the modification to see if you agree with the current version. |
|
||
# Adding Gaussian noise defined by the covariance matrix. | ||
assert self.cholesky is not None | ||
predictions += np.dot(self.cholesky, np.random.randn(len(predictions))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding noise should be optional. A common use case is that we want a noise-free theory prediction as the data vector to test our models and pipelines.
Thanks for the help! I think adding noise to the prediction should be factored out of the functionality of saving the prediction. That way the functionality can also be used to save predictions+noise during sampling and would address #323. |
* Updating documentation. * More tests for Statistics.
* Noise on realization is now optional.
@tilmantroester , what about making it optional? Take a look on the last commits. |
I'd move |
…ta_vector and make_realization which generate a new sacc with or without noise.
I think that we came up with the same solution, I just implemented your suggestion. Please, take a look. |
@tilmantroester I think that it is now complete. Please tell me if you think there is something else to do here, otherwise we will review and merge it during our next Firecrown meeting. |
Looks good to me, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Firecrown team will make the changes we have requested.
self.cov = cov | ||
self.cholesky = scipy.linalg.cholesky(self.cov, lower=True) | ||
self.inv_cov = np.linalg.inv(cov) | ||
|
||
self.state = State.READY | ||
|
||
@final | ||
def get_cov(self) -> npt.NDArray[np.float64]: | ||
"""Gets the current covariance matrix.""" | ||
def get_cov(self, statistic: Optional[Statistic] = None) -> npt.NDArray[np.float64]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would prefer to pass the numpy array of indices that corresponds to the sub-matrix desired.
This would allow the caller to obtain the sub-matrix for two or more statistics, when that is desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The application I wrote this for was to get error bars when plotting the data vector. The idea was specifically to abstract away the indices and instead use the statistics, since that's what the user interacts with. I see the use of passing a list of statistics though, to get their corresponding sub-matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will expand the interface to have a single method get_cov
that can accept:
- a single
Statistic
- a list of
Statistic
- a single
np.ndarray
(the indices) - a list of
np.ndarray
(a list of indices)
In the case when a stat or list of stats is passed in, we need also to make sure that the stat (or all the stats) that have been passed are in the likelihood object on which we've called the get_cov
method. We will make the code verify this.
We also have to specify the order in which entries appear in the returned matrix. We propose to respect the order of the entries of the list of statistics (or of numpy arrays), so that the user-specified order of the list controls the ordering of the elements in the returned matrix result, rather than the order of the entires in the SACC data object controlling the ordering of the entries in the returned matrix.
For example, if we pass a list of stats of length 3 (note the order of the entries in this passed to get_cov
:
stats1 -> 0:9, stats2 -> 10:19, stats3 -> 20:29 => get_cov([stats1,stats3,stats2])
-> 0:9 + 20:29 + 10:19
|
||
assert len(sacc_indices) == len(new_data_vector) | ||
|
||
if strict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider collapsing the nested if
s into a single if
with multiple conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll deal with this in a latter issue.
@@ -15,3 +16,12 @@ def compute_loglike(self, tools: ModelingTools): | |||
"""Compute the log-likelihood.""" | |||
|
|||
return -0.5 * self.compute_chisq(tools) | |||
|
|||
def make_realization_vector(self) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be checking pre- and post-conditions on self.state in every method.
Consider introducing a decorator to solve any pylint complaints about duplicated code.
@@ -120,6 +120,8 @@ def __init__(self, parameter_prefix: Optional[str] = None): | |||
super().__init__(parameter_prefix=parameter_prefix) | |||
self.sacc_indices: Optional[npt.NDArray[np.int64]] | |||
self.ready = False | |||
self.computed_theory_vector = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider turning computed_theory_vector
into a method, which would test the nullity of theory_vector
, to remove the possibility of having an invalid pair of states.
@@ -298,7 +289,7 @@ def read(self, sacc_data: sacc.Sacc) -> None: | |||
# I don't think we need these copies, but being safe here. | |||
self._ell_or_theta = _ell_or_theta.copy() | |||
self.data_vector = DataVector.create(_stat) | |||
self.measured_statistic_ = self.data_vector | |||
self.data_vector = self.data_vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a needless assignment.
"This class does not implement make_realization_vector." | ||
) | ||
|
||
def make_realization( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy
misses that this method does not return the required sacc.Sacc
object.
This method should raise a NotImplementedError
, so that it is only an error to use a subclass that lacks this method if that method is called.
"""Set self[key] to value; raise TypeError if Value is not Updatable.""" | ||
if not isinstance(value, Updatable): | ||
raise TypeError( | ||
"Values inserted into an UpdatableCollection must be Updatable" | ||
"Only updatable items can be appended to an UpdatableCollection" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider changing 'updatable' -> 'Updatable'
assert np.all(theory_vector == np.array([1.0, 1.0, 1.0])) | ||
|
||
|
||
def test_chisquared_compute_vector_not_implemented( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming: test_compute_chisquared_works_if_compute_theory_vector_raises_not_implemented_error
likelihood.update(params) | ||
likelihood.compute_chisq(tools_with_vanilla_cosmology) | ||
|
||
new_sacc = likelihood.make_realization(sacc_data_for_trivial_stat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding pytest-rerunfailures to our list of required Conda packages, and decorating this test with @pytest.mark.flaky(reruns=2)
. This should reduce the random failure probability from 1 in a million to a more comfortable 1 in a trillion.
This decoration should probably be done to all the tests that are statistical in nature.
) | ||
|
||
|
||
def test_make_realization_no_noise( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should not be decorated for re-running failures; it is not a statistical test.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #349 +/- ##
==========================================
+ Coverage 95.22% 95.42% +0.20%
==========================================
Files 36 36
Lines 2490 2556 +66
==========================================
+ Hits 2371 2439 +68
+ Misses 119 117 -2 ☔ View full report in Codecov by Sentry. |
|
||
assert len(sacc_indices) == len(new_data_vector) | ||
|
||
if strict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll deal with this in a latter issue.
This PR addresses #346.
It writes the predicted theory vector into a sacc object, replacing the corresponding data point. By default the method raises an error if the sacc object contains data points that are not being overwritten by the predictions to avoid inconsistency.
Future additions to the functionality would be to draw a sample from the likelihood, instead of using the (zero-noise) theory prediction.
One discussion point is how to store the predictions. At the moment, only
TwoPoint
does this and usespredicted_statistic_
for this. I've usedpredicted_statistic_
for now for compatibility with existing code but I wouldn't mind renaming this totheory_vector
to match thedata_vector
.The test doesn't cover the checking of the prediction against the content of the sacc object yet.