-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Implements an absolute goodness-of-fit test to accept a change #21
[ML] Implements an absolute goodness-of-fit test to accept a change #21
Conversation
…ge-modelling-part-3
…ge-modelling-part-3
…ge-modelling-part-3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good Tom.
@@ -555,6 +627,16 @@ void CUnivariateTimeShiftModel::addSamples(std::size_t count, | |||
{ | |||
this->addLogLikelihood(logLikelihood); | |||
} | |||
for (const auto &weight : weights) | |||
{ | |||
double expectedLogLikelihood; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is safe in this context but I would prefer that this variable was initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend not to explicitly initialise variables which are immediately set by a subsequent function call. Also, we tend to run into this case more than we might otherwise because we mandated to use bool return types to indicate success/failure rather than exceptions.
I wonder, since C++11 cleaned up initialisation, whether we should mandate instead that variables are, as a minimum, value initialised, i.e. double expectedLogLikelihood{};
for example. Unfortunately, there are will be a lot of cases which violate this at present, but something we could think about adding to the coding standards and then fixing in a targeted change.
@@ -350,6 +366,9 @@ class MATHS_EXPORT CUnivariateTimeShiftModel final : public CUnivariateChangeMod | |||
//! The BIC of applying the time shift. | |||
virtual double bic() const; | |||
|
|||
//! The expected BIC of applying the change. | |||
virtual double expectedBic() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going forward I think it would be best practice to use the 'override' keyword in cases such as this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Visual Studio 2013 doesn't support override
, so bear in mind this will create a backporting headache. I agree for 7.0-only code we should use it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed using override
on one method generates warnings for every other virtual function that is not marked override
(-Winconsistent-missing-override) in the compilation unit. It should probably be done in a single commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to agree. I think this is something we should adopt, but also let's make this in a single change and target at 7.0 only as there would be no need to back port.
Removing version label as this is a feature branch PR and it causes confusion when generating release notes. (For interest this was eventually merged to 6.4 and above in #92.) |
The key issue we had with change detection, prior to this PR, was that all the tests were relative, i.e. in terms of relative evidence to no change. This meant that if the time series changed in a way which was not well described by one of the possible changes we consider, it was quite possible to accept a change versus the hypothesis that the time series hadn't changed. This led to degraded adaption of the model: whose parameters should be rapidly relearnt as we do now in this case.
This PR implements an absolute "goodness-of-fit" test for each change, by additionally testing versus its expected BIC given the residual distribution. It means we will only accept changes which are a reasonably accurate description of the change currently occurring in the time series.