-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] First pass implementation of support functionality for change detection and modelling #9
[ML] First pass implementation of support functionality for change detection and modelling #9
Conversation
static CppUnit::Test *suite(); | ||
|
||
private: | ||
using TGenerator = double (*)(ml::core_t::TTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the (*)
necessary here with a using
? You didn't need it two lines below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you're completely right. I think I may just be a conversion from typedef error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this has to be a function pointer type (otherwise I can't create a container of them). The other is just a function type. I'll switch all this code to use std::function since I think the intention is then clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
} | ||
|
||
double logLikelihood; | ||
if (count >= 5 && m_ResidualModel->jointLogMarginalLikelihood( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deserves an explaination, why 5? If count < 5 do we need samples? Seems like not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mean can vary early on, so the effective number of parameters for small n is greater than 1 (the mean) for the case one assumes that there might be a level shift. I started off trying to correct for the bias, but decided in the end it was easier just to allow some updates for the mean to stabilise before first updating the log-likelihood. The exact value of this constant is not really important, but ideally one wants to choose it as small as possible without incurring a significant boost to the sum log-likelihood for this hypothesis. This was empirically a good choice, i.e. for the case that there is no level shift I found that the sum log-likelihood was very similar with and without an assumed mean shift for many independent runs. I commented on this in CUnivariateTimeSeriesLevelShiftModel
but will cross reference that comment here.
…tection and modelling (#9) This implements 1) a naive Bayes classifier, using our distribution models, which will be used for modelling the probability of a change, and 2) a change detector framework, currently supporting detecting level shifts and time shifts, which works by comparing BIC of the various possible hypotheses against one another and a null hypothesis that there is no change.
Removing version label as this is a feature branch PR and it causes confusion when generating release notes. (For interest this was eventually merged to 6.4 and above in #92.) |
Description
This implements 1) a naive Bayes classifier, using our distribution models, which will be used for modelling the probability of a change, and 2) a change detector framework, currently supporting detecting level shifts and time shifts, which works by comparing BIC of the various possible hypotheses against one another and a null hypothesis that there is no change. (Note that this work is going to initially be implemented on a feature branch to enable incremental review of the changes and so we can evaluate on all our QA data sets before merging to master.)
Effects
The functionality are not used in this change, so it has no effect on our results.