Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] First pass implementation of support functionality for change detection and modelling #9

Merged
merged 8 commits into from
Mar 12, 2018

Conversation

tveasey
Copy link
Contributor

@tveasey tveasey commented Mar 8, 2018

Description
This implements 1) a naive Bayes classifier, using our distribution models, which will be used for modelling the probability of a change, and 2) a change detector framework, currently supporting detecting level shifts and time shifts, which works by comparing BIC of the various possible hypotheses against one another and a null hypothesis that there is no change. (Note that this work is going to initially be implemented on a feature branch to enable incremental review of the changes and so we can evaluate on all our QA data sets before merging to master.)

Effects
The functionality are not used in this change, so it has no effect on our results.

static CppUnit::Test *suite();

private:
using TGenerator = double (*)(ml::core_t::TTime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the (*) necessary here with a using? You didn't need it two lines below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're completely right. I think I may just be a conversion from typedef error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this has to be a function pointer type (otherwise I can't create a container of them). The other is just a function type. I'll switch all this code to use std::function since I think the intention is then clearer.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

double logLikelihood;
if (count >= 5 && m_ResidualModel->jointLogMarginalLikelihood(
Copy link

@hendrikmuhs hendrikmuhs Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deserves an explaination, why 5? If count < 5 do we need samples? Seems like not.

Copy link
Contributor Author

@tveasey tveasey Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mean can vary early on, so the effective number of parameters for small n is greater than 1 (the mean) for the case one assumes that there might be a level shift. I started off trying to correct for the bias, but decided in the end it was easier just to allow some updates for the mean to stabilise before first updating the log-likelihood. The exact value of this constant is not really important, but ideally one wants to choose it as small as possible without incurring a significant boost to the sum log-likelihood for this hypothesis. This was empirically a good choice, i.e. for the case that there is no level shift I found that the sum log-likelihood was very similar with and without an assumed mean shift for many independent runs. I commented on this in CUnivariateTimeSeriesLevelShiftModel but will cross reference that comment here.

@tveasey tveasey merged commit 5587587 into elastic:feature/forecast-enhancements-part-2 Mar 12, 2018
@sophiec20 sophiec20 added the :ml label Apr 4, 2018
droberts195 pushed a commit that referenced this pull request Apr 23, 2018
…tection and modelling (#9)

This implements 1) a naive Bayes classifier, using our distribution models, which will be used 
for modelling the probability of a change, and 2) a change detector framework, currently 
supporting detecting level shifts and time shifts, which works by comparing BIC of the various 
possible hypotheses against one another and a null hypothesis that there is no change.
@droberts195
Copy link
Contributor

Removing version label as this is a feature branch PR and it causes confusion when generating release notes. (For interest this was eventually merged to 6.4 and above in #92.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants