Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Fix influencer count and influence calculation #150

Merged
merged 5 commits into from
Jul 11, 2018

Conversation

hendrikmuhs
Copy link

@hendrikmuhs hendrikmuhs commented Jul 10, 2018

This fixes counting of influencer per bucket, prior this fix the count has always been set to 1.

Notes:

  • fix == 1st commit, the 2nd and 3rd add test coverage and improve doc/robustness
  • it turned out that this functionality had a test (testVarp) but test was forgotten to be added in the test suite
  • change affects results if influencers are used as influencer score are now changed

Fixes #24

Release note: Fixes influence count per bucket for metric population analyses, which was
wrong and lead to incorrect influencer scoring

params, influencedValue[0]);
if (computeInfluencedValue(value, count, i->second.first, i->second.second,
params, influencedValue[0]) == false) {
LOG_ERROR(<< "Failed to compute influencer value");
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tveasey do you have a better idea for this error message?

Copy link
Contributor

@tveasey tveasey Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, at least for us, it is very useful to see the function arguments, i.e. value, count, i->second.first and i->second.second.

Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left a couple of suggestions only.

}
};

//! \brief Computes the value of the variance statistic on a set difference.
class CVarianceDifference {
public:
//! Features.
void operator()(const TDouble1Vec& v,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe document the parameters to this function too? i.e. v == overall variance and mean, n == overall count, vi == influencer variance and mean, etc.

@@ -59,6 +59,7 @@ Age seasonal components in proportion to the fraction of values with which they'
Persist and restore was missing some of the trend model state ({pull}#99[#99])
Stop zero variance data generating a log error in the forecast confidence interval calculation ({pull}#107[#107])
Fix corner case failing to calculate lgamma values and the correspoinding log errors ({pull}#126[#126])
Influence count per bucket was wrong and lead to wrong influencer scoring ({pull}#150[#150])
Copy link
Contributor

@tveasey tveasey Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, I'd say the "Influence count per bucket for metric population analyses was wrong..."

@hendrikmuhs hendrikmuhs merged commit d41de34 into elastic:master Jul 11, 2018
hendrikmuhs pushed a commit to hendrikmuhs/ml-cpp that referenced this pull request Jul 11, 2018
Fix counting of influencer per bucket for metric population analyses, prior this fix the count has always 
been set to 1.

Fixes elastic#24
hendrikmuhs pushed a commit that referenced this pull request Jul 12, 2018
Fix counting of influencer per bucket for metric population analyses, prior this fix the count has always 
been set to 1.

Fixes #24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Bad values for the variance scale
2 participants