[ML] Bad values for the variance scale #24

tveasey · 2018-03-26T10:03:16Z

A user's data set has revealed two issues with the variance scale calculation in version 6.2.2 of the analytics:

it is sometimes negative(!),
it is sometimes infinite.

In particular, we are seeing the following error messages logged:
Error calculating joint distribution: Bad variance scale -5.75
Error calculating joint distribution: Bad variance scale inf
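
Both logged values are unusable for the same underlying reason: a variance scale multiplies a variance, so it presumably has to be finite and strictly positive. A minimal sketch of such a check follows, written in Python for brevity rather than in the language of the analytics. The function name `is_valid_variance_scale` is hypothetical and is not taken from the code base.

```python
import math


def is_valid_variance_scale(scale: float) -> bool:
    """Return True if `scale` could sensibly multiply a variance.

    Hypothetical helper: rejects NaN, +/-inf, zero and negative values.
    """
    return math.isfinite(scale) and scale > 0.0


# The two values from the log messages above, plus a benign one.
for value in (1.0, -5.75, float("inf")):
    print(value, is_valid_variance_scale(value))
```

A check like this only detects the symptom; it does not explain how the bad value was produced.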

There is no prospect of getting hold of the data set; however, the data characteristics sound benign. There were two detectors:

detector high_mean(x) over y influencers y,z
detector high_median(x) over y influencers y,z
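
For anyone trying to reproduce this, the two detectors might be expressed roughly as the following `analysis_config` fragment of an anomaly detection job, shown here as a Python dictionary. This is a sketch under assumptions: `bucket_span` and the data description are omitted because the issue does not state them, and the influencers are listed once at job level, which is how the job API takes them.

```python
import json

# Hypothetical job fragment reconstructed from the detector descriptions above.
# bucket_span and data_description are deliberately left out: the issue does
# not say what they were.
analysis_config = {
    "detectors": [
        {"function": "high_mean", "field_name": "x", "over_field_name": "y"},
        {"function": "high_median", "field_name": "x", "over_field_name": "y"},
    ],
    # Influencers are configured per job, not per detector.
    "influencers": ["y", "z"],
}

print(json.dumps(analysis_config, indent=2))
```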

For x we have min: 0, max: 4.34571, avg: 2.0736, and the cardinality of y is 430.

This issue is to investigate routes by which this problem could occur. The initial areas to investigate are CTimeSeriesDecomposition::scale and the calculation of the count variance scale, particularly for influencers.

cc @LucaWintergerst.

LucaWintergerst · 2018-03-27T09:08:02Z

The exceptions do not happen if influencer z is removed.
I also replaced z (which was the hostname) with z_ip, and the same thing happened.

hendrikmuhs · 2018-07-06T14:16:03Z

Update:

The root cause has been identified: the code that counts influencer occurrences per bucket has a bug. The fix is simple (one line of code), but the change affects results.

Hopefully the results change for the better; I will analyze the diff to be sure.

I traced the code history back to version 5.5, so this is not a recent regression; it was likely introduced by PR 144 in the old repo.

As said, the fix is simple, but I plan to take some more time for related code improvements and test cases.

Fix counting of influencers per bucket for metric population analyses. Prior to this fix, the count was always set to 1. Fixes #24
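
To make the nature of the bug concrete, here is a toy illustration in Python. It is not the actual code from the analytics: the record layout, the function names, and the choice to key counts by influencer value alone are all assumptions made for the example. It only contrasts a count that is always set to 1 with one that accumulates per bucket.

```python
from collections import defaultdict

# Hypothetical records for a single bucket: (over field y, influencer z, metric x).
records = [
    ("a", "host1", 1.0),
    ("a", "host1", 2.0),
    ("b", "host1", 3.0),
    ("b", "host2", 4.0),
]


def influencer_counts_always_one(bucket_records):
    """Mimics the described bug: every influencer value ends up with count 1."""
    counts = {}
    for _y, z, _x in bucket_records:
        counts[z] = 1  # overwrites instead of accumulating
    return counts


def influencer_counts_accumulated(bucket_records):
    """Counts how many records each influencer value contributed in the bucket."""
    counts = defaultdict(int)
    for _y, z, _x in bucket_records:
        counts[z] += 1
    return dict(counts)


print(influencer_counts_always_one(records))
print(influencer_counts_accumulated(records))
```

Since the variance of a mean of n samples falls as 1/n, a count that is stuck at 1 is presumably one way a count-based variance scale for influencers could go wrong, although the exact path to a negative or infinite value is not spelled out in this thread.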


tveasey added >bug :ml labels Mar 26, 2018

sophiec20 changed the title ~~Bad values for the variance scale~~ [ML] Bad values for the variance scale Mar 28, 2018

hendrikmuhs self-assigned this Jun 27, 2018

hendrikmuhs added the affects-results label Jul 6, 2018

hendrikmuhs mentioned this issue Jul 10, 2018

[ML] Fix influencer count and influence calculation #150

Merged

hendrikmuhs closed this as completed in #150 Jul 11, 2018

hendrikmuhs pushed a commit that referenced this issue Jul 11, 2018

[ML] Fix influencer count and influence calculation (#150)

d41de34

Fix counting of influencer per bucket for metric population analyses, prior this fix the count has always been set to 1. Fixes #24

hendrikmuhs pushed a commit that referenced this issue Jul 12, 2018

[ML] Fix influencer count and influence calculation (#150)

7410d96

Fix counting of influencer per bucket for metric population analyses, prior this fix the count has always been set to 1. Fixes #24

davidkyle mentioned this issue Jun 20, 2023

[NLP] Catch exceptions thrown during inference and report as errors #2542

Merged
