Error combining LocalOutlierFactor with AnomalyFilter #1329

Closed
MarekWadinger opened this issue Oct 3, 2023 · 4 comments

@MarekWadinger
Contributor

Versions

river version: development
Python version: Python 3.10.1
Operating system: macOS Sonoma 14.0 (23A344)

Describe the bug

Hello 👋

anomaly.LocalOutlierFactor (LOF) cannot be combined with an AnomalyFilter to classify samples. The filter calls score_one before learn_one, and LOF's score_one raises an IndexError when no samples have been learned yet. The error therefore originates in the score_one function implemented in LOF.

I traced the error to _initial_calculations, which cannot run because too few samples have been seen. I found a fix that, to the best of my knowledge, should be correct: changing lines 345-346 in anomaly.LocalOutlierFactor HERE
from

if len(self.x_scores) == 0:
    return None

to

if len(self.x_scores) == 0 or len(self.x_list) == 0:
    return 0.5

The originally returned None must also be replaced, because it raises a TypeError further down if left unchanged. The replacement could be 0.0. Because my projects constrain both tails of the score, I set it to 0.5 in this proposed solution.
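Until a guard like the one proposed above exists in the detector, a caller-side fallback could look roughly like the sketch below. The names `safe_score` and `ColdStartDetector` are hypothetical and not part of River. The stand-in detector only imitates the cold-start failure described in this issue, and the 0.5 default is the value suggested above rather than a library convention.

```python
from typing import Callable, Optional


def safe_score(
    score_fn: Callable[[dict], Optional[float]],
    x: dict,
    default: float = 0.5,
) -> float:
    """Return score_fn(x), or `default` when no score can be computed yet."""
    try:
        score = score_fn(x)
    except IndexError:
        # Assumed cold-start failure: the detector has no stored samples.
        return default
    # A None score would break later numeric comparisons, so replace it too.
    return default if score is None else score


class ColdStartDetector:
    """Hypothetical stand-in that fails before any sample has been learned."""

    def __init__(self) -> None:
        self.x_list: list = []

    def learn_one(self, x: dict) -> None:
        self.x_list.append(x)

    def score_one(self, x: dict) -> Optional[float]:
        if not self.x_list:
            raise IndexError("list index out of range")
        return 0.0  # placeholder score, not a real LOF value


detector = ColdStartDetector()
x = {"a": 0.5, "b": 1}
print(safe_score(detector.score_one, x))  # 0.5 (fallback before learning)
detector.learn_one(x)
print(safe_score(detector.score_one, x))  # 0.0 (placeholder score)
```

Catching IndexError at the call site hides the symptom rather than fixing the detector, so this is only a stopgap compared with a guard inside score_one.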

Let me know if that makes sense, I'd be happy to elaborate on any issues or comments.

Thank you 🙏

Steps/code to reproduce

from river import anomaly

lof = anomaly.QuantileFilter(anomaly.LocalOutlierFactor(), q=0.95)

X = [{"a": 0.5, "b": 1}, {"a": 1, "b": 1}]
for x in X:
    lof.learn_one(x)

Full Backtrace of Exception

IndexError: list index out of range

IndexError                                Traceback (most recent call last)
/river/bug_lof.ipynb Cell 3 line 7
      5 X = [{"a": 0.5, "b": 1}, {"a": 1, "b": 1}]
      6 for x in X:
----> 7     lof.learn_one(x)

File ~/river/anomaly/filter.py:179, in QuantileFilter.learn_one(self, *args, **learn_kwargs)
    178 def learn_one(self, *args, **learn_kwargs):
--> 179     score = self.score_one(*args)
    180     if not self.protect_anomaly_detector or not self.classify(score):
    181         self.anomaly_detector.learn_one(*args, **learn_kwargs)

File ~/river/anomaly/base.py:146, in AnomalyFilter.score_one(self, *args, **kwargs)
    130 def score_one(self, *args, **kwargs):
    131     """Return an outlier score.
    132
    133     A high score is indicative of an anomaly. A low score corresponds to a normal observation.
   (...)
    144
    145     """
--> 146     return self.anomaly_detector.score_one(*args, **kwargs)

File ~/river/anomaly/lof.py:371, in LocalOutlierFactor.score_one(self, x)
    348 x_list_copy = self.x_list.copy()
    349 (
    350     nm,
    351     x_list_copy,
   (...)
    368     self.lof,
    369 )
--> 371 neighborhoods, rev_neighborhoods, k_dist, dist_dict = self._initial_calculations(
    372     x_list_copy, nm, neighborhoods, rev_neighborhoods, k_dist, dist_dict
    373 )
    374 (
    375     set_new_points,
    376     set_neighbors,
   (...)
    379     set_upd_lof,
    380 ) = define_sets(nm, neighborhoods, rev_neighborhoods)
    381 reach_dist = calc_reach_dist_new_points(
    382     set_new_points, neighborhoods, rev_neighborhoods, reach_dist, dist_dict, k_dist
    383 )

File ~/river/anomaly/lof.py:457, in LocalOutlierFactor._initial_calculations(self, x_list, nm, neighborhoods, rev_neighborhoods, k_distances, dist_dict)
    455 # Calculate new k-dist for each particle
    456 for i, inner_dict in enumerate(dist_dict.values()):
--> 457     k_distances[i] = sorted(inner_dict.values())[min(k, len(inner_dict.values())) - 1]
    459 # Only keep particles that are neighbors in distance dictionary
    460 dist_dict = {
    461     k: {k2: v2 for k2, v2 in v.items() if v2 <= k_distances[k]}
    462     for k, v in dist_dict.items()
    463 }
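
The indexing expression on line 457 of the traceback can be reproduced in isolation. The sketch below is a plausible explanation rather than a confirmed diagnosis: it assumes that an inner distance dictionary is empty when no other samples have been stored. The helper name `kth_distance` is hypothetical and does not exist in River.

```python
def kth_distance(inner_dict: dict, k: int) -> float:
    """Mirror the indexing expression shown at lof.py:457 in the traceback."""
    values = sorted(inner_dict.values())
    return values[min(k, len(values)) - 1]


# With at least one stored distance, the lookup succeeds.
print(kth_distance({1: 0.5, 2: 1.5}, k=10))  # 1.5

# With an empty inner dict, min(k, 0) - 1 == -1 indexes an empty list.
try:
    kth_distance({}, k=10)
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```

Under that assumption, returning early from score_one while x_list is empty, as proposed above, keeps this expression from ever being evaluated on an empty collection.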

@JustinKurland

I don't want to open a new issue here, as this may in fact be related to the ongoing effort to refactor the LOF code.
[Screenshot: Screen Shot 2023-10-10 at 5 01 56 PM]

@MaxHalford
Member

@JustinKurland LocalOutlierFactor is only available in the dev version of River. Is that what you're using?

@JustinKurland

@MaxHalford No, I had not realised this was only available in the dev version.

@MarekWadinger
Contributor Author

Thank you for your help resolving this issue in #1330.
