Detection of skew/drift in distribution of numerical feature #101

Closed
wrapper228 opened this issue Jan 15, 2020 · 3 comments

Comments

@wrapper228

wrapper228 commented Jan 15, 2020

Does .validate_statistics() really detect anomalies in distribution only for categorical features? "For now drift detection is only supported for categorical features." and "For now skew detection is only supported for categorical features." - doesn't it seem weird?
For example, I have an N(0,1)-distributed numerical feature in my training data. In the serving data, the same feature follows N(10,1). Does TFDV offer any solution for this case?

@wrapper228 wrapper228 changed the title sa Detection of skew/drift in distribution of numerical feature Jan 15, 2020
@wrapper228 wrapper228 reopened this Jan 15, 2020
@rmothukuru rmothukuru self-assigned this Jan 20, 2020
@rmothukuru rmothukuru assigned caveness and unassigned rmothukuru Jan 20, 2020
@caveness
Collaborator

That's correct -- as of now, TFDV supports drift and skew detection only for categorical features. So, unfortunately, we don't currently have a solution for finding such a distribution shift in numeric features. However, we are planning to add support for skew and drift detection for numeric features in the future.

@cah-aswini-jalla

Do we have any update on getting skew/drift anomalies for numerical features?

@caveness
Collaborator

Yes -- support for detecting drift and skew for numeric features has been added to TFDV, as of Version 0.25.0.

To detect drift or distribution skew in numeric features, specify a jensen_shannon_divergence threshold in the drift_comparator or skew_comparator in your schema.
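As a rough sketch of what that configuration could look like in Python: the helper name set_numeric_drift_threshold, the feature name, and the 0.1 threshold below are illustrative assumptions, not values from the TFDV docs. The helper only relies on tfdv.get_feature and the comparator fields named above.

```python
def set_numeric_drift_threshold(schema, feature_name, threshold, comparator="drift"):
    """Set a Jensen-Shannon divergence threshold on one feature of a TFDV schema.

    Hypothetical helper for illustration; `comparator` is "drift" or "skew".
    """
    # Imported lazily so the helper can be defined where TFDV is not installed.
    import tensorflow_data_validation as tfdv

    if comparator not in ("drift", "skew"):
        raise ValueError("comparator must be 'drift' or 'skew'")

    feature = tfdv.get_feature(schema, feature_name)
    target = (
        feature.drift_comparator if comparator == "drift" else feature.skew_comparator
    )
    # Divergence above this value is reported as an anomaly for the feature.
    target.jensen_shannon_divergence.threshold = threshold
    return schema


# Possible usage, assuming train_stats and serving_stats were already computed
# (for example with tfdv.generate_statistics_from_dataframe):
#
#   schema = tfdv.infer_schema(train_stats)
#   set_numeric_drift_threshold(schema, "my_numeric_feature", 0.1, comparator="skew")
#   anomalies = tfdv.validate_statistics(
#       statistics=train_stats, schema=schema, serving_statistics=serving_stats)
#
# For drift between two spans of data, set comparator="drift" and pass
# previous_statistics=... instead of serving_statistics=...
```

The threshold is something to tune per feature; 0.1 here is only a placeholder.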

See the TFDV Get Started Guide for more info.
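To get a feel for what threshold might make sense, here is a small standalone sketch of Jensen-Shannon divergence on the N(0,1) vs. N(10,1) example from the original question. It is a generic histogram-based calculation with log base 2 and 20 shared bins, both of which are assumptions made here; TFDV's internal binning and scaling may differ, so treat the numbers as a rough intuition only.

```python
import math
import random


def shared_histograms(xs, ys, bins=20):
    """Bin both samples over one common range and return normalized frequencies."""
    lo = min(min(xs), min(ys))
    hi = max(max(xs), max(ys))
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal

    def hist(values):
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value falls into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    return hist(xs), hist(ys)


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


rng = random.Random(0)
train = [rng.gauss(0, 1) for _ in range(5000)]       # N(0,1) training feature
serving = [rng.gauss(10, 1) for _ in range(5000)]    # N(10,1) serving feature
resample = [rng.gauss(0, 1) for _ in range(5000)]    # same distribution as train

shifted = js_divergence(*shared_histograms(train, serving))
unshifted = js_divergence(*shared_histograms(train, resample))
print(f"shifted: {shifted:.3f}, unshifted: {unshifted:.3f}")
```

With this definition the divergence is near its maximum when the two samples barely overlap and close to zero when they come from the same distribution, so a threshold somewhere in between would flag the shift described in the question.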
