Variable stability #198

Closed
natesh2310 opened this issue Jul 2, 2019 · 2 comments
Labels
feature request 💬 Requests for new features

Comments

@natesh2310

Great tool to quickly check data quality!

A common problem in machine learning applications is that variable distributions change significantly between the training data and the test data. A metric such as the Population Stability Index (PSI) could be calculated for each variable between the training and test data to identify highly unstable variables; for any variable found unstable, the tool could plot its distribution in both the training and the test data. By preemptively dropping these unstable variables, model developers can make the model more stable and production-ready.
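To make the request concrete, here is a minimal sketch of how PSI could be computed for a single numeric variable. It is not part of this project; the function name `population_stability_index`, the quantile binning on the training sample, the default of 10 bins, and the `eps` floor for empty bins are all assumptions chosen for illustration. It also assumes the inputs contain no missing values.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """Hypothetical helper: PSI between a training and a test sample.

    PSI = sum((a_i - e_i) * ln(a_i / e_i)) over bins, where e_i and a_i
    are the fractions of the training and test samples in bin i.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges from quantiles of the training sample (assumption:
    # quantile binning; equal-width bins are another common choice).
    quantiles = np.linspace(0, 1, bins + 1)[1:-1]
    interior = np.unique(np.quantile(expected, quantiles))
    n_bins = len(interior) + 1

    # The outer bins are open-ended, so test values outside the training
    # range are still counted.
    exp_idx = np.searchsorted(interior, expected, side="right")
    act_idx = np.searchsorted(interior, actual, side="right")
    exp_counts = np.bincount(exp_idx, minlength=n_bins)
    act_counts = np.bincount(act_idx, minlength=n_bins)

    # Floor the fractions at eps so empty bins do not produce log(0).
    exp_pct = np.clip(exp_counts / exp_counts.sum(), eps, None)
    act_pct = np.clip(act_counts / act_counts.sum(), eps, None)

    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))


rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5000)
test_same = train.copy()
test_shifted = rng.normal(loc=1.0, scale=1.0, size=5000)

psi_same = population_stability_index(train, test_same)
psi_shifted = population_stability_index(train, test_shifted)
print(psi_same, psi_shifted)
```

Applied per column, a helper like this would give one score per variable, and the variables with the largest scores would be the candidates for the side-by-side distribution plots described above. The cut-off for "unstable" is left to the user in this sketch.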

@natesh2310 natesh2310 added the feature request 💬 Requests for new features label Jul 2, 2019
@sbrugman
Collaborator

sbrugman commented Jul 3, 2019

Interesting suggestion indeed. Related to [#173].

Anyone who is interested in implementing this, please feel free to create a pull request!

@github-actions

Stale issue
