Feature time-stability functionality, PSI #268

Closed
sgamezrdo opened this issue Sep 30, 2019 · 6 comments
Labels
feature request 💬 Requests for new features

Comments

@sgamezrdo

Is your feature request related to a problem? Please describe.
I sometimes use pandas-profiling on time-referenced data. One check I usually run on such data is time stability. The main statistic I use is the PSI (Population Stability Index), which gives a good idea of how stable a given feature is over time.

Describe the solution you'd like
I would like a KPI such as PSI to be estimated. That would probably require specifying an extra column for the time dimension, and probably a date threshold as well (there must be two samples from different time periods to compare).
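
To make the request concrete, here is a minimal, dependency-free sketch of the idea, not what pandas-profiling does. It assumes the usual PSI definition, the sum over bins of (current share - reference share) * ln(current share / reference share), with bin edges taken from quantiles of the reference period. The names `psi`, `psi_by_date`, `n_bins` and `eps` are hypothetical; with pandas, the split would simply be a boolean mask on the date column.

```python
import math
from bisect import bisect_right
from datetime import date


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are quantiles of `reference`. Empty-bin shares are floored
    at `eps` (an assumed smoothing choice) so the logarithm stays finite.
    """
    ref_sorted = sorted(reference)
    # Interior cut points at the reference quantiles; duplicates collapse.
    edges = sorted(
        {ref_sorted[(len(ref_sorted) * k) // n_bins] for k in range(1, n_bins)}
    )

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_share, cur_share = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


def psi_by_date(rows, threshold, n_bins=10):
    """Split (date, value) pairs at `threshold` and compare the two periods."""
    before = [v for d, v in rows if d < threshold]
    after = [v for d, v in rows if d >= threshold]
    return psi(before, after, n_bins=n_bins)


# Hypothetical data: the feature shifts upwards after the threshold date.
rows = [(date(2019, 1, 1), float(v)) for v in range(100)]
rows += [(date(2019, 7, 1), float(v + 50)) for v in range(100)]
print(psi_by_date(rows, date(2019, 4, 1)))
```

Two identical samples give a PSI of zero, and the value grows as the two periods diverge. Quantile bins from the reference period are only one possible choice; fixed-width bins would work too.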

Describe alternatives you've considered
Alternative statistical tests for checking the difference between two distributions could also be assessed.
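
One candidate, named here purely as an assumption since the request does not pick a specific test, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs of the two periods. The sketch below computes only the statistic, without a p-value, and `ks_statistic` is a hypothetical name. SciPy's `scipy.stats.ks_2samp` returns both the statistic and a p-value if that dependency is acceptable.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The maximum gap is always reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


print(ks_statistic([1, 2, 3], [1, 2, 3]))  # identical samples
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # fully separated samples
```

The statistic ranges from 0 for identical samples to 1 for samples that do not overlap at all.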

@sgamezrdo sgamezrdo added the feature request 💬 Requests for new features label Sep 30, 2019
@sgamezrdo
Author

If you think it is an interesting feature, I could spend some time on it :)

@sbrugman
Collaborator

Please do. Comparison of populations is high on the wishlist (see #198).

Just to give you some context. We are working on a collection of tools for data and model profiling under the Dylan project (https://github.com/dylan-profiler). This feature might be part of pandas-profiling or even a separate comparison tool.

@sgamezrdo
Author

I have just created a PR (#272) covering most of the functionality discussed, although there seem to be some problems with the build. Any input or feedback is very welcome.

@neomatrix369

> Please do. Comparison of populations is high on the wishlist (see #198).
>
> Just to give you some context. We are working on a collection of tools for data and model profiling under the Dylan project (https://github.com/dylan-profiler). This feature might be part of pandas-profiling or even a separate comparison tool.

How will the Dylan project be integrated with pandas-profiling? I looked at the Dylan project and it appears to be just a couple of projects, one of them being https://github.com/dylan-profiler/tangled-up-in-unicode.

@sbrugman
Collaborator

@neomatrix369 I will make an announcement soon. I promise there is coherence between these packages...

@github-actions

Stale issue
