Feature time-stability functionality, PSI #268
Comments
If you think it is an interesting feature I could spend some time on it :) |
Please do. Comparison of populations is high on the wishlist (see #198). Just to give you some context: we are working on a collection of tools for data and model profiling under the Dylan project (https://github.com/dylan-profiler). This feature might become part of pandas-profiling or even a separate comparison tool. |
I have just created a PR (#272) covering most of the functionality discussed, although there seem to be some problems when trying to build. Any input or feedback is very welcome. |
How will the Dylan project get integrated with pandas-profiling? I looked at the Dylan project and it appears to be just a couple of projects, one of them being https://github.com/dylan-profiler/tangled-up-in-unicode. |
@neomatrix369 I will make an announcement soon. I promise there is a coherence between these packages... |
Stale issue |
Is your feature request related to a problem? Please describe.
I sometimes use pandas-profiling on time-referenced data. One check I usually run is time stability. The main statistic I use is the Population Stability Index, PSI (link), which gives a good idea of how stable a given feature is over time.
Describe the solution you'd like
I would like a KPI such as PSI to be estimated. That would probably require specifying an extra column for the time dimension, and probably a date threshold (there must be two samples from different time periods to compare).
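To make the request concrete, here is a minimal sketch of what I have in mind, in plain Python. The names `psi` and `psi_by_date` are hypothetical and not part of pandas-profiling. Binning on quantiles of the earlier sample and flooring empty bins with a small `eps` are assumptions on my side; other binning schemes would work too.

```python
import math
from bisect import bisect_right
from datetime import date


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples (hypothetical helper)."""
    ref = sorted(reference)
    # Interior bin edges taken at quantiles of the reference sample (assumption).
    edges = sorted({ref[int(len(ref) * k / n_bins)] for k in range(1, n_bins)})

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps keeps the logarithm finite when a bin is empty (assumption).
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_cur = proportions(reference), proportions(current)
    # PSI = sum over bins of (current - reference) * ln(current / reference).
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


def psi_by_date(rows, threshold, **kwargs):
    """Split (date, value) rows at `threshold` and compare the two samples."""
    before = [v for d, v in rows if d < threshold]
    after = [v for d, v in rows if d >= threshold]
    if not before or not after:
        raise ValueError("both sides of the threshold need at least one row")
    return psi(before, after, **kwargs)


# Usage: January values vs. shifted February values for one feature.
rows = [(date(2020, 1, 15), v) for v in range(100)]
rows += [(date(2020, 2, 15), v) for v in range(50, 150)]
shifted_psi = psi_by_date(rows, threshold=date(2020, 2, 1))
```

Identical samples give a PSI of zero, and the value grows as the later sample drifts away from the earlier one.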
Describe alternatives you've considered
Alternative statistical tests for checking the difference between two distributions could also be assessed.
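As one example of such an alternative, the two-sample Kolmogorov-Smirnov statistic compares the empirical CDFs of the two samples directly, without binning. This is only a sketch of the statistic itself (no p-value), and `ks_statistic` is a hypothetical name.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples (hypothetical helper)."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs are step functions, so the largest gap occurs at an observed value.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )
```

The result ranges from 0 (identical empirical distributions) to 1 (samples that do not overlap at all).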