Support another structured output format (XML/JSON)? #224
Some quick comments. Regarding the first point: anyone can contribute by providing open datasets that are representative of a larger number of columns (e.g. 40-100 and 100+). Regarding the second point: providing the output as XML/JSON is definitely an option. This is a great place for beginning contributors to start. Just call:
Thanks for the comments. I have a rough idea, and corrections or other ideas are welcome. We could recover the raw standalone HTML report from a single description file, which would generate the current HTML output.
Furthermore, we could implement a comparison API.
The diff_profile could be a class similar to the profile class, or diff_profile could itself be a "profile" class. In that case, we would need to extend the API/methods of the current profile class to support list/array input and to provide more complex comparison methods. This way, we could recover the current HTML report from XML/JSON, and also provide a more flexible tool for comparing datasets, such as training and testing datasets in ML.
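A minimal sketch of the comparison idea, assuming each profile has already been exported as a plain dict of the hypothetical form `{"variables": {column: {statistic: value}}}`. `DiffProfile` is an illustrative name, not an existing pandas-profiling class; it only reports added and removed columns and the numeric difference of every statistic both sides share.

```python
from typing import Any, Dict, List

# Hypothetical description layout (an assumption, not pandas-profiling's schema):
# {"variables": {"<column>": {"<statistic>": <value>, ...}, ...}}
Description = Dict[str, Dict[str, Dict[str, Any]]]


class DiffProfile:
    """Compare two description dicts column by column (hypothetical helper)."""

    def __init__(self, reference: Description, current: Description) -> None:
        self.reference = reference["variables"]
        self.current = current["variables"]

    def common_columns(self) -> List[str]:
        # Columns present in both datasets, in the reference's order.
        return [c for c in self.reference if c in self.current]

    def added_columns(self) -> List[str]:
        return [c for c in self.current if c not in self.reference]

    def removed_columns(self) -> List[str]:
        return [c for c in self.reference if c not in self.current]

    def deltas(self) -> Dict[str, Dict[str, float]]:
        """Return current minus reference for each numeric statistic both sides share."""
        result: Dict[str, Dict[str, float]] = {}
        for column in self.common_columns():
            ref_stats = self.reference[column]
            cur_stats = self.current[column]
            changes: Dict[str, float] = {}
            for stat, ref_value in ref_stats.items():
                cur_value = cur_stats.get(stat)
                # bool is a subclass of int, so skip flags such as "is_unique".
                if isinstance(ref_value, bool) or isinstance(cur_value, bool):
                    continue
                if isinstance(ref_value, (int, float)) and isinstance(cur_value, (int, float)):
                    changes[stat] = cur_value - ref_value
            result[column] = changes
        return result
```

Passing the description of a training set as `reference` and that of a test set as `current` would then show, per column, how much each statistic moved between the two.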
Hello @sbrugman, I'd like to better understand how generating the output as XML/JSON would work. I have some questions:
Any more details about this functionality would be appreciated. Thanks for your attention.
- Feature as requested in #224
- Many thanks @marco-cardoso for your initial implementation #225
* Progress bar implementation
  - Feature as requested in #224
  - Test for #282
  - Many thanks @marco-cardoso for your initial implementation #225
  - Display no progress bar for disabled modules (e.g. individual correlations).
  - Update requirements, notebooks, docs, examples, linting
* Decouple notebooks and notebook tests. One test hangs on issue in nbval: computationalmodelling/nbval#136
* Disable missing plots in minimal mode
* Create additional demo with Chicago employees data
* Compartmentalize column sorting in describe module
* Commit for pandas-profiling v2.5.0
  - Progress bar added (#224)
  - Character analysis for Text/NLP (#278)
  - Themes: configuration and demo's (Orange, Dark)
  - Tutorial on modifying the report's structure (#362; #281, #259, #253, #234). This jupyter notebook also demonstrates how to use the Kaggle api together with pandas-profiling.
  - Toggle descriptions at correlations.
  Deprecation:
  - This is the last version to support Python 3.5.
  Stability:
  - The order of columns changed when sort="None" (#377, fixed).
  - Pandas v1.0.X is not yet supported (#367, #366, #363, #353, pinned pandas to < 1)
  - Improved mixed type detection (#351)
  - Refactor of report structures.
  - Correlations are more stable (e.g. Phi_k color scale now from 0-1, rows and columns with NaN values are dropped, #329).
  - Distinct counts exclude NaNs.
  - Fixed alerts in notebooks.
  Other improvements:
  - Warnings are now sorted.
  - Links to Binder and Google Colab are added for notebooks (#349)
  - The overview section is tabbed.
I use pandas_profiling every day to check my new production data.
It greatly improves the efficiency of data quality checking. Thanks to the contributors for saving my life! Life is short, use Pandas Profiling!
However, I have found two problems that cannot yet be handled well.
For example, I have a dataframe with a column called A.
The missing rate of A is usually between 25% and 35%. If the missing rate of A in newly produced data falls outside this range, I want to generate a warning.
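The range check described above can be sketched without any profiling library. With pandas, the missing rate of a column is simply `df["A"].isna().mean()`; the stdlib-only helpers below compute the same fraction over a plain list of values and warn when it leaves the 25% to 35% band from the example. `missing_rate` and `check_missing_rate` are hypothetical names, not pandas-profiling functions.

```python
import math
import warnings
from typing import Iterable, Optional


def missing_rate(values: Iterable[Optional[float]]) -> float:
    """Fraction of entries that are None or NaN (0.0 for an empty input)."""
    values = list(values)
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or (isinstance(v, float) and math.isnan(v)))
    return missing / len(values)


def check_missing_rate(column: str, values, low: float = 0.25, high: float = 0.35) -> bool:
    """Warn and return False if the missing rate falls outside [low, high]."""
    rate = missing_rate(values)
    if not low <= rate <= high:
        warnings.warn(
            f"Missing rate of column {column!r} is {rate:.1%}, "
            f"outside the expected range {low:.0%} to {high:.0%}"
        )
        return False
    return True
```

The bounds are inclusive, so a rate of exactly 25% passes; in practice they would come from the statistics recorded on previous days rather than being hard-coded.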
Both problems require recording the statistics in a file format such as XML/JSON on a daily basis. With HTML, it is not convenient to extract the statistics.
However, I cannot find any output format in pandas_profiling other than HTML.
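One way to keep such a daily record is to dump each run's statistics to a dated JSON file. This is a sketch under assumptions: the `stats-YYYY-MM-DD.json` naming scheme, the shape of the `stats` dict, and the helper names are illustrative, not a pandas-profiling format.

```python
import json
from datetime import date
from pathlib import Path
from typing import Any, Dict


def _stats_path(directory: Path, day: date) -> Path:
    # One file per day, e.g. stats-2020-01-15.json (hypothetical naming scheme).
    return directory / f"stats-{day.isoformat()}.json"


def save_daily_stats(stats: Dict[str, Any], directory: Path, day: date) -> Path:
    """Write one day's statistics as pretty-printed JSON and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = _stats_path(directory, day)
    path.write_text(json.dumps(stats, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_daily_stats(directory: Path, day: date) -> Dict[str, Any]:
    """Read back the statistics saved for the given day."""
    return json.loads(_stats_path(directory, day).read_text(encoding="utf-8"))
```

Note that `json.dumps` writes a float NaN as the non-standard token `NaN` by default; if other tools must parse the files strictly, replace NaN with `None` first and pass `allow_nan=False`.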
The HTML is great, but sometimes the report is too big to open. If I could store the statistics and choose a few important columns to present, I could generate a report containing only those columns, and it would open because the HTML file would be much smaller.
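A smaller report could be produced by filtering the stored statistics down to the chosen columns and rendering only those. The sketch below assumes the same hypothetical `{"variables": {column: {statistic: value}}}` layout and emits a bare HTML table per column; it does not reproduce pandas-profiling's actual report templates, and both function names are illustrative.

```python
import html
from typing import Any, Dict, Iterable


def subset_description(description: Dict[str, Any], columns: Iterable[str]) -> Dict[str, Any]:
    """Keep only the requested columns; names that are not present are ignored."""
    variables = description["variables"]
    return {"variables": {c: variables[c] for c in columns if c in variables}}


def render_html(description: Dict[str, Any]) -> str:
    """Render a minimal standalone HTML page with one table per column."""
    parts = ["<!DOCTYPE html>", "<html><body>"]
    for column, stats in description["variables"].items():
        # Escape names and values so arbitrary column names cannot break the markup.
        parts.append(f"<h2>{html.escape(column)}</h2>")
        parts.append("<table>")
        for stat, value in stats.items():
            parts.append(
                f"<tr><th>{html.escape(str(stat))}</th>"
                f"<td>{html.escape(str(value))}</td></tr>"
            )
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Combined with a daily JSON record, this keeps the full statistics on disk while the rendered page only grows with the number of selected columns.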
In addition, the analysis in problem 2 could be done by a higher-level program that compares the XML/JSON files generated on two different days.
I think this proposal would make pandas_profiling even better!