This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* ci: check for flake8 comprehensions
* fix(config): configuration order is now respected
* fix: index is no longer automatically added to dataframe
* feat: correlation alerts show the name of the correlation
* fix: strip tags from the title of the web report
* feat: comparing two or more datasets (see docs)
* docs(comparison): feature description
* docs(readme): include reference to the dataset comparison use case
* refactor: config private attribute
* refactor: config update, exclude defaults
* refactor: include style attribute in timeseries code
* refactor: include style attribute in templates
* test(comparisons): add tests for report comparison
* refactor: overall correlation lowercase
* refactor: frequency table kwargs
* refactor: frequency table styling
* refactor: fixing renderable tests
* refactor: fixing renderable tests
* style: formatting
* refactor: sensitive test
* refactor: pass style argument
* feat: check for empty dataframe
* refactor: namespace invariant type check
* refactor: ipywidgets fixes
* refactor: ipywidgets no comparison support yet
* refactor: process feedback
* fix: comparison bugs (#1137)
* fix: refactoring bugs
* fix: update protected var labels for comparison
* fix: add support to timeseries comparison
* fix: style changes for readability
* test: add simple run test
* fix: reword comparison report doc (#1136)
* fix: rewording

  Co-authored-by: Aarni Koskela <[email protected]>
* feat: add comparison validations (#1143)
* feat: add comparison validations
* feat: replace missing plots to avoid dependencies' conflicts (#1148)
* feat: add new missing histogram plot
* feat: add new missing matrix plot
* feat: add new missing heatmap plot
* feat: remove dendrogram
* feat: ignore columns not present on the base report (#1150)
* feat: select only the left side of the comparison
* chore: pre-commit fixes
* fix: not intersection of columns
* [skip ci] Code formatting
* fix: missing plots columns order
* [skip ci] Code formatting
* fix: interactions/missing plot colors
* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
commit 66bf75b (1 parent: c226052)
Showing 77 changed files with 1,920 additions and 568 deletions.
@@ -15,7 +15,6 @@

 get_font_size
 plot_missing_bar
-plot_missing_dendrogram
 plot_missing_heatmap
 plot_missing_matrix

@@ -1,5 +1,4 @@
 Parameter,Type,Default,Description
 ``missing_diagrams.bar``,boolean,``True``,"Display a bar chart with counts of missing values for each column."
 ``missing_diagrams.matrix``,boolean,``True``,"Display a matrix of missing values. Similar to the bar chart, but might provide overview of the co-occurrence of missing values in rows."
-``missing_diagrams.heatmap``,boolean,``True``,"Display a heatmap of missing values, that measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another)."
-``missing_diagrams.dendrogram``,boolean,``True``,"Display a dendrogram. Provides insight in the co-occurrence of missing values (i.e. columns that are both filled or both none)."
+``missing_diagrams.heatmap``,boolean,``True``,"Display a heatmap of missing values, that measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another)."
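The "nullity correlation" shown by the heatmap can be made concrete with a small sketch. It assumes the heatmap plots the Pearson correlation between two columns' missing-value indicators (1 if missing, 0 otherwise), which is the approach used by the missingno library's heatmap; the helper name `nullity_correlation` and the use of `None` to mark missing values are illustrative choices, not pandas-profiling code.

```python
import math
from typing import Optional, Sequence


def nullity_correlation(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Pearson correlation between the missingness indicators of two columns.

    Hypothetical helper: None marks a missing value.
    """
    if len(a) != len(b):
        raise ValueError("columns must have the same length")
    # 1.0 where the value is missing, 0.0 where it is present
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        # A column that is never (or always) missing has no nullity variance.
        return float("nan")
    return cov / (sx * sy)


print(nullity_correlation([1, None, 3, None], [5, None, 7, None]))
```

A value near 1 means the two columns tend to be missing together; a value near -1 means one tends to be filled exactly when the other is missing.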
@@ -0,0 +1,46 @@
==================
Dataset Comparison
==================

``pandas-profiling`` can be used to compare multiple versions of the same dataset.
This is useful when comparing data from different time periods, such as two years.
Another common scenario is viewing the dataset profiles of the training, validation and test sets in machine learning.

The following syntax can be used to compare two datasets:

.. code-block:: python

    import pandas as pd
    from pandas_profiling import ProfileReport

    # Profile each dataset; the title is used as its label in the comparison
    train_df = pd.read_csv("train.csv")
    train_report = ProfileReport(train_df, title="Train")
    test_df = pd.read_csv("test.csv")
    test_report = ProfileReport(test_df, title="Test")

    # Compare the two reports and save the result as HTML
    comparison_report = train_report.compare(test_report)
    comparison_report.to_file("comparison.html")

The comparison report uses the ``title`` attribute from ``Settings`` as a label throughout.
The colors are configured in ``settings.html.style.primary_colors``.
The numeric precision parameter ``settings.report.precision`` can be lowered to gain additional space in the report.
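Why a lower precision saves space can be seen in a minimal sketch. It assumes numbers are rendered with Python's general (``g``) format at ``precision`` significant digits; that rendering rule and the helper name ``format_value`` are assumptions for illustration, not documented pandas-profiling behavior.

```python
def format_value(value: float, precision: int) -> str:
    # Assumption: numbers are rendered with the general ("g") format,
    # using `precision` significant digits.
    return f"{value:.{precision}g}"


value = 1 / 3
for precision in (10, 8):
    text = format_value(value, precision)
    print(precision, text, len(text))
```

Under this assumption, going from 10 to 8 significant digits shortens ``1/3`` from 12 to 10 characters, a saving that adds up when two reports share the same width.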

In order to compare more than two reports, the following syntax can be used:

.. code-block:: python

    from pandas_profiling import ProfileReport, compare

    # train_report and test_report as in the previous example;
    # validation_report is created the same way from the validation set
    comparison_report = compare([train_report, validation_report, test_report])

    # Obtain merged statistics
    statistics = comparison_report.get_description()

    # Save report to file
    comparison_report.to_file("comparison.html")

Note that this functionality is only guaranteed to support the comparison of two datasets.
For more datasets, the statistics can still be obtained, but the report may have formatting issues.
One of the settings that can be adjusted is ``settings.report.precision``.
As a rule of thumb, a value of 10 can be used for a single report and 8 for comparing two reports.
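The idea of merged statistics from ``get_description()`` can be illustrated with a hypothetical stdlib sketch; the function name ``merge_descriptions`` and the dictionary layout are assumptions, not pandas-profiling's internal code. Nested dictionaries are merged key by key, each leaf becomes a list with one entry per report, and, in the spirit of the "ignore columns not present on the base report" change in the commit message, only keys of the first (base) report are kept.

```python
from typing import Any, Dict, List


def merge_descriptions(descriptions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge report descriptions into one dict with a value list per leaf.

    Hypothetical helper: only keys present in the first (base) description
    are kept; a key missing from a later description contributes None.
    """
    base = descriptions[0]
    merged: Dict[str, Any] = {}
    for key, base_value in base.items():
        values = [d.get(key) for d in descriptions]
        if isinstance(base_value, dict):
            # Recurse into nested sections, treating missing ones as empty.
            merged[key] = merge_descriptions(
                [v if isinstance(v, dict) else {} for v in values]
            )
        else:
            merged[key] = values
    return merged


train = {"table": {"n": 100}, "variables": {"age": {"mean": 30.5}, "income": {"mean": 52000.0}}}
test = {"table": {"n": 40}, "variables": {"age": {"mean": 31.0}, "city": {"n_distinct": 5}}}
print(merge_descriptions([train, test]))
```

Here ``city`` is dropped because the base description lacks it, and the test entry for ``income`` is ``None``.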