feat: replace missing plots to avoid dependencies' conflicts #1148

Merged
merged 4 commits into from
Nov 18, 2022
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@ For each column, the following information (whenever relevant for the column typ
- **Most frequent and extreme values**
- **Histograms**: categorical and numerical
- **Correlations**: high correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik)
- **Missing values**: through counts, matrix, heatmap and dendrograms
- **Missing values**: through counts, matrix and heatmap
- **Duplicate rows**: list of the most common duplicated rows
- **Text analysis**: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrilic)
- **File and Image analysis**: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
5 changes: 1 addition & 4 deletions docsrc/source/pages/advanced_usage/available_settings.rst
@@ -62,18 +62,15 @@ Settings related with the missing data section and the visualizations it can inc
:header-rows: 1

.. code-block:: python
:caption: Configuration example: disable heatmap and dendrogram for large datasets
:caption: Configuration example: disable heatmap for large datasets

profile = df.profile_report(
missing_diagrams={
"heatmap": False,
"dendrogram": False,
}
)
profile.to_file("report.html")

The missing data diagrams are generated by the `missingno <https://github.com/ResidentMario/missingno>`_ package.

Correlations
------------

2 changes: 1 addition & 1 deletion docsrc/source/pages/getting_started/overview.rst
@@ -38,7 +38,7 @@ For each column, the following information (whenever relevant for the column typ
* **Most frequent and extreme values**
* **Histograms:** categorical and numerical
* **Correlations**: high correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér's V, Phik)
* **Missing values**: through counts, matrix, heatmap and dendrograms
* **Missing values**: through counts, matrix and heatmap
* **Duplicate rows**: list of the most common duplicated rows
* **Text analysis**: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrilic)
* **File and Image analysis**: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
@@ -15,7 +15,6 @@

get_font_size
plot_missing_bar
plot_missing_dendrogram
plot_missing_heatmap
plot_missing_matrix

3 changes: 1 addition & 2 deletions docsrc/source/pages/tables/config_missing.csv
@@ -1,5 +1,4 @@
Parameter,Type,Default,Description
``missing_diagrams.bar``,boolean,``True``,"Display a bar chart with counts of missing values for each column."
``missing_diagrams.matrix``,boolean,``True``,"Display a matrix of missing values. Similar to the bar chart, but might provide overview of the co-occurrence of missing values in rows."
``missing_diagrams.heatmap``,boolean,``True``,"Display a heatmap of missing values, that measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another)."
``missing_diagrams.dendrogram``,boolean,``True``,"Display a dendrogram. Provides insight in the co-occurrence of missing values (i.e. columns that are both filled or both none)."
``missing_diagrams.heatmap``,boolean,``True``,"Display a heatmap of missing values, that measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another)."
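For reference, the "nullity correlation" mentioned in the heatmap description can be approximated with plain pandas. This is only a minimal sketch under assumptions, not the code from this PR: the function name `nullity_correlation` and the sample data are hypothetical, and it assumes the measure is the Pearson correlation between per-column "is missing" indicators.

```python
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Correlate the per-column 'is missing' indicators (hypothetical helper)."""
    indicators = df.isnull()
    # Columns that are never or always missing have a constant indicator,
    # so their correlation is undefined; drop them before correlating.
    varying = indicators.loc[:, indicators.nunique() > 1]
    return varying.astype(int).corr()


# Made-up sample data: "a" and "c" are missing together, "b" is missing
# exactly when "a" is present, and "d" is never missing.
df = pd.DataFrame(
    {
        "a": [1.0, None, 3.0, None],
        "b": [None, 2.0, None, 4.0],
        "c": [1.0, None, 3.0, None],
        "d": [1.0, 2.0, 3.0, 4.0],
    }
)
corr = nullity_correlation(df)
print(corr)
```

A value near 1 means two columns tend to be missing in the same rows; a value near -1 means one tends to be present when the other is missing.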
2 changes: 0 additions & 2 deletions requirements.txt
@@ -9,8 +9,6 @@ numpy>=1.16.0,<1.24
# Could be optional
# Related to HTML report
htmlmin==0.1.12
# Missing values
missingno>=0.4.2, <0.6
# Correlations
phik>=0.11.1,<0.13
# Text analysis
3 changes: 0 additions & 3 deletions src/pandas_profiling/config.py
@@ -113,7 +113,6 @@ class Univariate(BaseModel):

class MissingPlot(BaseModel):
# Force labels when there are > 50 variables
# https://github.com/ResidentMario/missingno/issues/93#issuecomment-513322615
force_labels: bool = True
cmap: str = "RdBu"

@@ -298,7 +297,6 @@ class Config:
missing_diagrams: Dict[str, bool] = {
"bar": True,
"matrix": True,
"dendrogram": True,
"heatmap": True,
}

@@ -390,7 +388,6 @@ class Config:
"bar": False,
"matrix": False,
"heatmap": False,
"dendrogram": False,
},
"correlations": {
"pearson": {"calculate": False},
1 change: 0 additions & 1 deletion src/pandas_profiling/config_default.yaml
@@ -87,7 +87,6 @@ missing_diagrams:
bar: true
matrix: true
heatmap: true
dendrogram: true

correlations:
pearson:
2 changes: 0 additions & 2 deletions src/pandas_profiling/config_minimal.yaml
@@ -88,7 +88,6 @@ missing_diagrams:
bar: false
matrix: false
heatmap: false
dendrogram: false

correlations:
pearson:
@@ -139,7 +138,6 @@ plot:
missing:
cmap: 'RdBu'
# Force labels when there are > 50 variables
# https://github.com/ResidentMario/missingno/issues/93#issuecomment-513322615
force_labels: true

cat_frequency:
13 changes: 1 addition & 12 deletions src/pandas_profiling/model/missing.py
@@ -22,11 +22,6 @@ def missing_heatmap(config: Settings, df: Any) -> str:
raise NotImplementedError()


@multimethod
def missing_dendrogram(config: Settings, df: Any) -> str:
raise NotImplementedError()


def get_missing_active(config: Settings, table_stats: dict) -> Dict[Any, Any]:
"""

@@ -56,12 +51,6 @@ def get_missing_active(config: Settings, table_stats: dict) -> Dict[Any, Any]:
"caption": "The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.",
"function": missing_heatmap,
},
"dendrogram": {
"min_missing": 1,
"name": "Dendrogram",
"caption": "The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.",
"function": missing_dendrogram,
},
}

missing_map = {
@@ -114,7 +103,7 @@ def get_missing_diagram(
settings: missing diagram name, caption and function

Returns:
A dictionary containing the base64 encoded plots for each diagram that is active in the config (matrix, bar, heatmap, dendrogram).
A dictionary containing the base64 encoded plots for each diagram that is active in the config (matrix, bar, heatmap).
"""

if len(df) == 0:
7 changes: 0 additions & 7 deletions src/pandas_profiling/model/pandas/missing_pandas.py
@@ -3,13 +3,11 @@
from pandas_profiling.config import Settings
from pandas_profiling.model.missing import (
missing_bar,
missing_dendrogram,
missing_heatmap,
missing_matrix,
)
from pandas_profiling.visualisation.missing import (
plot_missing_bar,
plot_missing_dendrogram,
plot_missing_heatmap,
plot_missing_matrix,
)
@@ -28,8 +26,3 @@ def pandas_missing_matrix(config: Settings, df: pd.DataFrame) -> str:
@missing_heatmap.register
def pandas_missing_heatmap(config: Settings, df: pd.DataFrame) -> str:
return plot_missing_heatmap(config, df)


@missing_dendrogram.register
def pandas_missing_dendrogram(config: Settings, df: pd.DataFrame) -> str:
return plot_missing_dendrogram(config, df)
30 changes: 8 additions & 22 deletions src/pandas_profiling/visualisation/missing.py
@@ -1,10 +1,14 @@
"""Plotting functions for the missing values diagrams"""
import pandas as pd
from matplotlib import pyplot as plt
from missingno import missingno

from pandas_profiling.config import Settings
from pandas_profiling.visualisation.context import manage_matplotlib_context
from pandas_profiling.visualisation.plot import (
missing_bar,
missing_heatmap,
missing_matrix,
)
from pandas_profiling.visualisation.utils import hex_to_rgb, plot_360_n0sc0pe


@@ -44,11 +48,10 @@ def plot_missing_matrix(config: Settings, data: pd.DataFrame) -> str:
The resulting missing values matrix encoded as a string.
"""

missingno.matrix(
missing_matrix(
data,
figsize=(10, 4),
fontsize=get_font_size(data) / 20 * 16,
sparkline=False,
color=hex_to_rgb(config.html.style.primary_colors[0]),
labels=config.plot.missing.force_labels,
)
@@ -67,7 +70,7 @@ def plot_missing_bar(config: Settings, data: pd.DataFrame) -> str:
Returns:
The resulting missing values bar plot encoded as a string.
"""
missingno.bar(
missing_bar(
data,
figsize=(10, 5),
fontsize=get_font_size(data),
@@ -102,7 +105,7 @@ def plot_missing_heatmap(config: Settings, data: pd.DataFrame) -> str:
if len(data.columns) > 40:
font_size /= 1.4

missingno.heatmap(
missing_heatmap(
data,
figsize=(10, height),
fontsize=font_size,
@@ -116,20 +119,3 @@ def plot_missing_heatmap(config: Settings, data: pd.DataFrame) -> str:
plt.subplots_adjust(left=0.2, right=0.9, top=0.8, bottom=0.3)

return plot_360_n0sc0pe(config)


@manage_matplotlib_context()
def plot_missing_dendrogram(config: Settings, data: pd.DataFrame) -> str:
"""Generate a dendrogram plot for missing values.

Args:
config: report Settings object
data: Pandas DataFrame to generate missing values dendrogram plot from.

Returns:
The resulting missing values dendrogram plot encoded as a string.

"""
missingno.dendrogram(data, fontsize=get_font_size(data) * 2.0)
plt.subplots_adjust(left=0.1, right=0.9, top=0.7, bottom=0.2)
return plot_360_n0sc0pe(config)
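
The diff above replaces the `missingno` calls with `missing_bar`, `missing_heatmap` and `missing_matrix` imported from `pandas_profiling.visualisation.plot`, whose bodies are not shown in this view. As a rough illustration of what a matplotlib-only replacement for the bar chart could look like, here is a sketch under stated assumptions: `sketch_missing_bar` and its parameters are hypothetical, it plots the count of missing values per column as described in `config_missing.csv`, and it is not the implementation merged in this PR.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd


def sketch_missing_bar(df: pd.DataFrame, figsize=(10, 5), fontsize=12.0):
    """Draw one bar per column showing its number of missing values.

    Hypothetical helper for illustration only.
    """
    missing_counts = df.isnull().sum()
    positions = range(len(missing_counts))

    fig, ax = plt.subplots(figsize=figsize)
    ax.bar(positions, missing_counts.to_numpy())
    ax.set_xticks(positions)
    ax.set_xticklabels(
        missing_counts.index, rotation=45, ha="right", fontsize=fontsize
    )
    ax.set_ylabel("Missing values", fontsize=fontsize)
    return ax


# Made-up sample data.
df = pd.DataFrame(
    {
        "a": [1.0, None, 3.0],
        "b": [None, None, 3.0],
        "c": [1.0, 2.0, 3.0],
    }
)
ax = sketch_missing_bar(df)
heights = [int(patch.get_height()) for patch in ax.patches]
print(heights)
```

Depending only on pandas and matplotlib, which the project already requires, is what lets `missingno` be dropped from `requirements.txt`.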