diff --git a/MANIFEST.in b/MANIFEST.in
index fac060ccc..ecdbb5edb 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,9 +1,6 @@
 include LICENSE
 include pandas_profiling/view/*.mplstyle
-include pandas_profiling/view/templates/*.html
-include pandas_profiling/view/templates/variables/*.html
-include pandas_profiling/view/templates/assets/*.js
-include pandas_profiling/view/templates/assets/*.css
-include pandas_profiling/view/templates/*.css
+recursive-include pandas_profiling/view/templates *.html
+recursive-include pandas_profiling/view/templates/assets *.js *.css
 include pandas_profiling/config_default.yaml
 include README.md
diff --git a/README.md b/README.md
index 17db53f2b..d68aae603 100644
--- a/README.md
+++ b/README.md
@@ -142,7 +142,7 @@ Read more on getting involved in the [Contribution Guide](https://github.com/pan
 
 ## Dependencies
 
-You need Python 3 to run this package. Other dependencies can be found in the requirements files:
+You need [Python 3](https://python3statement.org/) to run this package. Other dependencies can be found in the requirements files:
 
 | Filename | Requirements|
 |----------|-------------|
diff --git a/docs/index.html b/docs/index.html
index 2e1ac75f5..7cc746038 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -128,7 +128,7 @@

How to contribute

If you would like to be a industry partner or sponsor, please drop us a line.

Read more on getting involved in the Contribution Guide.

Dependencies

-You need Python 3 to run this package. Other dependencies can be found in the requirements files:
+You need Python 3 to run this package. Other dependencies can be found in the requirements files:

diff --git a/examples/meteorites/meteorites_report.html b/examples/meteorites/meteorites_report.html
index d09564b82..b78359c4c 100644
--- a/examples/meteorites/meteorites_report.html
+++ b/examples/meteorites/meteorites_report.html
@@ -291,7 +291,7 @@
 #overview-content td, #overview-content th{
 border-top: 0;
 line-height: 1;
-}

Overview

Dataset info

Number of variables: 14
Number of observations: 45726
Missing cells: 29703 (< 0.1%)
Duplicate rows: 0 (0.0%)
Total size in memory: 4.6 MiB
Average record size in memory: 105.0 B

Variables types

Numeric: 4
Categorical: 5
Boolean: 1
Date: 1
URL: 0
Text (Unique): 1
Rejected: 2
Unsupported: 0

Warnings

GeoLocation has a high cardinality: 17101 distinct values Warning
GeoLocation has 7315 (16.0%) missing values Missing
mass_(g) is highly skewed (γ1 = 76.918) Skewed
recclass has a high cardinality: 466 distinct values Warning
reclat has 6438 (14.1%) zeros Zeros
reclat has 7315 (16.0%) missing values Missing
reclat_city is highly correlated with reclat (ρ = 0.99422) Rejected
reclong has 6214 (13.6%) zeros Zeros
reclong has 7315 (16.0%) missing values Missing
source has constant value "NASA" Rejected

Variables

boolean
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
True: 22889
False: 22837

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| True | 22889 | 50.1% |
| False | 22837 | 49.9% |

fall
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Found: 44609
Fell: 1117

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| Found | 44609 | > 99.9% |
| Fell | 1117 | < 0.1% |

Max length: 5
Mean length: 4.9756
Min length: 4
Contains chars: True
Contains digits: False
Contains spaces: False
Contains non-words: False

GeoLocation
Categorical

Distinct count: 17101
Unique (%): 37.4%
Missing (%): 16.0%
Missing (n): 7315
(0.0, 0.0): 6214
(-71.5, 35.66667): 4761
(-84.0, 168.0): 3040
Other values (17097): 24396
(Missing): 7315

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| (0.0, 0.0) | 6214 | 13.6% |
| (-71.5, 35.66667) | 4761 | 10.4% |
| (-84.0, 168.0) | 3040 | 6.6% |
| (-72.0, 26.0) | 1505 | < 0.1% |
| (-79.68333, 159.75) | 657 | < 0.1% |
| (-76.71667, 159.66667) | 637 | < 0.1% |
| (-76.18333, 157.16667) | 539 | < 0.1% |
| (-79.68333, 155.75) | 473 | < 0.1% |
| (-84.21667, 160.5) | 263 | < 0.1% |
| (-86.36667, -70.0) | 226 | < 0.1% |
| Other values (17090) | 20096 | 43.9% |
| (Missing) | 7315 | 16.0% |

Max length: 24
Mean length: 15.016
Min length: 3
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True

id
Numeric

Distinct count: 45716
Unique (%): > 99.9%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 26884
Minimum: 1
Maximum: 57458
Zeros (%): 0.0%
Mini histogram

Quantile statistics

Minimum: 1
5-th percentile: 2388.8
Q1: 12681
Median: 24256
Q3: 40654
95-th percentile: 54891
Maximum: 57458
Range: 57457
Interquartile range: 27972

Descriptive statistics

Standard deviation: 16863
Coef of variation: 0.62727
Kurtosis: -1.1601
Mean: 26884
MAD: 14490
Skewness: 0.26653
Sum: 1.2293e+09
Variance: 2.8438e+08
Memory size: 357.3 KiB
Histogram

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 417 | 2 | < 0.1% |
| 398 | 2 | < 0.1% |
| 1 | 2 | < 0.1% |
| 6 | 2 | < 0.1% |
| 392 | 2 | < 0.1% |
| 370 | 2 | < 0.1% |
| 379 | 2 | < 0.1% |
| 2 | 2 | < 0.1% |
| 390 | 2 | < 0.1% |
| 10 | 2 | < 0.1% |
| Other values (45706) | 45706 | > 99.9% |

Minimum 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 1 | 2 | < 0.1% |
| 2 | 2 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 2 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 57458 | 1 | < 0.1% |
| 57457 | 1 | < 0.1% |
| 57456 | 1 | < 0.1% |
| 57455 | 1 | < 0.1% |
| 57454 | 1 | < 0.1% |

mass_(g)
Numeric

Distinct count: 12577
Unique (%): 27.5%
Missing (%): < 0.1%
Missing (n): 131
Infinite (%): 0.0%
Infinite (n): 0
Mean: 13278
Minimum: 0
Maximum: 6e+07
Zeros (%): < 0.1%
Mini histogram

Quantile statistics

Minimum: 0
5-th percentile: 1.1
Q1: 7.2
Median: 32.61
Q3: 202.9
95-th percentile: 4000
Maximum: 6e+07
Range: 6e+07
Interquartile range: 195.7

Descriptive statistics

Standard deviation: 5.7493e+05
Coef of variation: 43.298
Kurtosis: 6798.4
Mean: 13278
MAD: 25113
Skewness: 76.918
Sum: 6.0543e+08
Variance: 3.3054e+11
Memory size: 357.3 KiB
Histogram

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 1.3 | 171 | < 0.1% |
| 1.2 | 140 | < 0.1% |
| 1.4 | 138 | < 0.1% |
| 2.1 | 130 | < 0.1% |
| 2.4 | 126 | < 0.1% |
| 1.6 | 120 | < 0.1% |
| 0.5 | 119 | < 0.1% |
| 1.1 | 116 | < 0.1% |
| 3.8 | 114 | < 0.1% |
| 0.7 | 111 | < 0.1% |
| Other values (12566) | 44310 | > 99.9% |
| (Missing) | 131 | < 0.1% |

Minimum 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 0 | 19 | < 0.1% |
| 0.01 | 2 | < 0.1% |
| 0.013 | 1 | < 0.1% |
| 0.02 | 1 | < 0.1% |
| 0.03 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 6e+07 | 1 | < 0.1% |
| 5.82e+07 | 1 | < 0.1% |
| 5e+07 | 1 | < 0.1% |
| 3e+07 | 1 | < 0.1% |
| 2.8e+07 | 1 | < 0.1% |

mixed
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
1: 23060
A: 22666

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 1 | 23060 | 50.4% |
| A | 22666 | 49.6% |

Max length: 1
Mean length: 1
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: False
Contains non-words: False

name
Categorical, Unique

First 5 values
Aachen
Aachen copy
Aarhus
Aarhus copy
Abajo
Last 5 values
Österplana 062
Österplana 063
Österplana 064
Łowicz
Święcany

First 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| Aachen | 1 | < 0.1% |
| Aachen copy | 1 | < 0.1% |
| Aarhus | 1 | < 0.1% |
| Aarhus copy | 1 | < 0.1% |
| Abajo | 1 | < 0.1% |

Last 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| Święcany | 1 | < 0.1% |
| Łowicz | 1 | < 0.1% |
| Österplana 064 | 1 | < 0.1% |
| Österplana 063 | 1 | < 0.1% |
| Österplana 062 | 1 | < 0.1% |

nametype
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
Valid: 45651
Relict: 75

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| Valid | 45651 | > 99.9% |
| Relict | 75 | < 0.1% |

Max length: 6
Mean length: 5.0016
Min length: 5
Contains chars: True
Contains digits: False
Contains spaces: False
Contains non-words: False

recclass
Categorical

Distinct count: 466
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0
L6: 8287
H5: 7143
L5: 4797
Other values (463): 25499

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| L6 | 8287 | 18.1% |
| H5 | 7143 | 15.6% |
| L5 | 4797 | 10.5% |
| H6 | 4529 | 9.9% |
| H4 | 4211 | 9.2% |
| LL5 | 2766 | 6.0% |
| LL6 | 2043 | < 0.1% |
| L4 | 1253 | < 0.1% |
| H4/5 | 428 | < 0.1% |
| CM2 | 416 | < 0.1% |
| Other values (456) | 9853 | 21.5% |

Max length: 26
Mean length: 3.0525
Min length: 1
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True

reclat
Numeric

Distinct count: 12739
Unique (%): 27.9%
Missing (%): 16.0%
Missing (n): 7315
Infinite (%): 0.0%
Infinite (n): 0
Mean: -39.107
Minimum: -87.367
Maximum: 81.167
Zeros (%): 14.1%
Mini histogram

Quantile statistics

Minimum: -87.367
5-th percentile: -84.355
Q1: -76.714
Median: -71.5
Q3: 0
95-th percentile: 34.494
Maximum: 81.167
Range: 168.53
Interquartile range: 76.714

Descriptive statistics

Standard deviation: 46.386
Coef of variation: -1.1861
Kurtosis: -1.4769
Mean: -39.107
MAD: 43.937
Skewness: 0.49132
Sum: -1.5021e+06
Variance: 2151.7
Memory size: 357.3 KiB
Histogram

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 0 | 6438 | 14.1% |
| -71.5 | 4761 | 10.4% |
| -84 | 3040 | 6.6% |
| -72 | 1506 | < 0.1% |
| -79.683 | 1130 | < 0.1% |
| -76.717 | 680 | < 0.1% |
| -76.183 | 539 | < 0.1% |
| -84.217 | 263 | < 0.1% |
| -86.367 | 226 | < 0.1% |
| -86.717 | 217 | < 0.1% |
| Other values (12728) | 19611 | 42.9% |
| (Missing) | 7315 | 16.0% |

Minimum 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| -87.367 | 4 | < 0.1% |
| -87.033 | 3 | < 0.1% |
| -86.933 | 3 | < 0.1% |
| -86.717 | 217 | < 0.1% |
| -86.567 | 17 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 81.167 | 1 | < 0.1% |
| 76.533 | 1 | < 0.1% |
| 76.133 | 1 | < 0.1% |
| 72.883 | 1 | < 0.1% |
| 72.683 | 1 | < 0.1% |

reclat_city
Highly correlated

This variable is highly correlated with reclat and should be ignored for analysis

Correlation: 0.99422

reclong
Numeric

Distinct count: 14641
Unique (%): 32.0%
Missing (%): 16.0%
Missing (n): 7315
Infinite (%): 0.0%
Infinite (n): 0
Mean: 61.053
Minimum: -165.43
Maximum: 354.47
Zeros (%): 13.6%
Mini histogram

Quantile statistics

Minimum: -165.43
5-th percentile: -90.427
Q1: 0
Median: 35.667
Q3: 157.17
95-th percentile: 168
Maximum: 354.47
Range: 519.91
Interquartile range: 157.17

Descriptive statistics

Standard deviation: 80.655
Coef of variation: 1.3211
Kurtosis: -0.73139
Mean: 61.053
MAD: 67.606
Skewness: -0.17438
Sum: 2.3451e+06
Variance: 6505.3
Memory size: 357.3 KiB
Histogram

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 0 | 6214 | 13.6% |
| 35.667 | 4985 | 10.9% |
| 168 | 3040 | 6.6% |
| 26 | 1506 | < 0.1% |
| 159.75 | 657 | < 0.1% |
| 159.67 | 637 | < 0.1% |
| 157.17 | 542 | < 0.1% |
| 155.75 | 473 | < 0.1% |
| 160.5 | 263 | < 0.1% |
| -70 | 228 | < 0.1% |
| Other values (14630) | 19866 | 43.4% |
| (Missing) | 7315 | 16.0% |

Minimum 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| -165.43 | 9 | < 0.1% |
| -165.12 | 17 | < 0.1% |
| -163.17 | 1 | < 0.1% |
| -162.55 | 1 | < 0.1% |
| -157.87 | 1 | < 0.1% |

Maximum 5 values

| Value | Count | Frequency (%) |
|-------|-------|---------------|
| 354.47 | 1 | < 0.1% |
| 178.2 | 1 | < 0.1% |
| 178.08 | 1 | < 0.1% |
| 175.73 | 1 | < 0.1% |
| 175.13 | 1 | < 0.1% |

source
Constant

This variable is constant and should be ignored for analysis

Constant value: NASA

year
Date

Distinct count: 246
Unique (%): < 0.1%
Missing (%): < 0.1%
Missing (n): 312
Infinite (%): 0.0%
Infinite (n): 0
Minimum: 1688-01-01 00:00:00
Maximum: 2101-01-01 00:00:00
Mini histogram
Histogram

Correlations

Pearson's r
Spearman's ρ
Kendall's τ
Phik (φk)
Cramér's V (φc)
Recoded

Missing values

Count
Matrix
Heatmap
Dendrogram

Sample

First rows

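| | boolean | fall | GeoLocation | id | mass_(g) | mixed | name | nametype | recclass | reclat | reclat_city | reclong | source | year |
|---|---------|------|-------------|----|----------|-------|------|----------|----------|--------|-------------|---------|--------|------|
| 0 | True | Fell | (50.775, 6.08333) | 1 | 21.0 | A | Aachen | Valid | L5 | 50.77500 | 45.844917 | 6.08333 | NASA | 1880-01-01 |
| 1 | False | Fell | (56.18333, 10.23333) | 2 | 720.0 | 1 | Aarhus | Valid | H6 | 56.18333 | 61.401378 | 10.23333 | NASA | 1951-01-01 |
| 2 | True | Fell | (54.21667, -113.0) | 6 | 107000.0 | 1 | Abee | Valid | EH4 | 54.21667 | 56.665445 | -113.00000 | NASA | 1952-01-01 |
| 3 | True | Fell | (16.88333, -99.9) | 10 | 1914.0 | A | Acapulco | Valid | Acapulcoite | 16.88333 | 13.980564 | -99.90000 | NASA | 1976-01-01 |
| 4 | False | Fell | (-33.16667, -64.95) | 370 | 780.0 | 1 | Achiras | Valid | L6 | -33.16667 | -31.246833 | -64.95000 | NASA | 1902-01-01 |
| 5 | False | Fell | (32.1, 71.8) | 379 | 4239.0 | 1 | Adhi Kot | Valid | EH4 | 32.10000 | 30.168071 | 71.80000 | NASA | 1919-01-01 |
| 6 | True | Fell | (44.83333, 95.16667) | 390 | 910.0 | 1 | Adzhi-Bogdo (stone) | Valid | LL3-6 | 44.83333 | 41.823701 | 95.16667 | NASA | 1949-01-01 |
| 7 | False | Fell | (44.21667, 0.61667) | 392 | 30000.0 | A | Agen | Valid | H5 | 44.21667 | 45.691889 | 0.61667 | NASA | 1814-01-01 |
| 8 | False | Fell | (-31.6, -65.23333) | 398 | 1620.0 | A | Aguada | Valid | L6 | -31.60000 | -27.353326 | -65.23333 | NASA | 1930-01-01 |
| 9 | True | Fell | (-30.86667, -64.55) | 417 | 1440.0 | 1 | Aguila Blanca | Valid | L | -30.86667 | -27.320248 | -64.55000 | NASA | 1920-01-01 |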

Last rows

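| | boolean | fall | GeoLocation | id | mass_(g) | mixed | name | nametype | recclass | reclat | reclat_city | reclong | source | year |
|---|---------|------|-------------|----|----------|-------|------|----------|----------|--------|-------------|---------|--------|------|
| 45716 | True | Fell | (50.775, 6.08333) | 1 | 21.0 | A | Aachen copy | Valid | L5 | 50.77500 | 45.844917 | 6.08333 | NASA | 1880-01-01 |
| 45717 | False | Fell | (56.18333, 10.23333) | 2 | 720.0 | 1 | Aarhus copy | Valid | H6 | 56.18333 | 61.401378 | 10.23333 | NASA | 1951-01-01 |
| 45718 | True | Fell | (54.21667, -113.0) | 6 | 107000.0 | 1 | Abee copy | Valid | EH4 | 54.21667 | 56.665445 | -113.00000 | NASA | 1952-01-01 |
| 45719 | True | Fell | (16.88333, -99.9) | 10 | 1914.0 | A | Acapulco copy | Valid | Acapulcoite | 16.88333 | 13.980564 | -99.90000 | NASA | 1976-01-01 |
| 45720 | False | Fell | (-33.16667, -64.95) | 370 | 780.0 | 1 | Achiras copy | Valid | L6 | -33.16667 | -31.246833 | -64.95000 | NASA | 1902-01-01 |
| 45721 | False | Fell | (32.1, 71.8) | 379 | 4239.0 | 1 | Adhi Kot copy | Valid | EH4 | 32.10000 | 30.168071 | 71.80000 | NASA | 1919-01-01 |
| 45722 | True | Fell | (44.83333, 95.16667) | 390 | 910.0 | 1 | Adzhi-Bogdo (stone) copy | Valid | LL3-6 | 44.83333 | 41.823701 | 95.16667 | NASA | 1949-01-01 |
| 45723 | False | Fell | (44.21667, 0.61667) | 392 | 30000.0 | A | Agen copy | Valid | H5 | 44.21667 | 45.691889 | 0.61667 | NASA | 1814-01-01 |
| 45724 | False | Fell | (-31.6, -65.23333) | 398 | 1620.0 | A | Aguada copy | Valid | L6 | -31.60000 | -27.353326 | -65.23333 | NASA | 1930-01-01 |
| 45725 | True | Fell | (-30.86667, -64.55) | 417 | 1440.0 | 1 | Aguila Blanca copy | Valid | L | -30.86667 | -27.320248 | -64.55000 | NASA | 1920-01-01 |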