Automatically Describe DFs Going Into Data Explorer #24
Comments
Fantastic idea! I think we can include it as part of the We need to impact:
@alexandercbooth and I prototyped a version that captures the summary statistics just now and this is the code we came up with for
diff --git a/pandas/io/json/table_schema.py b/pandas/io/json/table_schema.py
index 2dc176648..0460868c1 100644
--- a/pandas/io/json/table_schema.py
+++ b/pandas/io/json/table_schema.py
@@ -113,6 +113,10 @@ def convert_pandas_type_to_json_field(arr, dtype=None):
             field['tz'] = arr.dt.tz.zone
         else:
             field['tz'] = arr.tz.zone
+
+    # TODO: get this to be part of the spec for https://frictionlessdata.io/specs/table-schema/
+    if hasattr(arr, 'describe'):
+        field['summary'] = arr.describe(include="all").to_dict()
     return field
Admittedly, I don't know what the performance implications are. 😬 Perhaps this is fine if it's already being serialized.
Notebook that uses this and will be useful for debugging: https://gist.github.com/rgbkrk/e1b477641128213db71efa34cfdbb8a7
@alexandercbooth wants to take on bringing this into pandas.
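For anyone who wants to try the idea without patching pandas, here is a rough sketch that gets a similar result from the outside. It assumes the public `build_table_schema` function from `pandas.io.json` and adds a `summary` key to each field afterwards. The helper name `schema_with_summaries` and the sample data are made up for illustration, and the `summary` key is not part of the Table Schema spec.

```python
import pandas as pd
from pandas.io.json import build_table_schema


def schema_with_summaries(df):
    """Hypothetical helper: build a Table Schema dict and attach
    per-column describe() output under a non-standard 'summary' key."""
    # index=False keeps the field list limited to the DataFrame's columns
    schema = build_table_schema(df, index=False)
    for field in schema["fields"]:
        column = df[field["name"]]
        # Series.describe() picks statistics based on the column dtype
        field["summary"] = column.describe().to_dict()
    return schema


# Made-up sample data with one text column and one numeric column
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Oslo"], "temp_c": [4.5, 19.0, 6.5]}
)
schema = schema_with_summaries(df)
summaries = {f["name"]: f["summary"] for f in schema["fields"]}
```

This calls `describe()` once per column, so on wide or very large frames it is probably worth timing before turning it on by default.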
This issue hasn't had any activity on it in the last 90 days. Unfortunately we don't get around to dealing with every issue that is opened. Instead of leaving issues open we're seeking to be transparent by closing issues that aren't being prioritized. If no other activity happens on this issue in one week, it will be closed.
Thank you!
A related project for ideas around what sorts of summary statistics could be piped into the table: https://github.com/pandas-profiling/pandas-profiling
For Hacktoberfest 2019 participants: resolving this issue will require changes across the pandas and nteract repos. @rgbkrk's comment above is a great place to start on the changes required on the Python side, which is where this modification should begin.
df.describe(include="all")
should run and be included as metadata for any dataframe that's being sent to the Data Explorer component.
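As a rough illustration of what that metadata could look like, the sketch below runs `describe(include="all")` on a small made-up frame and converts the result to JSON-safe values. The `metadata` dict and its `summary` key are assumptions for the example, not an agreed format for the Data Explorer.

```python
import json

import pandas as pd

# Made-up sample data with one text column and one numeric column
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Oslo"], "temp_c": [4.5, 19.0, 6.5]}
)

# include="all" reports every column; statistics that do not apply
# to a column (e.g. the mean of a text column) come back as NaN
summary = df.describe(include="all")

# Round-tripping through to_json turns NaN into null, so the result
# can be embedded in a JSON payload as-is
metadata = {"summary": json.loads(summary.to_json())}
```

The result is keyed by column name, then by statistic, so a consumer can look up `metadata["summary"]["temp_c"]["mean"]` directly.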