-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Update is_sparse docstring #19983
Conversation
The is_sparse function checks to see if an array-like is spare by checking to see if it is an instance of ABCSparseArray or ABCSparseSeries. This commit adds ABCSparseDataFrame to that list -- so it can detect that a DataFrame (which is an array-like object) is sparse. Added a test for this.
why is this necessary? |
I was looking at this method for the documentation sprint, and I thought it was odd that it would return False for a for |
a doc-string update would be great. |
This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method.
Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some ideas that IMO would make the docstring better and more accurate.
pandas/core/dtypes/common.py
Outdated
@@ -120,17 +120,26 @@ def is_object_dtype(arr_or_dtype): | |||
|
|||
|
|||
def is_sparse(arr): | |||
""" | |||
Check whether an array-like is a pandas sparse array. | |||
"""Check whether an array-like is a pandas sparse array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's not 100% correct, technically speaking. For what it's explained later, I think the function checks whether an array-like is a 1-D pandas sparse array.
pandas/core/dtypes/common.py
Outdated
|
||
Parameters | ||
---------- | ||
arr : array-like | ||
The array-like to check. | ||
arr : array-like (1-D) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not wrong, and based on your example, arr
can be actually of more dimensions, but is being checked its dimensionality. Isn't it?
pandas/core/dtypes/common.py
Outdated
|
||
Returns | ||
------- | ||
boolean : Whether or not the array-like is a pandas sparse array. | ||
boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some discussion about it, but we should be using Python types, so bool
instead of boolean
.
|
||
Check that the one-dimensional array-like is a pandas sparse array. | ||
Returns True if it is a pandas sparse array, not another type of | ||
sparse array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will take some research, but I think it would be useful to explain what are the use cases of this function, besides what it does.
pandas/core/dtypes/common.py
Outdated
|
||
This function checks that 1 dimensional arrays are sparse. | ||
It will not identify that a `SparseDataFrame` as sparse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the documentation, but we finally decided to just use "real-world" examples when they help understand the example. In my opinion, in this cases it would be more consistent to use data looking more sparse, for example SparseSeries([0, 0, 1, 0])
.
Also, I think it'll make things easier for the user if this is written step-by-step. Instead of a longer paragraph explaining all the examples first, having short explanations for each check, for example:
- It returns
True
for if the parameter is a 1D sparse array - It returns
False
if the parameter is not sparse - It returns
False
if the parameter has more than 1 dimensions.
It's just an opinion, but personally I think this should be clearer for the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista this is almost an internal function, exposed because we needed a function that says: hey are you a pandas sparse array.. we don't have a real sparse type at the moment so this is not a definitive guide. would be ok with improving the doc-string, maybe would should deprecate from the public api.
pandas/core/dtypes/common.py
Outdated
|
||
This function checks that 1 dimensional arrays are sparse. | ||
It will not identify that a `SparseDataFrame` as sparse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista this is almost an internal function, exposed because we needed a function that says: hey are you a pandas sparse array.. we don't have a real sparse type at the moment so this is not a definitive guide. would be ok with improving the doc-string, maybe would should deprecate from the public api.
Responding to pull request comments.
Hello @sechilds! Thanks for updating the PR.
Comment last updated on November 03, 2018 at 05:36 Hours UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quite good. Added couple of comments to improve clarity, and some changes to See Also based on the other proposed changes.
False | ||
|
||
This function checks that 1 dimensional arrays are sparse. | ||
It will not identify that a `SparseDataFrame` as sparse. | ||
Returns `False` if the parameter has more than one dimension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be explicit here in saying, return false if it has more than one dimension, even if it is sparse. Minor change, but I think it's a bit more clear.
@@ -120,45 +120,52 @@ def is_object_dtype(arr_or_dtype): | |||
|
|||
|
|||
def is_sparse(arr): | |||
"""Check whether an array-like is a pandas sparse array. | |||
""" | |||
Check whether an array-like is a 1-D pandas sparse array. | |||
|
|||
Check that the one-dimensional array-like is a pandas sparse array. | |||
Returns True if it is a pandas sparse array, not another type of | |||
sparse array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I find this paragraph not so clear. I think something like this would be shorter and also clearer: "Return True
if arr
is pandas.SparseArray
or pandas.SparseSeries
, and False
for any other type."
Whether or not the array-like is a pandas sparse array. | ||
|
||
See Also | ||
-------- | ||
DataFrame.to_sparse : Convert DataFrame to a SparseDataFrame. | ||
Series.to_sparse : Convert Series to SparseSeries. | ||
Series.to_dense : Return dense representation of a Series. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my comment above, I think we want to include in the see also the mentioned methods pandas.SparseArray
and pandas.SparseSeries
. Also, may be it makes sense to get rid of DataFrame.to_sparse
, as it's not so directly related.
@datapythonista I think this is ready to be merged. |
@sechilds can you rebase? Sorry your PR was forgotten. |
No problem. I will rebase it later today. |
Codecov Report
@@ Coverage Diff @@
## master #19983 +/- ##
=======================================
Coverage 92.24% 92.24%
=======================================
Files 161 161
Lines 51317 51317
=======================================
Hits 47338 47338
Misses 3979 3979
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #19983 +/- ##
==========================================
- Coverage 92.22% 91.71% -0.51%
==========================================
Files 161 150 -11
Lines 51187 49112 -2075
==========================================
- Hits 47209 45045 -2164
- Misses 3978 4067 +89
Continue to review full report at Codecov.
|
Rebased, merging on green. |
Pushed one more rebase and merging on green. |
Thanks @sechilds and @datapythonista |
* BUG: Identify SparseDataFrame as sparse The is_sparse function checks to see if an array-like is spare by checking to see if it is an instance of ABCSparseArray or ABCSparseSeries. This commit adds ABCSparseDataFrame to that list -- so it can detect that a DataFrame (which is an array-like object) is sparse. Added a test for this. * Revert "BUG: Identify SparseDataFrame as sparse" This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method. * DOC: Revise is_sparce docstring Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like. * DOC: Adjust is_sparse docstring. Responding to pull request comments.
* upstream/master: (25 commits) DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660) CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655) ...
…fixed * upstream/master: DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)
* BUG: Identify SparseDataFrame as sparse The is_sparse function checks to see if an array-like is spare by checking to see if it is an instance of ABCSparseArray or ABCSparseSeries. This commit adds ABCSparseDataFrame to that list -- so it can detect that a DataFrame (which is an array-like object) is sparse. Added a test for this. * Revert "BUG: Identify SparseDataFrame as sparse" This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method. * DOC: Revise is_sparce docstring Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like. * DOC: Adjust is_sparse docstring. Responding to pull request comments.
* BUG: Identify SparseDataFrame as sparse The is_sparse function checks to see if an array-like is spare by checking to see if it is an instance of ABCSparseArray or ABCSparseSeries. This commit adds ABCSparseDataFrame to that list -- so it can detect that a DataFrame (which is an array-like object) is sparse. Added a test for this. * Revert "BUG: Identify SparseDataFrame as sparse" This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method. * DOC: Revise is_sparce docstring Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like. * DOC: Adjust is_sparse docstring. Responding to pull request comments.
* BUG: Identify SparseDataFrame as sparse The is_sparse function checks to see if an array-like is spare by checking to see if it is an instance of ABCSparseArray or ABCSparseSeries. This commit adds ABCSparseDataFrame to that list -- so it can detect that a DataFrame (which is an array-like object) is sparse. Added a test for this. * Revert "BUG: Identify SparseDataFrame as sparse" This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method. * DOC: Revise is_sparce docstring Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like. * DOC: Adjust is_sparse docstring. Responding to pull request comments.
is_sparse(pd.SparseDataFrame([1,2,3]))
returnsFalse
.The is_sparse function checks to see if
an array-like is spare by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.
Added a test for this.
git diff upstream/master -u -- "*.py" | flake8 --diff