DOC: Update is_sparse docstring #19983

sechilds · 2018-03-04T14:57:49Z

is_sparse(pd.SparseDataFrame([1,2,3])) returns False.

The is_sparse function checks to see if
an array-like is spare by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.

Added a test for this.

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

The is_sparse function checks to see if an array-like is spare by checking to see if it is an instance of ABCSparseArray or ABCSparseSeries. This commit adds ABCSparseDataFrame to that list -- so it can detect that a DataFrame (which is an array-like object) is sparse. Added a test for this.

jreback · 2018-03-04T15:57:29Z

why is this necessary? is_sparse is for checking 1-d, a DataFrame is 2-d.

sechilds · 2018-03-04T16:21:35Z

I was looking at this method for the documentation sprint, and I thought it was odd that it would return False for a for SparseDataFrame. I wasn't too sure if array-like just referred to 1-d arrays or 2-d as well. A note could be added that it expects a 1-d array - and that would resolve it as well.

jreback · 2018-03-05T11:43:14Z

a doc-string update would be great.

This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method.

Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like.

datapythonista

Looks good, just some ideas that IMO would make the docstring better and more accurate.

datapythonista · 2018-03-06T16:34:41Z

pandas/core/dtypes/common.py

@@ -120,17 +120,26 @@ def is_object_dtype(arr_or_dtype):


 def is_sparse(arr):
-    """
-    Check whether an array-like is a pandas sparse array.
+    """Check whether an array-like is a pandas sparse array.


I think that's not 100% correct, technically speaking. For what it's explained later, I think the function checks whether an array-like is a 1-D pandas sparse array.

datapythonista · 2018-03-06T16:35:39Z

pandas/core/dtypes/common.py


    Parameters
    ----------
-    arr : array-like
-        The array-like to check.
+    arr : array-like (1-D)


If I'm not wrong, and based on your example, arr can be actually of more dimensions, but is being checked its dimensionality. Isn't it?

datapythonista · 2018-03-06T16:36:48Z

pandas/core/dtypes/common.py


    Returns
    -------
-    boolean : Whether or not the array-like is a pandas sparse array.
+    boolean


I think there is some discussion about it, but we should be using Python types, so bool instead of boolean.

datapythonista · 2018-03-06T16:38:23Z

pandas/core/dtypes/common.py

+
+    Check that the one-dimensional array-like is a pandas sparse array.
+    Returns True if it is a pandas sparse array, not another type of
+    sparse array.


It will take some research, but I think it would be useful to explain what are the use cases of this function, besides what it does.

datapythonista · 2018-03-06T16:44:32Z

pandas/core/dtypes/common.py


+    This function checks that 1 dimensional arrays are sparse.
+    It will not identify that a `SparseDataFrame` as sparse.


We need to update the documentation, but we finally decided to just use "real-world" examples when they help understand the example. In my opinion, in this cases it would be more consistent to use data looking more sparse, for example SparseSeries([0, 0, 1, 0]).

Also, I think it'll make things easier for the user if this is written step-by-step. Instead of a longer paragraph explaining all the examples first, having short explanations for each check, for example:

It returns True for if the parameter is a 1D sparse array

It returns False if the parameter is not sparse

It returns False if the parameter has more than 1 dimensions.

It's just an opinion, but personally I think this should be clearer for the user.

@datapythonista this is almost an internal function, exposed because we needed a function that says: hey are you a pandas sparse array.. we don't have a real sparse type at the moment so this is not a definitive guide. would be ok with improving the doc-string, maybe would should deprecate from the public api.

pandas/core/dtypes/common.py

jreback · 2018-03-07T14:27:56Z

pandas/core/dtypes/common.py


+    This function checks that 1 dimensional arrays are sparse.
+    It will not identify that a `SparseDataFrame` as sparse.


@datapythonista this is almost an internal function, exposed because we needed a function that says: hey are you a pandas sparse array.. we don't have a real sparse type at the moment so this is not a definitive guide. would be ok with improving the doc-string, maybe would should deprecate from the public api.

Responding to pull request comments.

pep8speaks · 2018-03-08T22:45:33Z

Hello @sechilds! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/dtypes/common.py !

Comment last updated on November 03, 2018 at 05:36 Hours UTC

datapythonista

Looks quite good. Added couple of comments to improve clarity, and some changes to See Also based on the other proposed changes.

datapythonista · 2018-03-08T22:49:57Z

pandas/core/dtypes/common.py

    False

-    This function checks that 1 dimensional arrays are sparse.
-    It will not identify that a `SparseDataFrame` as sparse.
+    Returns `False` if the parameter has more than one dimension.


I'd be explicit here in saying, return false if it has more than one dimension, even if it is sparse. Minor change, but I think it's a bit more clear.

datapythonista · 2018-03-08T22:53:55Z

pandas/core/dtypes/common.py

@@ -120,45 +120,52 @@ def is_object_dtype(arr_or_dtype):


 def is_sparse(arr):
-    """Check whether an array-like is a pandas sparse array.
+    """
+    Check whether an array-like is a 1-D pandas sparse array.

    Check that the one-dimensional array-like is a pandas sparse array.
    Returns True if it is a pandas sparse array, not another type of
    sparse array.


Personally I find this paragraph not so clear. I think something like this would be shorter and also clearer: "Return True if arr is pandas.SparseArray or pandas.SparseSeries, and False for any other type."

datapythonista · 2018-03-08T22:55:46Z

pandas/core/dtypes/common.py

        Whether or not the array-like is a pandas sparse array.

    See Also
    --------
    DataFrame.to_sparse : Convert DataFrame to a SparseDataFrame.
    Series.to_sparse : Convert Series to SparseSeries.
+    Series.to_dense : Return dense representation of a Series.


Based on my comment above, I think we want to include in the see also the mentioned methods pandas.SparseArray and pandas.SparseSeries. Also, may be it makes sense to get rid of DataFrame.to_sparse, as it's not so directly related.

CalmDownKarm · 2018-10-08T07:49:15Z

@datapythonista I think this is ready to be merged.

datapythonista · 2018-10-10T07:30:10Z

@sechilds can you rebase? Sorry your PR was forgotten.

sechilds · 2018-10-10T13:41:21Z

No problem. I will rebase it later today.

codecov · 2018-11-03T05:37:00Z

Codecov Report

Merging #19983 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #19983   +/-   ##
=======================================
  Coverage   92.24%   92.24%           
=======================================
  Files         161      161           
  Lines       51317    51317           
=======================================
  Hits        47338    47338           
  Misses       3979     3979

Flag	Coverage Δ
#multiple	`90.63% <ø> (ø)`	⬆️
#single	`42.31% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/common.py	`94.37% <ø> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20bdb3e...5a78a39. Read the comment docs.

codecov · 2018-11-03T05:37:00Z

Codecov Report

Merging #19983 into master will decrease coverage by 0.5%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #19983      +/-   ##
==========================================
- Coverage   92.22%   91.71%   -0.51%     
==========================================
  Files         161      150      -11     
  Lines       51187    49112    -2075     
==========================================
- Hits        47209    45045    -2164     
- Misses       3978     4067      +89

Flag	Coverage Δ
#multiple	`90.1% <ø> (-0.52%)`	⬇️
#single	`41.86% <ø> (-0.4%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/dtypes/common.py	`94.64% <ø> (+0.27%)`	⬆️
pandas/plotting/_compat.py	`62% <0%> (-25.5%)`	⬇️
pandas/core/arrays/base.py	`74.35% <0%> (-23.66%)`	⬇️
pandas/io/s3.py	`72.72% <0%> (-13.64%)`	⬇️
pandas/util/_doctools.py	`0% <0%> (-12.88%)`	⬇️
pandas/core/reshape/util.py	`90.32% <0%> (-9.68%)`	⬇️
pandas/util/_decorators.py	`82.4% <0%> (-8.95%)`	⬇️
pandas/core/dtypes/base.py	`91.66% <0%> (-8.34%)`	⬇️
pandas/errors/__init__.py	`92.3% <0%> (-7.7%)`	⬇️
pandas/core/common.py	`92.04% <0%> (-6.38%)`	⬇️
... and 142 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f71755...227007a. Read the comment docs.

datapythonista · 2018-11-03T05:38:07Z

Rebased, merging on green.

mroeschke · 2018-11-13T06:28:39Z

Pushed one more rebase and merging on green.

mroeschke · 2018-11-13T17:52:59Z

Thanks @sechilds and @datapythonista

* BUG: Identify SparseDataFrame as sparse The is_sparse function checks to see if an array-like is spare by checking to see if it is an instance of ABCSparseArray or ABCSparseSeries. This commit adds ABCSparseDataFrame to that list -- so it can detect that a DataFrame (which is an array-like object) is sparse. Added a test for this. * Revert "BUG: Identify SparseDataFrame as sparse" This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method. * DOC: Revise is_sparce docstring Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like. * DOC: Adjust is_sparse docstring. Responding to pull request comments.

* upstream/master: (25 commits) DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660) CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655) ...

…fixed * upstream/master: DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)

* BUG: Identify SparseDataFrame as sparse The is_sparse function checks to see if an array-like is spare by checking to see if it is an instance of ABCSparseArray or ABCSparseSeries. This commit adds ABCSparseDataFrame to that list -- so it can detect that a DataFrame (which is an array-like object) is sparse. Added a test for this. * Revert "BUG: Identify SparseDataFrame as sparse" This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method. * DOC: Revise is_sparce docstring Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like. * DOC: Adjust is_sparse docstring. Responding to pull request comments.

jreback added Docs Sparse Sparse Data Type labels Mar 5, 2018

sechilds added 2 commits March 5, 2018 19:02

Revert "BUG: Identify SparseDataFrame as sparse"

4857b0b

This reverts commit 10dffd1. The previous commit's change was not necessary. Will add a docstring to clarify the behaviour of the method.

DOC: Revise is_sparce docstring

7eb5dec

Clean up the docstring for is_sparse so it confirms to the documentation style guide. Add additional examples and clarify that is_sparse expect a 1-dimensional array-like.

sechilds changed the title ~~BUG: Identify SparseDataFrame as sparse~~ DOC: Update is_sparse docstring Mar 6, 2018

datapythonista reviewed Mar 6, 2018

View reviewed changes

jreback requested changes Mar 7, 2018

View reviewed changes

DOC: Adjust is_sparse docstring.

541fa1c

Responding to pull request comments.

datapythonista reviewed Mar 8, 2018

View reviewed changes

Merging from master

227007a

Merge remote-tracking branch 'upstream/master' into fix_is_sparse

5a78a39

mroeschke merged commit a197837 into pandas-dev:master Nov 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DOC: Update is_sparse docstring #19983

DOC: Update is_sparse docstring #19983

sechilds commented Mar 4, 2018 •

edited

Loading

jreback commented Mar 4, 2018

sechilds commented Mar 4, 2018

jreback commented Mar 5, 2018

datapythonista left a comment

datapythonista Mar 6, 2018

datapythonista Mar 6, 2018

datapythonista Mar 6, 2018

datapythonista Mar 6, 2018

datapythonista Mar 6, 2018

jreback Mar 7, 2018

jreback Mar 7, 2018

pep8speaks commented Mar 8, 2018 •

edited

Loading

datapythonista left a comment

datapythonista Mar 8, 2018

datapythonista Mar 8, 2018

datapythonista Mar 8, 2018

CalmDownKarm commented Oct 8, 2018

datapythonista commented Oct 10, 2018

sechilds commented Oct 10, 2018

codecov bot commented Nov 3, 2018 •

edited

Loading

codecov bot commented Nov 3, 2018

datapythonista commented Nov 3, 2018

mroeschke commented Nov 13, 2018

mroeschke commented Nov 13, 2018


		This function checks that 1 dimensional arrays are sparse.
		It will not identify that a `SparseDataFrame` as sparse.

DOC: Update is_sparse docstring #19983

DOC: Update is_sparse docstring #19983

Conversation

sechilds commented Mar 4, 2018 • edited Loading

jreback commented Mar 4, 2018

sechilds commented Mar 4, 2018

jreback commented Mar 5, 2018

datapythonista left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pep8speaks commented Mar 8, 2018 • edited Loading

Comment last updated on November 03, 2018 at 05:36 Hours UTC

datapythonista left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CalmDownKarm commented Oct 8, 2018

datapythonista commented Oct 10, 2018

sechilds commented Oct 10, 2018

codecov bot commented Nov 3, 2018 • edited Loading

Codecov Report

codecov bot commented Nov 3, 2018

Codecov Report

datapythonista commented Nov 3, 2018

mroeschke commented Nov 13, 2018

mroeschke commented Nov 13, 2018

sechilds commented Mar 4, 2018 •

edited

Loading

pep8speaks commented Mar 8, 2018 •

edited

Loading

codecov bot commented Nov 3, 2018 •

edited

Loading