Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Update is_sparse docstring #19983

Merged
merged 6 commits into from
Nov 13, 2018
Merged

Conversation

sechilds
Copy link
Contributor

@sechilds sechilds commented Mar 4, 2018

is_sparse(pd.SparseDataFrame([1,2,3])) returns False.

The is_sparse function checks to see if
an array-like is spare by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.

Added a test for this.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

The is_sparse function checks to see if
an array-like is spare by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.

Added a test for this.
@jreback
Copy link
Contributor

jreback commented Mar 4, 2018

why is this necessary? is_sparse is for checking 1-d, a DataFrame is 2-d.

@sechilds
Copy link
Contributor Author

sechilds commented Mar 4, 2018

I was looking at this method for the documentation sprint, and I thought it was odd that it would return False for a for SparseDataFrame. I wasn't too sure if array-like just referred to 1-d arrays or 2-d as well. A note could be added that it expects a 1-d array - and that would resolve it as well.

@jreback jreback added Docs Sparse Sparse Data Type labels Mar 5, 2018
@jreback
Copy link
Contributor

jreback commented Mar 5, 2018

a doc-string update would be great.

sechilds added 2 commits March 5, 2018 19:02
This reverts commit 10dffd1.

The previous commit's change was not necessary.
Will add a docstring to clarify the behaviour of the method.
Clean up the docstring for is_sparse so it confirms to the
documentation style guide.

Add additional examples and clarify that is_sparse
expect a 1-dimensional array-like.
@sechilds sechilds changed the title BUG: Identify SparseDataFrame as sparse DOC: Update is_sparse docstring Mar 6, 2018
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some ideas that IMO would make the docstring better and more accurate.

@@ -120,17 +120,26 @@ def is_object_dtype(arr_or_dtype):


def is_sparse(arr):
"""
Check whether an array-like is a pandas sparse array.
"""Check whether an array-like is a pandas sparse array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's not 100% correct, technically speaking. For what it's explained later, I think the function checks whether an array-like is a 1-D pandas sparse array.


Parameters
----------
arr : array-like
The array-like to check.
arr : array-like (1-D)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not wrong, and based on your example, arr can be actually of more dimensions, but is being checked its dimensionality. Isn't it?


Returns
-------
boolean : Whether or not the array-like is a pandas sparse array.
boolean
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is some discussion about it, but we should be using Python types, so bool instead of boolean.


Check that the one-dimensional array-like is a pandas sparse array.
Returns True if it is a pandas sparse array, not another type of
sparse array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will take some research, but I think it would be useful to explain what are the use cases of this function, besides what it does.


This function checks that 1 dimensional arrays are sparse.
It will not identify that a `SparseDataFrame` as sparse.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to update the documentation, but we finally decided to just use "real-world" examples when they help understand the example. In my opinion, in this cases it would be more consistent to use data looking more sparse, for example SparseSeries([0, 0, 1, 0]).

Also, I think it'll make things easier for the user if this is written step-by-step. Instead of a longer paragraph explaining all the examples first, having short explanations for each check, for example:

  • It returns True for if the parameter is a 1D sparse array
  • It returns False if the parameter is not sparse
  • It returns False if the parameter has more than 1 dimensions.

It's just an opinion, but personally I think this should be clearer for the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista this is almost an internal function, exposed because we needed a function that says: hey are you a pandas sparse array.. we don't have a real sparse type at the moment so this is not a definitive guide. would be ok with improving the doc-string, maybe would should deprecate from the public api.

pandas/core/dtypes/common.py Show resolved Hide resolved

This function checks that 1 dimensional arrays are sparse.
It will not identify that a `SparseDataFrame` as sparse.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista this is almost an internal function, exposed because we needed a function that says: hey are you a pandas sparse array.. we don't have a real sparse type at the moment so this is not a definitive guide. would be ok with improving the doc-string, maybe would should deprecate from the public api.

Responding to pull request comments.
@pep8speaks
Copy link

pep8speaks commented Mar 8, 2018

Hello @sechilds! Thanks for updating the PR.

Comment last updated on November 03, 2018 at 05:36 Hours UTC

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite good. Added couple of comments to improve clarity, and some changes to See Also based on the other proposed changes.

False

This function checks that 1 dimensional arrays are sparse.
It will not identify that a `SparseDataFrame` as sparse.
Returns `False` if the parameter has more than one dimension.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be explicit here in saying, return false if it has more than one dimension, even if it is sparse. Minor change, but I think it's a bit more clear.

@@ -120,45 +120,52 @@ def is_object_dtype(arr_or_dtype):


def is_sparse(arr):
"""Check whether an array-like is a pandas sparse array.
"""
Check whether an array-like is a 1-D pandas sparse array.

Check that the one-dimensional array-like is a pandas sparse array.
Returns True if it is a pandas sparse array, not another type of
sparse array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find this paragraph not so clear. I think something like this would be shorter and also clearer: "Return True if arr is pandas.SparseArray or pandas.SparseSeries, and False for any other type."

Whether or not the array-like is a pandas sparse array.

See Also
--------
DataFrame.to_sparse : Convert DataFrame to a SparseDataFrame.
Series.to_sparse : Convert Series to SparseSeries.
Series.to_dense : Return dense representation of a Series.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my comment above, I think we want to include in the see also the mentioned methods pandas.SparseArray and pandas.SparseSeries. Also, may be it makes sense to get rid of DataFrame.to_sparse, as it's not so directly related.

@CalmDownKarm
Copy link
Contributor

@datapythonista I think this is ready to be merged.

@datapythonista
Copy link
Member

@sechilds can you rebase? Sorry your PR was forgotten.

@sechilds
Copy link
Contributor Author

No problem. I will rebase it later today.

@codecov
Copy link

codecov bot commented Nov 3, 2018

Codecov Report

Merging #19983 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #19983   +/-   ##
=======================================
  Coverage   92.24%   92.24%           
=======================================
  Files         161      161           
  Lines       51317    51317           
=======================================
  Hits        47338    47338           
  Misses       3979     3979
Flag Coverage Δ
#multiple 90.63% <ø> (ø) ⬆️
#single 42.31% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/dtypes/common.py 94.37% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20bdb3e...5a78a39. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 3, 2018

Codecov Report

Merging #19983 into master will decrease coverage by 0.5%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19983      +/-   ##
==========================================
- Coverage   92.22%   91.71%   -0.51%     
==========================================
  Files         161      150      -11     
  Lines       51187    49112    -2075     
==========================================
- Hits        47209    45045    -2164     
- Misses       3978     4067      +89
Flag Coverage Δ
#multiple 90.1% <ø> (-0.52%) ⬇️
#single 41.86% <ø> (-0.4%) ⬇️
Impacted Files Coverage Δ
pandas/core/dtypes/common.py 94.64% <ø> (+0.27%) ⬆️
pandas/plotting/_compat.py 62% <0%> (-25.5%) ⬇️
pandas/core/arrays/base.py 74.35% <0%> (-23.66%) ⬇️
pandas/io/s3.py 72.72% <0%> (-13.64%) ⬇️
pandas/util/_doctools.py 0% <0%> (-12.88%) ⬇️
pandas/core/reshape/util.py 90.32% <0%> (-9.68%) ⬇️
pandas/util/_decorators.py 82.4% <0%> (-8.95%) ⬇️
pandas/core/dtypes/base.py 91.66% <0%> (-8.34%) ⬇️
pandas/errors/__init__.py 92.3% <0%> (-7.7%) ⬇️
pandas/core/common.py 92.04% <0%> (-6.38%) ⬇️
... and 142 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f71755...227007a. Read the comment docs.

@datapythonista
Copy link
Member

Rebased, merging on green.

@mroeschke
Copy link
Member

Pushed one more rebase and merging on green.

@mroeschke mroeschke merged commit a197837 into pandas-dev:master Nov 13, 2018
@mroeschke
Copy link
Member

Thanks @sechilds and @datapythonista

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
* BUG: Identify SparseDataFrame as sparse

The is_sparse function checks to see if
an array-like is spare by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.

Added a test for this.

* Revert "BUG: Identify SparseDataFrame as sparse"

This reverts commit 10dffd1.

The previous commit's change was not necessary.
Will add a docstring to clarify the behaviour of the method.

* DOC: Revise is_sparce docstring

Clean up the docstring for is_sparse so it confirms to the
documentation style guide.

Add additional examples and clarify that is_sparse
expect a 1-dimensional array-like.

* DOC: Adjust is_sparse docstring.

Responding to pull request comments.
thoo added a commit to thoo/pandas that referenced this pull request Nov 15, 2018
* upstream/master: (25 commits)
  DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651)
  DOC: Change release and whatsnew (pandas-dev#21599)
  DOC: Fix format of the See Also descriptions (pandas-dev#23654)
  DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374)
  ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692)
  CLN: Remove unnecessary code (pandas-dev#23696)
  Pin flake8-rst version (pandas-dev#23699)
  Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643)
  CI: raise clone depth limit on CI
  BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688)
  REF: Move Excel names parameter handling to CSV (pandas-dev#23690)
  DOC: Accessing files from a S3 bucket. (pandas-dev#23639)
  Fix errorbar visualization (pandas-dev#23674)
  DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678)
  DOC: Update is_sparse docstring (pandas-dev#19983)
  BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661)
  Add to_flat_index method to MultiIndex (pandas-dev#22866)
  CLN: Move to_excel to generic.py (pandas-dev#23656)
  TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)
  CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655)
  ...
thoo added a commit to thoo/pandas that referenced this pull request Nov 15, 2018
…fixed

* upstream/master:
  DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651)
  DOC: Change release and whatsnew (pandas-dev#21599)
  DOC: Fix format of the See Also descriptions (pandas-dev#23654)
  DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374)
  ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692)
  CLN: Remove unnecessary code (pandas-dev#23696)
  Pin flake8-rst version (pandas-dev#23699)
  Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643)
  CI: raise clone depth limit on CI
  BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688)
  REF: Move Excel names parameter handling to CSV (pandas-dev#23690)
  DOC: Accessing files from a S3 bucket. (pandas-dev#23639)
  Fix errorbar visualization (pandas-dev#23674)
  DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678)
  DOC: Update is_sparse docstring (pandas-dev#19983)
  BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661)
  Add to_flat_index method to MultiIndex (pandas-dev#22866)
  CLN: Move to_excel to generic.py (pandas-dev#23656)
  TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
* BUG: Identify SparseDataFrame as sparse

The is_sparse function checks to see if
an array-like is spare by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.

Added a test for this.

* Revert "BUG: Identify SparseDataFrame as sparse"

This reverts commit 10dffd1.

The previous commit's change was not necessary.
Will add a docstring to clarify the behaviour of the method.

* DOC: Revise is_sparce docstring

Clean up the docstring for is_sparse so it confirms to the
documentation style guide.

Add additional examples and clarify that is_sparse
expect a 1-dimensional array-like.

* DOC: Adjust is_sparse docstring.

Responding to pull request comments.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* BUG: Identify SparseDataFrame as sparse

The is_sparse function checks to see if
an array-like is spare by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.

Added a test for this.

* Revert "BUG: Identify SparseDataFrame as sparse"

This reverts commit 10dffd1.

The previous commit's change was not necessary.
Will add a docstring to clarify the behaviour of the method.

* DOC: Revise is_sparce docstring

Clean up the docstring for is_sparse so it confirms to the
documentation style guide.

Add additional examples and clarify that is_sparse
expect a 1-dimensional array-like.

* DOC: Adjust is_sparse docstring.

Responding to pull request comments.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* BUG: Identify SparseDataFrame as sparse

The is_sparse function checks to see if
an array-like is spare by checking to see
if it is an instance of ABCSparseArray or
ABCSparseSeries. This commit adds
ABCSparseDataFrame to that list -- so it
can detect that a DataFrame (which is an
array-like object) is sparse.

Added a test for this.

* Revert "BUG: Identify SparseDataFrame as sparse"

This reverts commit 10dffd1.

The previous commit's change was not necessary.
Will add a docstring to clarify the behaviour of the method.

* DOC: Revise is_sparce docstring

Clean up the docstring for is_sparse so it confirms to the
documentation style guide.

Add additional examples and clarify that is_sparse
expect a 1-dimensional array-like.

* DOC: Adjust is_sparse docstring.

Responding to pull request comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants