DOC: Update is_sparse docstring #19983

Merged: 6 commits merged on Nov 13, 2018 (changes from 3 commits shown).
31 changes: 25 additions & 6 deletions pandas/core/dtypes/common.py
@@ -120,17 +120,26 @@ def is_object_dtype(arr_or_dtype):


def is_sparse(arr):
"""
Check whether an array-like is a pandas sparse array.
"""Check whether an array-like is a pandas sparse array.
Review comment (Member): I think that's not 100% correct, technically speaking. Given what is explained later, I think the function checks whether an array-like is a 1-D pandas sparse array.


Check that the one-dimensional array-like is a pandas sparse array.
Returns True if it is a pandas sparse array, not another type of
sparse array.
Review comment (Member): It will take some research, but I think it would be useful to explain what the use cases of this function are, not only what it does.

Review comment (Member): Personally I don't find this paragraph very clear. I think something like this would be shorter and also clearer: "Return True if arr is pandas.SparseArray or pandas.SparseSeries, and False for any other type."
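The behaviour described in the suggested wording is a plain type check. The following is a minimal sketch. It uses hypothetical stand-in classes named `SparseArray` and `SparseSeries` in place of the `ABCSparseArray`/`ABCSparseSeries` ABCs used in the diff, so it runs without pandas:

```python
class SparseArray:
    """Hypothetical stand-in for pandas.SparseArray (not the real class)."""


class SparseSeries:
    """Hypothetical stand-in for pandas.SparseSeries (not the real class)."""


def is_sparse(arr):
    """Return True if arr is a SparseArray or SparseSeries, else False."""
    # isinstance accepts a tuple: True if arr is an instance of any listed type.
    return isinstance(arr, (SparseArray, SparseSeries))


print(is_sparse(SparseArray()))   # True
print(is_sparse(SparseSeries()))  # True
print(is_sparse([0, 0, 1, 0]))    # False: a plain list is not a sparse type
```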


Parameters
----------
arr : array-like
The array-like to check.
arr : array-like (1-D)
Review comment (Member): If I'm not mistaken, and based on your example, arr can actually have more dimensions, and what is being checked is its dimensionality. Is that right?
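The function body at the end of the diff only calls `isinstance` and never inspects `ndim`. A 2-D object is rejected only because its type is not among the accepted types. The following minimal sketch illustrates this with hypothetical stand-in classes. The subclass `Sparse2D` is invented purely to show that dimensionality itself is never consulted:

```python
class SparseArray:
    """Hypothetical 1-D stand-in for pandas.SparseArray."""
    ndim = 1


class SparseDataFrame:
    """Hypothetical 2-D stand-in; deliberately not a SparseArray subclass."""
    ndim = 2


class Sparse2D(SparseArray):
    """Hypothetical subclass that claims two dimensions."""
    ndim = 2


def is_sparse(arr):
    # Type-based check only: arr.ndim is never looked at.
    return isinstance(arr, SparseArray)


print(is_sparse(SparseDataFrame()))  # False: wrong type, not because ndim == 2
print(is_sparse(Sparse2D()))         # True: ndim == 2 is ignored
```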

Array-like to check.

Returns
-------
boolean : Whether or not the array-like is a pandas sparse array.
boolean
Review comment (Member): I think there is some discussion about it, but we should be using Python types, so bool instead of boolean.

Whether or not the array-like is a pandas sparse array.

See Also
--------
DataFrame.to_sparse : Convert DataFrame to a SparseDataFrame.
Series.to_sparse : Convert Series to SparseSeries.
datapythonista marked this conversation as resolved.

Examples
--------
@@ -147,8 +156,18 @@ def is_sparse(arr):
>>> from scipy.sparse import bsr_matrix
>>> is_sparse(bsr_matrix([1, 2, 3]))
False
"""

This function checks that one-dimensional arrays are sparse.
It will not identify a `SparseDataFrame` as sparse.
Review comment (Member): We need to update the documentation, but we finally decided to use "real-world" examples only when they help in understanding the example. In my opinion, in this case it would be more consistent to use data that looks more sparse, for example SparseSeries([0, 0, 1, 0]).

Also, I think it will make things easier for the user if this is written step by step. Instead of a longer paragraph explaining all the examples first, have a short explanation for each check, for example:

  • It returns True if the parameter is a 1-D sparse array
  • It returns False if the parameter is not sparse
  • It returns False if the parameter has more than one dimension.

It's just an opinion, but personally I think this would be clearer for the user.
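The step-by-step layout suggested above could look roughly like the docstring below. This is a sketch under stated assumptions. `SparseSeries` and `SparseDataFrame` are hypothetical stand-ins defined inline so that the doctest runs without pandas. The sparse-looking data follows the `SparseSeries([0, 0, 1, 0])` suggestion:

```python
import doctest


class SparseSeries:
    """Hypothetical stand-in for pandas.SparseSeries (1-D)."""

    def __init__(self, data):
        self.data = list(data)


class SparseDataFrame:
    """Hypothetical stand-in for pandas.SparseDataFrame (2-D)."""

    def __init__(self, data):
        self.data = data


def is_sparse(arr):
    """
    Check whether an array-like is a 1-D pandas sparse array.

    Examples
    --------
    Returns `True` if the parameter is a 1-D sparse array.

    >>> is_sparse(SparseSeries([0, 0, 1, 0]))
    True

    Returns `False` if the parameter is not sparse.

    >>> is_sparse([0, 0, 1, 0])
    False

    Returns `False` if the parameter has more than one dimension.

    >>> is_sparse(SparseDataFrame([[0, 1], [0, 0]]))
    False
    """
    # Only the 1-D sparse stand-in type is accepted.
    return isinstance(arr, SparseSeries)


if __name__ == "__main__":
    # Runs the docstring examples; prints nothing when they all pass.
    doctest.run_docstring_examples(is_sparse, globals())
```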

Review comment (Contributor): @datapythonista this is almost an internal function. It was exposed because we needed a function that says: "hey, are you a pandas sparse array?" We don't have a real sparse type at the moment, so this is not a definitive guide. I would be OK with improving the docstring; maybe we should deprecate it from the public API.


>>> df = pd.SparseDataFrame([389., 24., 80.5, np.nan],
...                         columns=['max_speed'],
...                         index=['falcon', 'parrot', 'lion', 'monkey'])
>>> is_sparse(df)
False
>>> is_sparse(df.max_speed)
True
"""
return isinstance(arr, (ABCSparseArray, ABCSparseSeries))
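The docstring example relies on column access. The frame itself is rejected, while `df.max_speed` yields a 1-D sparse object that is accepted. The following minimal sketch shows that distinction using hypothetical stand-in classes; the real objects are pandas' `SparseDataFrame` and `SparseSeries`. The `__getattr__`-based column access only imitates pandas' attribute access. The check is also simplified relative to the diff, which additionally accepts `SparseArray`:

```python
import math


class SparseSeries:
    """Hypothetical 1-D stand-in for pandas.SparseSeries."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)


class SparseDataFrame:
    """Hypothetical 2-D stand-in for pandas.SparseDataFrame."""

    def __init__(self, data, columns, index):
        # data is column-wise here: one list of values per column name.
        self._columns = {
            name: SparseSeries(col, index) for name, col in zip(columns, data)
        }

    def __getattr__(self, name):
        # Imitate df.max_speed-style attribute access to a column.
        try:
            return self.__dict__["_columns"][name]
        except KeyError:
            raise AttributeError(name) from None


def is_sparse(arr):
    # Only the 1-D sparse stand-in type is accepted.
    return isinstance(arr, SparseSeries)


df = SparseDataFrame(
    [[389.0, 24.0, 80.5, math.nan]],
    columns=["max_speed"],
    index=["falcon", "parrot", "lion", "monkey"],
)
print(is_sparse(df))            # False: the 2-D frame type is not accepted
print(is_sparse(df.max_speed))  # True: the column is a 1-D sparse object
```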

