ENH: series.is_constant #54033

Closed
1 of 3 tasks
sbrugman opened this issue Jul 7, 2023 · 3 comments · Fixed by #54064
Labels
Enhancement, Series

Comments

@sbrugman
Contributor

sbrugman commented Jul 7, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Pandas Series have a property is_unique that returns true iff all values are distinct.

In practice, one often needs to know whether a Series is constant. When the series contains no NaNs, a working check is straightforward: (series.values[0] == series.values).all(). However, in the wild we observe that users often simply write series.nunique() == 1. This approach is more costly for larger series, because it first computes the number of unique values before comparing.
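The two checks described above can be put side by side in a minimal sketch. The function names are hypothetical and not part of pandas, and the comparison-based version assumes a non-empty series without NaNs:

```python
import numpy as np
import pandas as pd


def is_constant_nunique(series: pd.Series) -> bool:
    """Common pattern: counts all distinct values, then compares to 1."""
    return series.nunique() == 1


def is_constant_compare(series: pd.Series) -> bool:
    """Compare every element with the first one.

    Assumes the series is non-empty and contains no NaNs.
    """
    values = series.to_numpy()
    return bool((values[0] == values).all())


constant = pd.Series(np.ones(1_000))
varying = pd.Series(np.arange(1_000))

print(is_constant_nunique(constant), is_constant_compare(constant))
print(is_constant_nunique(varying), is_constant_compare(varying))
```

Both functions agree on NaN-free, non-empty input; they differ only in how much work is done to reach the answer.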

My assumption is that introducing a dedicated is_constant property will help users choose the more performant option. Another assumption is that it's acceptable for all-NaN series to be somewhat slower.

perf_short_circuit

def setup(n):
    return pd.Series(list(range(n)))

perf_worst

def setup(n):
    return pd.Series([1] * (n - 1) + [2])

Feature Description

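@property
def is_constant(self):
    # Underlying array of the Series
    v = self.values
    # An empty Series is not treated as constant here
    if v.shape[0] == 0:
        return False
    # Constant if every element equals the first one
    return (v[0] == v).all()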

(based on is_unique and nunique, in the absence of nans)

To extend to NA values:
If dropna=True: add v = remove_na_arraylike(v) before the comparison.
If dropna=False: append "or not pd.notna(v).any()" to the return expression (note: for series that mix pd.NA and np.nan this differs from np.unique, because this check does not distinguish between the two).
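The NA handling described above could look roughly like the following. This is a sketch under assumptions: is_constant here is a hypothetical standalone helper rather than a pandas API, series.dropna() stands in for remove_na_arraylike, and the behaviour was only reasoned through for float data with np.nan:

```python
import numpy as np
import pandas as pd


def is_constant(series: pd.Series, dropna: bool = True) -> bool:
    """Hypothetical helper; not part of the pandas API."""
    if dropna:
        # Stand-in for remove_na_arraylike: drop missing values first.
        v = series.dropna().to_numpy()
    else:
        v = series.to_numpy()

    # Mirrors the proposal above: empty input is not constant.
    if v.shape[0] == 0:
        return False

    # With dropna=False, an all-missing series counts as constant.
    if not dropna and not pd.notna(v).any():
        return True

    return bool((v[0] == v).all())


print(is_constant(pd.Series([1.0, 1.0, np.nan])))
print(is_constant(pd.Series([1.0, 1.0, np.nan]), dropna=False))
print(is_constant(pd.Series([np.nan, np.nan]), dropna=False))
```

The all-missing check runs before the element comparison because NaN never compares equal to itself, so the comparison alone would report an all-NaN series as non-constant.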

Alternative Solutions

Recommend the approach above in the documentation as a replacement for series.nunique() == 1 patterns.

Additional Context

No response

sbrugman added the Enhancement and Needs Triage labels Jul 7, 2023
@lithomas1
Member

Is checking for a constant ever a bottleneck in a workflow?

This seems like something a user could easily do themselves.

Given the heavy usage of nunique()==1, maybe this would be better as an example to add to the pandas cookbook?

lithomas1 added the Series label and removed the Needs Triage label Jul 8, 2023
@sbrugman
Contributor Author

Adding an example to the pandas cookbook seems a good way of informing users to consider implementing a check that short-circuits. I will open a PR.

For completeness: Implementing is_constant (i.e. performant version of nunique() == 1) would be consistent with is_unique (i.e. nunique() == len(series)). The latter is also simple to achieve by the user and not expected to be a bottleneck.
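The is_unique relationship mentioned above can be checked with a small sketch. The sample data is made up for illustration and contains no NaNs:

```python
import pandas as pd

distinct = pd.Series([3, 1, 2])
repeated = pd.Series([1, 1, 2])

for s in (distinct, repeated):
    # For NaN-free data, is_unique agrees with nunique() == len(series).
    agrees = s.is_unique == (s.nunique() == len(s))
    print(s.tolist(), s.is_unique, agrees)
```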

sbrugman added a commit to sbrugman/pandas that referenced this issue Jul 10, 2023
@lithomas1
Member

For completeness: Implementing is_constant (i.e. performant version of nunique() == 1) would be consistent with is_unique (i.e. nunique() == len(series)). The latter is also simple to achieve by the user and not expected to be a bottleneck.

I'll leave this open to see if others want this then.

mroeschke pushed a commit that referenced this issue Jul 12, 2023
* Document constant check in series

Closes #54033

* Update cookbook.rst

* Update cookbook.rst

* empty series is constant

* Update cookbook.rst

* Update cookbook.rst