ENH: Add a Series method which checks whether a Series is constant #58806

nathanjmcdougall · 2024-05-22T00:15:58Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

In the cookbook, a recipe is given for checking that a Series only contains constant values in a performant way:

https://pandas.pydata.org/docs/user_guide/cookbook.html#constant-series

v = s.to_numpy()
is_constant = v.shape[0] == 0 or (v[0] == v).all()

To me, this has poor readability and is difficult to learn as an idiom because it requires the programmer to remember to check the edge case of .shape[0] == 0, and to remember to check the cases of missing values / NaN values, which need to be handled differently (as explained in the cookbook).
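As a minimal sketch of those edge cases, the snippet below wraps the cookbook idiom in a function and runs it on an empty Series and on Series containing NaN. The name cookbook_check and the sample data are assumptions made for illustration; they do not come from the cookbook.

```python
import numpy as np
import pandas as pd


def cookbook_check(s: pd.Series) -> bool:
    """The cookbook idiom, applied to the underlying NumPy array (illustrative name)."""
    v = s.to_numpy()
    # The length guard is required: v[0] raises IndexError on an empty array.
    return v.shape[0] == 0 or bool((v[0] == v).all())


empty = pd.Series([], dtype="float64")
all_nan = pd.Series([np.nan, np.nan])
with_nan = pd.Series([1.0, np.nan, 1.0])

print(cookbook_check(empty))     # True, only because of the explicit length guard
print(cookbook_check(all_nan))   # False, because NaN != NaN
print(cookbook_check(with_nan))  # False, the NaN never compares equal to 1.0
```

Whether an all-NaN Series should count as constant is a policy decision, which is why the idiom needs extra handling on top of the one-liner.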

Feature Description

It would be nice to have a convenience function which provided a performant is_constant check on a Series.

It could have optional arguments to configure how missing values are handled.
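One possible shape for such a function is sketched below. This is not an existing pandas API: the name is_constant and the dropna parameter are hypothetical, chosen to mirror nunique(dropna=...), and the sketch assumes a NumPy-backed dtype (nullable extension dtypes that use pd.NA would need separate handling).

```python
import numpy as np
import pandas as pd


def is_constant(s: pd.Series, dropna: bool = True) -> bool:
    """Hypothetical helper, not part of pandas.

    dropna=True ignores missing values; dropna=False treats them as values,
    so a Series mixing NaN with anything else is not constant.
    """
    if dropna:
        s = s.dropna()
    v = s.to_numpy()
    if v.shape[0] == 0:
        return True
    if not dropna and pd.isna(v[0]):
        # NaN != NaN, so an all-missing Series must be detected explicitly.
        return bool(pd.isna(v).all())
    return bool((v[0] == v).all())


s = pd.Series([1.0, np.nan, 1.0])
print(is_constant(s))                # True: NaN is dropped first
print(is_constant(s, dropna=False))  # False: NaN counts as a distinct value
```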

Alternative Solutions

The alternative is just to require the user to detect the poorly performant code, possibly automatically with a linter (see below), and come up with a performant solution for their case, possibly using the cookbook. Otherwise, the simple .nunique(dropna=...) <= 1 solution is convenient enough for when performance is not a concern.
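For reference, here is a short illustration of how the nunique-based check behaves with each dropna setting. The sample Series is made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 1.0, np.nan])

# NaN is ignored, so one distinct value remains.
print(s.nunique(dropna=True) <= 1)   # True

# NaN is counted as a second distinct value.
print(s.nunique(dropna=False) <= 1)  # False

# An empty Series has zero distinct values, so no length guard is needed.
print(pd.Series([], dtype="float64").nunique() <= 1)  # True
```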

Additional Context

I came across this when using a pandas-vet rule via ruff: PD101

I like that the linter detects performance issues like this one, but I would prefer that the suggested fixes not harm readability if possible.

Aloqeely · 2024-05-22T11:57:25Z

I don't think we should create a function that can be achieved by 1 line of code just because that line of code is not readable.
Code readability is subjective, but you can use an if statement to make it more readable (although it's a bit redundant):

v = s.to_numpy()  # compare on the underlying array, as in the cookbook
if v.shape[0] != 0:
    is_constant = (v[0] == v).all()
else:
    is_constant = True

There was an issue suggesting the same feature (#54033), but it was closed without any discussion, so we can continue the discussion here. I'm ok with adding this after reading @sbrugman's valid points in the original issue.

PushpitSB · 2024-05-30T09:32:14Z

This will be a great addition

miguelpgarcia · 2024-06-16T19:46:56Z

> I don't think we should create a function that can be achieved by 1 line of code just because that line of code is not readable. Code readability is subjective, but you can use an if statement to make it more readable (although it's a bit redundant):
>
> if v.shape[0] != 0:
>     is_constant = (s[0] == s).all()
> else:
>     is_constant = True
>
> There was an issue suggesting the same feature (#54033) but got closed without any discussion, we can continue the discussion here. I'm ok with adding this after reading @sbrugman's valid points in the original issue.

The is_unique property is also concise, consisting of just one line of code. Adding this as an official method, rather than leaving it as a recipe, may enhance code consistency (having both is_unique and is_constant) and guide users towards a more performant option.

randolf-scholz · 2024-06-17T19:40:50Z

The proposed (s[0] == s).all() is error-prone in edge cases (What if s[0] is NaN? What if s is empty?), and actually slower for small Series. Going this route, one should do a .dropna() and .values/.array first.

array = s.dropna().values
is_constant = array.shape[0] == 0 or (array[0] == array).all()

I posted my findings here: astral-sh/ruff#11910. However, this solution is still O(N) and does not short-circuit. For large Series that are very likely to be non-constant, naive Python code can be orders of magnitude faster, because it stops at the first mismatch:

import pandas as pd
import numpy as np

def is_constant(array):
    if len(array) <= 1:
        return True
    first = array[0]
    return all(item == first for item in array)

const = pd.Series(np.ones(1_000_000)).values
irreg = pd.Series(np.random.randn(1_000_000)).values

%timeit is_constant(const)         # 72.7 ms ± 1.66 ms 
%timeit (const[0] == const).all()  # 144 µs ± 2.3 µs
%timeit is_constant(irreg)         # 968 ns ± 6.7 ns
%timeit (irreg[0] == irreg).all()  # 129 µs ± 132 ns

With Numba's JIT compilation, we can improve the performance of the short-circuiting loop drastically:

import pandas as pd
import numpy as np
import numba

@numba.njit
def is_constant(array):
    if len(array) <= 1:
        return True
    first = array[0]
    for item in array:
        if item != first:
            return False
    return True

const = pd.Series(np.ones(1_000_000)).values
irreg = pd.Series(np.random.randn(1_000_000)).values

%timeit is_constant(const)         # 457 µs ± 5.42 µs  (instead of 72 ms)
%timeit (const[0] == const).all()  # 136 µs ± 311 ns
%timeit is_constant(irreg)         # 242 ns ± 1.52 ns  (instead of 968 ns)
%timeit (irreg[0] == irreg).all()  # 128 µs ± 2.15 µs
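For setups where Numba is not available, one middle ground is to run the vectorized comparison block by block and stop at the first block that contains a mismatch. This is a different technique from the two shown above, sketched under assumptions: the name is_constant_chunked and the default chunk_size are hypothetical, and no timings were measured for it.

```python
import numpy as np


def is_constant_chunked(array: np.ndarray, chunk_size: int = 4096) -> bool:
    """Hypothetical helper: vectorized comparison per block, exiting early on a mismatch."""
    n = array.shape[0]
    if n <= 1:
        return True
    first = array[0]
    for start in range(0, n, chunk_size):
        # Each block is compared in NumPy; the Python loop only runs once per block.
        if not (array[start:start + chunk_size] == first).all():
            return False
    return True


const = np.ones(1_000_000)
irreg = np.random.randn(1_000_000)

print(is_constant_chunked(const))  # True
print(is_constant_chunked(irreg))  # False (random floats differ with overwhelming probability)
```

As with the other variants, NaN values never compare equal, so an array containing NaN is reported as non-constant unless missing values are dropped first.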

nathanjmcdougall added the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels on May 22, 2024

Aloqeely added the Needs Discussion (requires discussion from core team before further action) label and removed the Needs Triage label on May 23, 2024

pedrocariellof mentioned this issue on Jul 1, 2024 in: ENH: Add a Series method which checks whether a Series is constant #59152 (closed)
