inconsistent name handling in value_counts, part 2 #11579

corr724 · 2015-11-12T07:26:24Z

I was happy to see in the release notes for 0.17.0 that value_counts no longer discards the series name, but the implementation wasn't what I expected.

0.17.0 gives

>>> series = pd.Series([1731, 364, 813, 1731], name='user_id')
>>> series.value_counts()
1731    2
813     1
364     1
Name: user_id, dtype: int64

which doesn't set the index name.

In my opinion the old series name belongs in the index, not in the series name:

>>> series.value_counts()
user_id
1731    2
813     1
364     1
dtype: int64

Why:

- It's logical: the user_id has moved to the index, and the values now represent occurrence counts.
- This would be consistent with how .groupby().size() behaves.
- Adding a missing index name is cumbersome and requires creating a temporary variable.
- In many cases the series name is discarded, while index names tend to stick around: for example, pd.DataFrame({'n': series.value_counts(), 'has_duplicates': series.value_counts() > 1}) should really have user_id as an index name.
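For illustration, here is a minimal sketch of the workaround the third point refers to, next to a chained alternative. It assumes a pandas version that provides Series.rename_axis (not necessarily available in 0.17.0); the variable names are only for the example.

```python
import pandas as pd

series = pd.Series([1731, 364, 813, 1731], name="user_id")

# Temporary-variable approach: name the index after the fact.
counts = series.value_counts()
counts.index = counts.index.rename(series.name)

# Chained alternative, assuming Series.rename_axis is available.
chained = series.value_counts().rename_axis(series.name)

# Both columns share the same named index, so the frame keeps that name.
frame = pd.DataFrame({"n": chained, "has_duplicates": chained > 1})
print(frame)
```

Either way the index name has to be set by hand, which is the inconvenience described above.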

There are three options:

1. result.name = None and result.index.name = series.name
2. result.name = series.name and result.index.name = series.name
3. result.name = 'size' or 'count' and result.index.name = series.name
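To make the three options concrete, here is a rough sketch that builds each naming variant from the same counts. The helper with_names is hypothetical and exists only for this illustration. It sets both names explicitly, so it does not depend on what value_counts itself returns in a given pandas version.

```python
import pandas as pd


def with_names(counts, name, index_name):
    """Return a copy of `counts` with the given series name and index name."""
    out = counts.copy()
    out.name = name
    out.index = out.index.rename(index_name)  # Index.rename returns a new Index
    return out


series = pd.Series([1731, 364, 813, 1731], name="user_id")
raw = series.value_counts()

option1 = with_names(raw, None, series.name)         # unnamed values, named index
option2 = with_names(raw, series.name, series.name)  # both carry the series name
option3 = with_names(raw, "count", series.name)      # descriptive value name

for result in (option1, option2, option3):
    print(repr(result.name), repr(result.index.name))
```

All three variants hold the same counts and differ only in the two name attributes.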

The first option seems more elegant to me but @sinhrks, who reported #10150, apparently expected result.name to be filled, so perhaps there are use cases where the second option is useful.


jreback · 2015-11-12T12:35:08Z

Can you show where you think .groupby preserves the name? AFAICT it does not.

In [111]: s = pd.Series([1731, 364, 813, 1731], name='user_id', index=pd.Index(range(4), name='foo'))

In [112]: s
Out[112]: 
foo
0    1731
1     364
2     813
3    1731
Name: user_id, dtype: int64

In [113]: s.groupby(s.values).count()           
Out[113]: 
364     1
813     1
1731    2
dtype: int64

In [114]: s.value_counts()
Out[114]: 
1731    2
813     1
364     1
Name: user_id, dtype: int64

sinhrks · 2015-11-12T13:57:39Z

@corr724 #10150 isn't the same as your second option. It is:

result.name = series.name and result.index.name = None

Agreed that the first option is consistent with groupby().count().

s.groupby(s).count().name
# None
s.groupby(s).count().index.name
# 'user_id'

corr724 · 2015-11-12T16:31:49Z

Yes, the three options are all ways to change the current behavior. The second option was intended as a compromise between the first option and the change in #10150.

@jreback: That's because s.values discards all Pandas metadata, but using either s.groupby(s) or df.groupby('column') will set the index name.
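A small sketch of the difference, for reference. It only inspects the index name of each result, since the result's own name may differ between pandas versions.

```python
import pandas as pd

s = pd.Series([1731, 364, 813, 1731], name="user_id")

# Grouping by the raw NumPy array: the key carries no name.
by_values = s.groupby(s.values).count()

# Grouping by the Series itself: the key's name becomes the index name.
by_series = s.groupby(s).count()

# Grouping a DataFrame by column label names the index after the column.
df = s.to_frame()
by_column = df.groupby("user_id").size()

print(by_values.index.name, by_series.index.name, by_column.index.name)
```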

jreback · 2015-11-13T16:24:02Z

ok, that seems reasonable then.

I think option 1 is fine (though I'm ok with name=size/count as well, your option 3); we should just make sure that groupby.size/count behaves the same way.

lukemanley · 2023-02-03T23:51:14Z

closing as this was implemented in #49912

jreback added the Groupby, Reshaping, and API Design labels on Nov 12, 2015

jreback mentioned this issue Dec 9, 2017

Groupby with function returning a tuple fails if index has a name #18696

Closed

mroeschke added the Algos and Bug labels and removed the API Design, Groupby, and Reshaping labels on Apr 21, 2021

lukemanley closed this as completed Feb 3, 2023
