GroupBy.count() returns the grouping column as both index and column #5610

jorisvandenbossche · 2013-11-28T11:52:46Z

GroupBy.count() (with the default as_index=True) return the grouping column both as index and as column, while other methods as first and sum keep it only as the index (which is most logical I think). This seems a minor inconsistency to me:

In [41]: data = pd.DataFrame({'name' : ['a', 'a', 'b', 'd'], 'counts' : [3,4,3,2]})
In [42]: data
Out[42]:
   counts name
0       3    a
1       4    a
2       3    b
3       2    d

In [43]: g = data.groupby('name')
In [45]: g.count()
Out[45]:
      counts  name
name
a          2     2
b          1     1
d          1     1

In [46]: g.first()
Out[46]:
      counts
name
a          3
b          3
d          2

In [47]: g.sum()
Out[47]:
      counts
name
a          7
b          3
d          2

The text was updated successfully, but these errors were encountered:

jreback · 2014-03-22T21:18:18Z

hmm....

@hayd @TomAugspurger

these look right? are first/last just different?

In [2]: df= pd.DataFrame({'name' : ['a', 'a', 'b', 'd'], 'counts' : [3,4,3,2]})

In [3]: g = df.groupby('name')

In [4]: g.count()
Out[4]: 
      counts  name
name              
a          2     2
b          1     1
d          1     1

[3 rows x 2 columns]

In [5]: g.first()
Out[5]: 
      counts
name        
a          3
b          3
d          2

[3 rows x 1 columns]

In [6]: g.head()
Out[6]: 
   counts name
0       3    a
1       4    a
2       3    b
3       2    d

[4 rows x 2 columns]

In [7]: g.tail()
Out[7]: 
   counts name
0       3    a
1       4    a
2       3    b
3       2    d

[4 rows x 2 columns]

In [8]: g.last()
Out[8]: 
      counts
name        
a          4
b          3
d          2

[3 rows x 1 columns]

hayd · 2014-03-22T21:50:25Z

IMO first should do the same as g.nth(0) and last as g.nth(-1), since as mentioned in the larger PR they are not aggregations (I think breaking these are on the roadmap for 0.14?). First and last are implemented as aggs atm.

Original issue is that count includes name, I also don't think it should. Will have a look at this, may be simple fix. Related to cumsum etc. including the grouped by columns (so may be a generic fix in agg).

jreback · 2014-03-22T21:54:06Z

hmm...though you already redefined first/last to be nth's...oh well...that sounds right to me

hayd · 2014-03-24T04:19:46Z

duh, count's applying when it should be agg-ing. PR shortly.

jreback · 2014-04-28T23:08:53Z

http://stackoverflow.com/questions/23352418/unexpected-behavior-in-pandas-mad-with-groupby/23352706#23352706

mad is the same problem (as are prob the non-cythonized calls).

jreback · 2014-04-28T23:09:35Z

@hayd you are doing a PR for this one? or is it already out there but missing a reference?

hayd · 2014-04-29T00:19:52Z

apparently I did on my machine, rebased and pushed will see if it passes... mañana

jreback · 2014-04-29T00:31:38Z

gr8!

jreback · 2014-04-29T12:54:57Z

These seem completely wrong (I haven't changed anything yet to exlucde the 'A' column
when testing these)

In [1]: df = DataFrame([[1, 2, 'foo'], [1, nan, 'bar',], [3, nan, 'baz']], columns=['A', 'B','C'])

In [2]: df
Out[2]: 
   A   B    C
0  1   2  foo
1  1 NaN  bar
2  3 NaN  baz

[3 rows x 3 columns]

In [3]: df.groupby('A').shift(1)
Out[3]: 
    A   B    C
0 NaN NaN  NaN
1   1   2  foo
2 NaN NaN  NaN

[3 rows x 3 columns]

In [4]: df.groupby('A').fillna(-1)
Out[4]: 
   A  B    C
0  1  2  foo
1  1 -1  bar
2  3 -1  baz

[3 rows x 3 columns]

In [5]: df.groupby('A').apply(lambda x: x.fillna(-1))
Out[5]: 
   A  B    C
0  1  2  foo
1  1 -1  bar
2  3 -1  baz

[3 rows x 3 columns]

jorisvandenbossche · 2014-04-29T12:57:51Z

What is wrong with these?

jreback · 2014-04-29T13:03:55Z

they are not grouping (indices should be 1, 3)

jorisvandenbossche · 2014-04-29T13:05:06Z

but it are non-aggregating functions?

jreback · 2014-04-29T13:06:30Z

hmm...you are right, then these should raise (rather than silenty 'work').

hayd · 2014-04-29T13:09:43Z

Is fillna in the white list?

I think shift is correct here though, it is shifting within the groups.

jorisvandenbossche · 2014-04-29T13:10:17Z

Why should they raise? Eg df.groupby('A').shift(1) seems correct to me (shift within each group).
It's a bit like the transform functions?

The use of fillna is less clear, but it seems this is added explicitely in the whitelist.

jreback · 2014-04-29T13:11:25Z

I guess these do effectively a transform by default (which is ok)

JayDDD · 2016-08-26T21:13:28Z

How could I count the number of 'a' under the column name? I only need this number. what command should I use?

jreback added API Design labels Mar 22, 2014

jreback added this to the 0.14.0 milestone Mar 22, 2014

hayd mentioned this issue Apr 29, 2014

make grouping column an agg #6997

Closed

jreback added Bug and removed API Design labels Apr 29, 2014

jreback mentioned this issue Apr 29, 2014

ENH/BUG: add count to grouper / ensure that grouper keys are not included in the returned #7000

Merged

jreback closed this as completed in #7000 Apr 29, 2014

rhshadrach mentioned this issue Apr 20, 2024

BUG: GroupBy.apply with as_index=False still produces a MultiIndex #58291

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GroupBy.count() returns the grouping column as both index and column #5610

GroupBy.count() returns the grouping column as both index and column #5610

jorisvandenbossche commented Nov 28, 2013

jreback commented Mar 22, 2014

hayd commented Mar 22, 2014

jreback commented Mar 22, 2014

hayd commented Mar 24, 2014

jreback commented Apr 28, 2014

jreback commented Apr 28, 2014

hayd commented Apr 29, 2014

jreback commented Apr 29, 2014

jreback commented Apr 29, 2014

jorisvandenbossche commented Apr 29, 2014

jreback commented Apr 29, 2014

jorisvandenbossche commented Apr 29, 2014

jreback commented Apr 29, 2014

hayd commented Apr 29, 2014

jorisvandenbossche commented Apr 29, 2014

jreback commented Apr 29, 2014

JayDDD commented Aug 26, 2016

GroupBy.count() returns the grouping column as both index and column #5610

GroupBy.count() returns the grouping column as both index and column #5610

Comments

jorisvandenbossche commented Nov 28, 2013

jreback commented Mar 22, 2014

hayd commented Mar 22, 2014

jreback commented Mar 22, 2014

hayd commented Mar 24, 2014

jreback commented Apr 28, 2014

jreback commented Apr 28, 2014

hayd commented Apr 29, 2014

jreback commented Apr 29, 2014

jreback commented Apr 29, 2014

jorisvandenbossche commented Apr 29, 2014

jreback commented Apr 29, 2014

jorisvandenbossche commented Apr 29, 2014

jreback commented Apr 29, 2014

hayd commented Apr 29, 2014

jorisvandenbossche commented Apr 29, 2014

jreback commented Apr 29, 2014

JayDDD commented Aug 26, 2016