Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: agg() function on groupby dataframe changes dtype of datetime64[ns] column to float64 #12992

Closed
wants to merge 4 commits into from

Conversation

facaiy
Copy link
Contributor

@facaiy facaiy commented Apr 26, 2016

closes #12821
closes #12941

@jreback jreback added Bug Datetime Datetime data dtype Groupby labels Apr 26, 2016
@jreback
Copy link
Contributor

jreback commented May 7, 2016

can you rebase / update

@facaiy
Copy link
Contributor Author

facaiy commented May 8, 2016

Sorry, @jreback.

This month I was far busy than expected, and I'm planning to fix the bug on next Sunday.

@facaiy facaiy force-pushed the agg_time_dtype branch 2 times, most recently from 40922cb to 1236422 Compare May 12, 2016 15:35
@codecov-io
Copy link

codecov-io commented May 12, 2016

Current coverage is 85.30% (diff: 100%)

Merging #12992 into master will decrease coverage by <.01%

@@             master     #12992   diff @@
==========================================
  Files           139        139          
  Lines         50138      50140     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42769      42770     +1   
- Misses         7369       7370     +1   
  Partials          0          0          

Powered by Codecov. Last update 3186fef...607a170

@facaiy facaiy changed the title WIP: agg() function on groupby dataframe changes dtype of datetime64[ns] column to float64 agg() function on groupby dataframe changes dtype of datetime64[ns] column to float64 May 13, 2016
@facaiy facaiy changed the title agg() function on groupby dataframe changes dtype of datetime64[ns] column to float64 BUG: agg() function on groupby dataframe changes dtype of datetime64[ns] column to float64 May 13, 2016
@jreback
Copy link
Contributor

jreback commented May 25, 2016

can you rebase / update?

@@ -139,3 +139,5 @@ Bug Fixes
- Bug in ``NaT`` - ``Period`` raises ``AttributeError`` (:issue:`13071`)
- Bug in ``Period`` addition raises ``TypeError`` if ``Period`` is on right hand side (:issue:`13069`)
- Bug in ``pd.set_eng_float_format()`` that would prevent NaN's from formatting (:issue:`11981`)

- Bug in ``agg()`` function on groupby dataframe changes dtype of ``datetime64[ns]`` column to ``float64`` (:issue:`12821`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in .groupby(..).agg(..) could change the dtype of ......

@jreback jreback added this to the 0.18.2 milestone May 27, 2016
@jreback
Copy link
Contributor

jreback commented May 27, 2016

does this also address #12941 ?

pls rebase / small update as well. ping on green.

@facaiy
Copy link
Contributor Author

facaiy commented May 29, 2016

Hi, I have fixed #12941 and rebase again.

pd.Timestamp('2012-05-01')]})

res = df.min()
tm.assert_attr_equal('dtype', df['foo'], res)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct an expected result and use assert_series_equal

@@ -318,3 +319,7 @@ Bug Fixes


- Bug in ``Categorical.remove_unused_categories()`` changes ``.codes`` dtype to platform int (:issue:`13261`)

- Bug in ``agg()`` function on groupby dataframe changes dtype of ``datetime64[ns]`` column to ``float64`` (:issue:`12821`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in ``.agg()` on a groupby, changes the dtype of .....

@jreback
Copy link
Contributor

jreback commented May 30, 2016

ok looks good. pls rebase / squash (and build failed).

@facaiy
Copy link
Contributor Author

facaiy commented Jun 1, 2016

rebase again.

@jreback
Copy link
Contributor

jreback commented Jun 30, 2016

can you rebase and i'll review

@jreback
Copy link
Contributor

jreback commented Jul 6, 2016

can you rebase. ping when green.

@jreback jreback removed this from the 0.18.2 milestone Jul 6, 2016
@facaiy
Copy link
Contributor Author

facaiy commented Jul 24, 2016

@jreback, hi, agg function will eat up Name, is it a bug?

@jreback
Copy link
Contributor

jreback commented Jul 24, 2016

not sure can u show s complete example

@facaiy
Copy link
Contributor Author

facaiy commented Jul 24, 2016

In [12]: df = pd.Series(range(5)).to_frame()

In [13]: df
Out[13]:
   0
0  0
1  1
2  2
3  3
4  4

In [14]: df.min()
Out[14]:
0    0
dtype: int64

In [15]: df.loc[0]
Out[15]:
0    0
Name: 0, dtype: int64

@jreback
Copy link
Contributor

jreback commented Jul 24, 2016

Giving a slightly more clear example. I suppose this could be done. Though its only for a very limited number of reducers, prob just min/max. For example, anything that does an aggregation like .sum/.mean this by-definition obliterates the index. Pls make a new issue and list the functions you think should be changed here.

In [19]: df = pd.Series(range(5),index=list('abcde')).to_frame(name='foo')

In [20]: df
Out[20]: 
   foo
a    0
b    1
c    2
d    3
e    4

In [21]: df.loc['a']
Out[21]: 
foo    0
Name: a, dtype: int64

In [22]: df.min()
Out[22]: 
foo    0
dtype: int64

@facaiy
Copy link
Contributor Author

facaiy commented Jul 28, 2016

@jreback Hi, I'll submit a issue later. And all checks have passed, so do you think that the pr is ok?

# test for `first` function
exp = df.loc[[0, 3, 4, 6]].set_index('class')

res = df.groupby('class').first()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to test

df.groupby('class').time.first() and so on (e.g. repeat with .agg both forms as well)

if making a loop helps to simplify testing, pls do so

@facaiy
Copy link
Contributor Author

facaiy commented Jul 30, 2016

@jreback all checks have passed.

exp = pd.Series([pd.Timestamp('2012-05-01')], index=["foo"])
tm.assert_series_equal(res, exp)

# GH12941
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put a small comment here (1-line) that explains what you are testing

@jreback jreback added this to the 0.19.0 milestone Aug 3, 2016
@jreback
Copy link
Contributor

jreback commented Aug 3, 2016

lgtm. pls rebase and ping on green.

@@ -854,4 +854,9 @@ Bug Fixes
- Bugs in ``Index.difference`` and ``DataFrame.join`` raise in Python3 when using mixed-integer indexes (:issue:`13432`, :issue:`12814`)

- Bug in ``.to_excel()`` when DataFrame contains a MultiIndex which contains a label with a NaN value (:issue:`13511`)

- Bug in ``pd.read_csv`` in Python 2.x with non-UTF8 encoded, multi-character separated data (:issue:`3404`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put bug fixes in blanks spaces (or create them) in-between other fixes to avoid conflicts (IOW, don't put them at the end)

@facaiy
Copy link
Contributor Author

facaiy commented Aug 6, 2016

@jreback Both #12821 and #12941 are fixed in the PR. The two bugs behave similarity, while they are located in different code. And all checks have passed.

@jreback jreback closed this in 63e8f68 Aug 6, 2016
@jreback
Copy link
Contributor

jreback commented Aug 6, 2016

thanks!

@jreback
Copy link
Contributor

jreback commented Aug 6, 2016

@ningchi can you have a look at these, see if they are also closed (so maybe just need some validation tests).

xref #8617 (might be a duplicate)
xref #4147

@facaiy
Copy link
Contributor Author

facaiy commented Aug 7, 2016

Ok

@facaiy facaiy deleted the agg_time_dtype branch August 7, 2016 08:57
@facaiy
Copy link
Contributor Author

facaiy commented Aug 13, 2016

It seems that #8617 has been fixed, while #4147 behaves differently now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Groupby
Projects
None yet
3 participants