[BUG] Max with null in datetime columns gives incorrect results #6963

Closed
VibhuJawa opened this issue Dec 10, 2020 · 0 comments · Fixed by #7010
Labels: bug (Something isn't working), Python (Affects Python cuDF API.)

@VibhuJawa (Member)

Describe the bug
`DataFrame.max(axis=1)` returns `<NA>` for every row when one of the datetime columns contains only nulls, instead of returning the maximum of the non-null values.

Steps/Code to reproduce bug

import cudf
import pandas as pd
from datetime import datetime

p_df = pd.DataFrame({
    "datetime_0": [datetime(2011, 4, 9, 10, 30, 15)] * 4,
    "datetime_1": [None] * 4,
})
p_df["datetime_1"] = p_df["datetime_1"].astype("datetime64[ns]")

df = cudf.from_pandas(p_df)
print(df.max(axis=1))
0    <NA>
1    <NA>
2    <NA>
3    <NA>
dtype: datetime64[ns]

Expected behavior

import cudf
import pandas as pd
from datetime import datetime

p_df = pd.DataFrame({
    "datetime_0": [datetime(2011, 4, 9, 10, 30, 15)] * 4,
    "datetime_1": [None] * 4,
})
p_df["datetime_1"] = p_df["datetime_1"].astype("datetime64[ns]")

print(p_df.max(axis=1))
0   2011-04-09 10:30:15
1   2011-04-09 10:30:15
2   2011-04-09 10:30:15
3   2011-04-09 10:30:15
dtype: datetime64[ns]

Environment overview

  • Method of cuDF install: conda
cudf                      0.17.0a201209   cuda_10.2_py37_gbd321d1e93_382    rapidsai-nightly
libcudf                   0.17.0a201209   cuda10.2_gbd321d1e93_382    rapidsai-nightly
@VibhuJawa added the bug (Something isn't working) and Needs Triage (Need team to review and classify) labels Dec 10, 2020
@kkraus14 added the Python (Affects Python cuDF API.) label and removed the Needs Triage (Need team to review and classify) label Dec 10, 2020
@galipremsagar self-assigned this Dec 10, 2020
@rapids-bot (bot) closed this as completed in #7010 Feb 4, 2021
@rapids-bot (bot) pushed a commit that referenced this issue Feb 4, 2021:
Fixes: #6963 

This PR introduces a "Working with missing data" doc page that outlines how to work with missing data in cudf.

The behavior shown in #6963 is correct because cudf treats `NaT` as `<NA>`. The page therefore highlights how `NaT` in datetime/timedelta values behaves differently in pandas and cudf.

Authors:
  - GALI PREM SAGAR (@galipremsagar)

Approvers:
  - Ram (Ramakrishna Prabhu) (@rgsl888prabhu)

URL: #7010
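
To make the difference between the two outputs in this issue concrete, here is a minimal plain-Python sketch of the two row-wise reductions. It is an illustration only, not cudf or pandas internals: `None` stands in for `NaT`/`<NA>`, and the function names `row_max_skip_nulls` and `row_max_propagate_nulls` are hypothetical. The first mirrors the pandas output under "Expected behavior" (nulls are skipped); the second mirrors the cudf output in the reproducer (a null in the row yields a null result).

```python
from datetime import datetime
from typing import Optional, Sequence

# One row of the frame: a datetime per column, with None standing in for a null.
Row = Sequence[Optional[datetime]]


def row_max_skip_nulls(row: Row) -> Optional[datetime]:
    """Row-wise max that ignores nulls; returns None only if every value is null."""
    present = [value for value in row if value is not None]
    return max(present) if present else None


def row_max_propagate_nulls(row: Row) -> Optional[datetime]:
    """Row-wise max that returns None as soon as any value in the row is null."""
    if not row or any(value is None for value in row):
        return None
    return max(row)


stamp = datetime(2011, 4, 9, 10, 30, 15)
rows = [(stamp, None)] * 4  # datetime_0 is set, datetime_1 is null in every row

skipped = [row_max_skip_nulls(row) for row in rows]
propagated = [row_max_propagate_nulls(row) for row in rows]

print(skipped)     # four copies of the timestamp
print(propagated)  # four None values
```

With the data from the reproducer, the skipping version returns the timestamp for every row and the propagating version returns a null for every row, which matches the two printed results above.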