[BUG] `as_column` of pandas timestamps delivers different resolution datetime depending on whether we pass a scalar or list #14627

wence- · 2023-12-14T11:33:07Z

Describe the bug

import pandas as pd
from cudf.core.column import as_column

data = pd.Timestamp("2000-01-01")

from_scalar = as_column(data)
from_list = as_column([data])

assert from_scalar.dtype == from_list.dtype  # fails: the two dtypes differ

Expected behavior

The resolution should be inferred consistently. Note that cudf.Scalar(data) infers the same (nanosecond) resolution as as_column([data]).

wence- · 2023-12-14T11:58:42Z

This is because scalar values get handled through:

from_arrow(pa.array(pd.Series([data]), from_pandas=True))

Whereas a list is handled by:

from_arrow(pa.array([data]))

And pyarrow infers a different resolution for the timestamp compared to pandas.

This appears to be a bug in pyarrow: it does not pick up the nanosecond resolution of pandas Timestamp objects and instead treats them like builtin datetime objects, which have microsecond resolution.

How do we want to treat this case?

shwina · 2023-12-14T14:46:24Z

Can we supply an explicit data type to the pa.array() call in the latter case?

wence- · 2023-12-14T16:36:33Z

That requires another round of introspection. I do not know the history of as_column. In the case that we don't hit an "easily handled" path (arrow/numpy/pandas/cudf/cupy), is there a reason we don't just always go via pandas.Series?

wence- added the bug and Needs Triage labels Dec 14, 2023

mroeschke mentioned this issue Dec 14, 2023: Clean up special casing in as_column for non-typed input #14636 (closed)

bdice removed the Needs Triage label Mar 4, 2024
