[BUG] Unable to write timedelta64[s] type correctly with parquet writer
#13409
Comments
A possibly related PR previously worked on a similar issue: #11854
This seems to be a problem where libcudf is not writing
output
Investigation Notes:
Update: Support for For Question is: Should we do it or leave it be as the notion of

    from io import BytesIO

    import cudf
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq


    def datetime_interop():
        for ts_type in [
            "timestamp[s]",
            "timestamp[ms]",
            "timestamp[us]",
        ]:
            times = pa.array(
                [1234, 3456, 32442], type=ts_type
            )
            names = ["d"]
            pa_table = pa.Table.from_arrays([times], names=names)
            # write with pyarrow, then read back with both cudf and pyarrow
            buf = BytesIO()
            pq.write_table(pa_table, buf)
            df2 = cudf.read_parquet(buf)
            df3 = pq.read_table(buf)
            # prints the same values (ignore units)
            print("Original table (pa)\n", pa_table)
            print("cudf read parquet\n", df2)
            print("pyarrow read parquet\n", df3)
            # convert all to pd.Timestamp without caring about column units
            value1 = pd.Timestamp(pa_table["d"][0].as_py())
            value2 = pd.Timestamp(df2["d"][0])
            value3 = pd.Timestamp(df3["d"][0].as_py())
            # check equality
            assert value1 == value2
            assert value1 == value3
            # redundant but anyway
            assert value2 == value3
Closing this issue for now, as units are meaningless for timestamp types: they are treated and displayed as absolute values. Please see the last comment for updates.
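To illustrate the "absolute values" point, here is a minimal standard-library sketch of the same reasoning applied to durations. It is not cudf or libcudf code: the helpers `rescale` and `as_timedelta` and the table `MICROS_PER_UNIT` are hypothetical names made up for this example. It assumes the writer stores a seconds column by scaling the integer counts to milliseconds, which is what the issue describes.

```python
from datetime import timedelta

# Hypothetical lookup table: microseconds contained in one tick of each unit.
MICROS_PER_UNIT = {"s": 1_000_000, "ms": 1_000, "us": 1}


def rescale(values, src_unit, dst_unit):
    """Convert integer tick counts from src_unit to an equal or finer dst_unit."""
    factor, remainder = divmod(MICROS_PER_UNIT[src_unit], MICROS_PER_UNIT[dst_unit])
    if remainder:
        # Going to a coarser unit could lose information, so refuse it here.
        raise ValueError(f"cannot rescale {src_unit} to coarser unit {dst_unit}")
    return [v * factor for v in values]


def as_timedelta(value, unit):
    """Interpret an integer tick count as an absolute duration."""
    return timedelta(microseconds=value * MICROS_PER_UNIT[unit])


# Same sample values as in the snippet above, read as seconds.
seconds = [1234, 3456, 32442]
# Model of a writer that stores the column with millisecond resolution.
millis = rescale(seconds, "s", "ms")

# The absolute durations are unchanged by the rescaling ...
same_durations = all(
    as_timedelta(s, "s") == as_timedelta(m, "ms")
    for s, m in zip(seconds, millis)
)
# ... but a check on the unit label itself (s vs. ms) would still fail.
same_unit = "s" == "ms"
```

In this model the values compare equal once converted to absolute durations, while a strict dtype comparison after the round trip would still report a mismatch.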
Describe the bug
Only when a column has the timedelta64[s] dtype, the parquet writer seems to write it as a timedelta64[ms] column, which leads both the cudf and pyarrow parquet readers to pick up the column type incorrectly.
Steps/Code to reproduce bug
Follow this guide http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports to craft a minimal bug report. This helps us reproduce the issue you're having and resolve the issue more quickly.
Expected behavior
We are writing all other timedelta resolutions (ns, ms, us) correctly. The problem is only seen with s. We should be able to round-trip this type correctly if the writer can write it correctly.
Environment overview (please complete the following information)
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details.
Additional context
Add any other context about the problem here.