Describe the bug
Both cudf.to_datetime() and cudf.DataFrame.astype('datetime64[]') produce wrong results for fractional seconds. When a datetime string has fewer fractional digits than the target resolution, the digits are padded with leading zeros instead of trailing zeros.
The result also depends on the order of the datetime strings in a dataframe column. If the column's first entry is not aligned to nine digits, the two functions return different results, and astype returns all values in the column in the same format as the first one.
Expected behavior
For the values of the t column in rows 1, 3, 4, and 5, both methods return fractional seconds with leading zeros: "...:37.00123...". The correct result would be "...:37.123...". In row 2 the fraction is replaced entirely by zeros.
Columns tx_astype and tx_to_datetime represent the expected return values. These results are identical to what the corresponding pandas functions would return for the given values of 't'.
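As a reference point for the expected semantics, here is a minimal sketch using only the Python standard library (not cuDF or pandas). Python's `%f` directive accepts one to six digits and zero-pads on the right, which matches the "...:37.123..." result described above. The timestamp strings are hypothetical stand-ins, since the original reproduction data is not shown here.

```python
from datetime import datetime

def fractional_microseconds(ts: str) -> int:
    """Parse an ISO-like timestamp and return its sub-second part in microseconds."""
    # %f accepts 1 to 6 digits and pads missing digits on the right.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f").microsecond

# Hypothetical inputs with a short fraction.
print(fractional_microseconds("2022-07-01T12:30:37.123"))  # 123000
print(fractional_microseconds("2022-07-01T12:30:37.5"))    # 500000
```

A short fraction such as ".123" therefore means 123 milliseconds, not 123 microseconds or nanoseconds.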
Environment overview (please complete the following information)
Technically we do not support varying formats for a column of timestamp strings. The %f specifier expects to see 6 digits.
Regardless, the code has logic to not read past the end of the string, so fewer digits are handled gracefully and extra digits are ignored (though the 'Z' would not be processed correctly in this case).
The use case here uncovered a bug in this logic that accesses invalid memory. I'll put a fix in a PR soon.
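Until a fix lands, one possible workaround is to bring every string to a fixed six-digit fraction before parsing, so that the column has a single format. The sketch below is plain Python on host strings; the helper name `normalize_fraction` and the sample timestamps are assumptions for illustration and are not part of cuDF.

```python
import re

# Matches HH:MM:SS with an optional fractional part.
_TIME = re.compile(r"(\d{2}:\d{2}:\d{2})(?:\.(\d*))?")

def normalize_fraction(ts: str, width: int = 6) -> str:
    """Pad (on the right) or truncate the fractional seconds to `width` digits."""
    def fix(match: re.Match) -> str:
        digits = (match.group(2) or "")[:width].ljust(width, "0")
        return f"{match.group(1)}.{digits}"
    # Only the first time-of-day match is rewritten; a trailing 'Z' is kept.
    return _TIME.sub(fix, ts, count=1)

print(normalize_fraction("2022-07-01T12:30:37.123Z"))        # 2022-07-01T12:30:37.123000Z
print(normalize_fraction("2022-07-01T12:30:37Z"))            # 2022-07-01T12:30:37.000000Z
print(normalize_fraction("2022-07-01T12:30:37.123456789Z"))  # 2022-07-01T12:30:37.123456Z
```

Truncating to six digits discards nanosecond precision, so this only fits cases where microsecond resolution is sufficient.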
Environment details
CUDF from rapids docker image "22.06-cuda11.5-runtime-ubuntu18.04-py3.9"
Host environment: "5.4.0-121-generic #137~18.04.1-Ubuntu SMP"
GPU: NVIDIA 1080ti
Additional context
To minimise I/O, APIs often reduce the number of fractional digits for ms, us, and ns (e.g. the Alpaca Stock API and IoT APIs).
The observed behaviour results in time drifts of up to one second backwards, which compromises the order of a time series.
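To illustrate how such a drift can reorder a time series, here is a small pure-Python sketch. It assumes the mis-scaling amounts to left-padding the fraction to nine digits, which is a simplification of the behaviour reported above. The helper `to_ns` and the sample readings are hypothetical.

```python
def to_ns(value: str, pad_right: bool) -> int:
    """Convert 'seconds.fraction' to nanoseconds, padding the fraction to 9 digits."""
    whole, _, frac = value.partition(".")
    frac = frac[:9]
    # Right-padding is the correct scaling; left-padding mimics the reported bug.
    frac = frac.ljust(9, "0") if pad_right else frac.rjust(9, "0")
    return int(whole) * 1_000_000_000 + int(frac)

readings = ["37.123", "37.5", "37.9"]  # already in chronological order

correct = sorted(readings, key=lambda v: to_ns(v, pad_right=True))
skewed = sorted(readings, key=lambda v: to_ns(v, pad_right=False))

print(correct)  # ['37.123', '37.5', '37.9']
print(skewed)   # ['37.5', '37.9', '37.123']
```

Under this assumption "37.9" is read as 37.000000009 s, almost 0.9 s too early, and it sorts ahead of "37.123".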