Fix to_timestamps to support Z for %z format specifier (#10617)
Closes #10609 

This adds support for `Z` in a timestamp string for the `%z` specifier. Normally, the `%z` specifier expects an hour-minute offset like `+HHMM`, but `strptime` and other libraries also accept a single `Z` (UTC) here. The following two strings should therefore result in the same timestamp value: `"2022-04-07 09:15:00Z"` and `"2022-04-07 09:15:00+0000"`.
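As a rough, host-side illustration of the new `%z` rule (not cudf's device code), the sketch below accepts either a lone `Z` or a sign followed by four `HHMM` digits. It returns the minute correction that converts local time to UTC, using the same sign convention as the `tz_minutes` assignment in `parse_datetime`. The function name `tz_minutes_to_utc` is hypothetical; rejecting minutes above 59 is an assumption modeled on the `check_value(ptr + 3, 2, 0, 59)` call in `check_datetime_format`, and hours are deliberately not range-checked here.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of cudf): parses the text matched by `%z` and
// returns the number of minutes to ADD to the local time to obtain UTC.
// Accepts "Z" (shorthand for +0000) or '+'/'-' followed by exactly four digits.
// Returns std::nullopt for anything else.
constexpr std::optional<int> tz_minutes_to_utc(std::string_view s)
{
  if (s == "Z") { return 0; }  // 'Z' means UTC, i.e. no correction
  if (s.size() != 5 || (s[0] != '+' && s[0] != '-')) { return std::nullopt; }
  for (std::size_t i = 1; i < 5; ++i) {
    if (s[i] < '0' || s[i] > '9') { return std::nullopt; }  // HHMM must be digits
  }
  int const hh = (s[1] - '0') * 10 + (s[2] - '0');
  int const mm = (s[3] - '0') * 10 + (s[4] - '0');
  if (mm > 59) { return std::nullopt; }  // assumption: minutes limited to 00-59
  // local = UTC + offset, so reverting to UTC subtracts the offset:
  // a '-' offset adds minutes, a '+' offset removes them.
  int const sign = s[0] == '-' ? 1 : -1;
  return sign * (hh * 60 + mm);
}
```

Both `Z` and `+0000` map to a zero correction, which is why the two example strings should produce the same timestamp. A `std::nullopt` result corresponds roughly to `is_timestamp` reporting `false` for that row.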

The `cudf::strings::is_timestamp` and `cudf::strings::to_timestamps` functions have been updated to support this behavior. The existing `ToTimestampTimezone` gtest was updated to include this as a test case.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #10617
davidwendt authored Apr 11, 2022
1 parent 10b26b2 commit c8ffece
Showing 2 changed files with 13 additions and 8 deletions.
16 changes: 10 additions & 6 deletions cpp/src/strings/convert/convert_datetime.cu
@@ -298,12 +298,14 @@ struct parse_datetime {
   }
   case 'z': {
     // 'z' format is +hh:mm -- single sign char and 2 chars each for hour and minute
-    auto const sign = *ptr == '-' ? 1 : -1;
-    auto const [hh, lh] = parse_int(ptr + 1, 2);
-    auto const [mm, lm] = parse_int(ptr + 3, 2);
-    // revert timezone back to UTC
-    timeparts.tz_minutes = sign * ((hh * 60) + mm);
-    bytes_read -= lh + lm;
+    if (item.length == 5) {
+      auto const sign = *ptr == '-' ? 1 : -1;
+      auto const [hh, lh] = parse_int(ptr + 1, 2);
+      auto const [mm, lm] = parse_int(ptr + 3, 2);
+      // revert timezone back to UTC
+      timeparts.tz_minutes = sign * ((hh * 60) + mm);
+      bytes_read -= lh + lm;
+    }
     break;
   }
   case 'Z': break; // skip
@@ -574,6 +576,8 @@ struct check_datetime_format {
       auto const cvm = check_value(ptr + 3, 2, 0, 59);
       result = (*ptr == '-' || *ptr == '+') && cvh.first && cvm.first;
       bytes_read -= cvh.second + cvm.second;
+    } else if (item.length == 1) {
+      result = *ptr == 'Z';
     }
     break;
   }
5 changes: 3 additions & 2 deletions cpp/tests/strings/datetime_tests.cpp
@@ -144,16 +144,17 @@ TEST_F(StringsDatetimeTest, ToTimestampTimezone)
     "2019-07-17 02:34:56-0300",
     "2019-03-20 12:34:56+1030",
     "2020-02-29 12:00:00-0500",
+    "2022-04-07 09:15:00Z",
     "1938-11-23 10:28:49+0700"};
   auto strings_view = cudf::strings_column_view(strings);
   auto results = cudf::strings::to_timestamps(
     strings_view, cudf::data_type{cudf::type_id::TIMESTAMP_SECONDS}, "%Y-%m-%d %H:%M:%S%z");
   cudf::test::fixed_width_column_wrapper<cudf::timestamp_s, cudf::timestamp_s::rep> expected{
-    131243025, 1563341696, 1553047496, 1582995600, -981664271};
+    131243025, 1563341696, 1553047496, 1582995600, 1649322900, -981664271};
   CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, expected);
 
   results = cudf::strings::is_timestamp(strings_view, "%Y-%m-%d %H:%M:%S%z");
-  cudf::test::fixed_width_column_wrapper<bool> is_expected({1, 1, 1, 1, 1});
+  cudf::test::fixed_width_column_wrapper<bool> is_expected({1, 1, 1, 1, 1, 1});
   CUDF_TEST_EXPECT_COLUMNS_EQUAL(*results, is_expected);
 }
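To double-check the new expected value in `ToTimestampTimezone`, the constexpr sketch below (C++17) converts local date/time fields plus a `%z` minute correction into seconds since the Unix epoch. It relies on Howard Hinnant's public-domain `days_from_civil` algorithm rather than anything taken from cudf, and `to_epoch_seconds` is a hypothetical helper introduced only for this check. With a correction of zero minutes, which is what both `Z` and `+0000` denote, `2022-04-07 09:15:00` comes out to `1649322900`.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic-Gregorian date (Howard Hinnant's
// public-domain days_from_civil algorithm; illustrative, not cudf's code).
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d)
{
  y -= m <= 2;  // treat Jan/Feb as months 13/14 of the previous year
  std::int64_t const era = (y >= 0 ? y : y - 399) / 400;
  unsigned const yoe = static_cast<unsigned>(y - era * 400);              // [0, 399]
  unsigned const doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  unsigned const doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Hypothetical helper: epoch seconds for local fields plus the number of
// minutes to add to reach UTC (0 for "Z" or "+0000", +180 for "-0300").
constexpr std::int64_t to_epoch_seconds(std::int64_t y, unsigned mo, unsigned d,
                                        int h, int mi, int s, int tz_minutes_to_utc)
{
  return days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s +
         static_cast<std::int64_t>(tz_minutes_to_utc) * 60;
}
```

The same arithmetic reproduces the other rows of the test, for example `2019-07-17 02:34:56-0300` with a correction of +180 minutes gives `1563341696`.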
