Make table compatible with text output
vyasr committed Dec 16, 2023
1 parent 15942d4 commit 0663d2e
Showing 3 changed files with 123 additions and 98 deletions.
5 changes: 4 additions & 1 deletion cpp/include/cudf/io/json.hpp
@@ -72,8 +72,10 @@ enum class json_recovery_mode_t {
*
* Parameters in PANDAS that are unavailable or in cudf:
*
+ *
+ * +----------------------+--------------------------------------------------+
* | Name | Description |
- * | -------------------- | ------------------------------------------------ |
+ * +======================+==================================================+
* | `orient` | currently fixed-format |
* | `typ` | data is always returned as a cudf::table |
* | `convert_axes` | use column functions for axes operations instead |
@@ -84,6 +86,7 @@ enum class json_recovery_mode_t {
* | `date_unit` | only millisecond units are supported |
* | `encoding` | only ASCII-encoded data is supported |
* | `chunksize` | use `byte_range_xxx` for chunking instead |
+ * +----------------------+--------------------------------------------------+
*/
class json_reader_options {
source_info _source;
134 changes: 73 additions & 61 deletions cpp/include/cudf/strings/convert/convert_datetime.hpp
@@ -31,30 +31,33 @@ namespace strings {
* @file
*/

+// clang-format off
/**
* @brief Returns a new timestamp column converting a strings column into
* timestamps using the provided format pattern.
*
* The format pattern can include the following specifiers: "%Y,%y,%m,%d,%H,%I,%p,%M,%S,%f,%z"
*
- * | Specifier | Description |
- * | :-------: | ----------- |
- * | \%d | Day of the month: 01-31 |
- * | \%m | Month of the year: 01-12 |
- * | \%y | Year without century: 00-99. [0,68] maps to [2000,2068] and [69,99] maps to [1969,1999] |
- * | \%Y | Year with century: 0001-9999 |
- * | \%H | 24-hour of the day: 00-23 |
- * | \%I | 12-hour of the day: 01-12 |
- * | \%M | Minute of the hour: 00-59 |
- * | \%S | Second of the minute: 00-59. Leap second is not supported. |
- * | \%f | 6-digit microsecond: 000000-999999 |
- * | \%z | UTC offset with format ±HHMM Example +0500 |
- * | \%j | Day of the year: 001-366 |
- * | \%p | Only 'AM', 'PM' or 'am', 'pm' are recognized |
- * | \%W | Week of the year with Monday as the first day of the week: 00-53 |
- * | \%w | Day of week: 0-6 = Sunday-Saturday |
- * | \%U | Week of the year with Sunday as the first day of the week: 00-53 |
- * | \%u | Day of week: 1-7 = Monday-Sunday |
+ * +-----------+-----------------------------------------------------------------------------------------+
+ * | Specifier | Description |
+ * +===========+=========================================================================================+
+ * | ``%d`` | Day of the month: 01-31 |
+ * | ``%m`` | Month of the year: 01-12 |
+ * | ``%y`` | Year without century: 00-99. [0,68] maps to [2000,2068] and [69,99] maps to [1969,1999] |
+ * | ``%Y`` | Year with century: 0001-9999 |
+ * | ``%H`` | 24-hour of the day: 00-23 |
+ * | ``%I`` | 12-hour of the day: 01-12 |
+ * | ``%M`` | Minute of the hour: 00-59 |
+ * | ``%S`` | Second of the minute: 00-59. Leap second is not supported. |
+ * | ``%f`` | 6-digit microsecond: 000000-999999 |
+ * | ``%z`` | UTC offset with format ±HHMM Example +0500 |
+ * | ``%j`` | Day of the year: 001-366 |
+ * | ``%p`` | Only 'AM', 'PM' or 'am', 'pm' are recognized |
+ * | ``%W`` | Week of the year with Monday as the first day of the week: 00-53 |
+ * | ``%w`` | Day of week: 0-6 = Sunday-Saturday |
+ * | ``%U`` | Week of the year with Sunday as the first day of the week: 00-53 |
+ * | ``%u`` | Day of week: 1-7 = Monday-Sunday |
+ * +-----------+-----------------------------------------------------------------------------------------+
*
* Other specifiers are not currently supported.
*
@@ -84,37 +87,41 @@ namespace strings {
* @param mr Device memory resource used to allocate the returned column's device memory
* @return New datetime column
*/
+// clang-format on
std::unique_ptr<column> to_timestamps(
strings_column_view const& input,
data_type timestamp_type,
std::string_view format,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

+// clang-format off
/**
* @brief Verifies the given strings column can be parsed to timestamps using the provided format
* pattern.
*
* The format pattern can include the following specifiers: "%Y,%y,%m,%d,%H,%I,%p,%M,%S,%f,%z"
*
- * | Specifier | Description |
- * | :-------: | ----------- |
- * | \%d | Day of the month: 01-31 |
- * | \%m | Month of the year: 01-12 |
- * | \%y | Year without century: 00-99. [0,68] maps to [2000,2068] and [69,99] maps to [1969,1999] |
- * | \%Y | Year with century: 0001-9999 |
- * | \%H | 24-hour of the day: 00-23 |
- * | \%I | 12-hour of the day: 01-12 |
- * | \%M | Minute of the hour: 00-59|
- * | \%S | Second of the minute: 00-59. Leap second is not supported. |
- * | \%f | 6-digit microsecond: 000000-999999 |
- * | \%z | UTC offset with format ±HHMM Example +0500 |
- * | \%j | Day of the year: 001-366 |
- * | \%p | Only 'AM', 'PM' or 'am', 'pm' are recognized |
- * | \%W | Week of the year with Monday as the first day of the week: 00-53 |
- * | \%w | Day of week: 0-6 = Sunday-Saturday |
- * | \%U | Week of the year with Sunday as the first day of the week: 00-53 |
- * | \%u | Day of week: 1-7 = Monday-Sunday |
+ * +-----------+-----------------------------------------------------------------------------------------+
+ * | Specifier | Description |
+ * +===========+=========================================================================================+
+ * | ``%d`` | Day of the month: 01-31 |
+ * | ``%m`` | Month of the year: 01-12 |
+ * | ``%y`` | Year without century: 00-99. [0,68] maps to [2000,2068] and [69,99] maps to [1969,1999] |
+ * | ``%Y`` | Year with century: 0001-9999 |
+ * | ``%H`` | 24-hour of the day: 00-23 |
+ * | ``%I`` | 12-hour of the day: 01-12 |
+ * | ``%M`` | Minute of the hour: 00-59 |
+ * | ``%S`` | Second of the minute: 00-59. Leap second is not supported. |
+ * | ``%f`` | 6-digit microsecond: 000000-999999 |
+ * | ``%z`` | UTC offset with format ±HHMM Example +0500 |
+ * | ``%j`` | Day of the year: 001-366 |
+ * | ``%p`` | Only 'AM', 'PM' or 'am', 'pm' are recognized |
+ * | ``%W`` | Week of the year with Monday as the first day of the week: 00-53 |
+ * | ``%w`` | Day of week: 0-6 = Sunday-Saturday |
+ * | ``%U`` | Week of the year with Sunday as the first day of the week: 00-53 |
+ * | ``%u`` | Day of week: 1-7 = Monday-Sunday |
+ * +-----------+-----------------------------------------------------------------------------------------+
*
* Other specifiers are not currently supported.
* The "%f" supports a precision value to read the numeric digits. Specify the
@@ -132,43 +139,47 @@ std::unique_ptr<column> to_timestamps(
* @param mr Device memory resource used to allocate the returned column's device memory
* @return New BOOL8 column
*/
+// clang-format on
std::unique_ptr<column> is_timestamp(
strings_column_view const& input,
std::string_view format,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

+// clang-format off
/**
* @brief Returns a new strings column converting a timestamp column into
* strings using the provided format pattern.
*
* The format pattern can include the following specifiers: "%Y,%y,%m,%d,%H,%I,%p,%M,%S,%f,%z,%Z"
*
- * | Specifier | Description |
- * | :-------: | ----------- |
- * | \%d | Day of the month: 01-31 |
- * | \%m | Month of the year: 01-12 |
- * | \%y | Year without century: 00-99 |
- * | \%Y | Year with century: 0001-9999 |
- * | \%H | 24-hour of the day: 00-23 |
- * | \%I | 12-hour of the day: 01-12 |
- * | \%M | Minute of the hour: 00-59|
- * | \%S | Second of the minute: 00-59 |
- * | \%f | 6-digit microsecond: 000000-999999 |
- * | \%z | Always outputs "+0000" |
- * | \%Z | Always outputs "UTC" |
- * | \%j | Day of the year: 001-366 |
- * | \%u | ISO weekday where Monday is 1 and Sunday is 7 |
- * | \%w | Weekday where Sunday is 0 and Saturday is 6 |
- * | \%U | Week of the year with Sunday as the first day: 00-53 |
- * | \%W | Week of the year with Monday as the first day: 00-53 |
- * | \%V | Week of the year per ISO-8601 format: 01-53 |
- * | \%G | Year based on the ISO-8601 weeks: 0000-9999 |
- * | \%p | AM/PM from `timestamp_names::am_str/pm_str` |
- * | \%a | Weekday abbreviation from the `names` parameter |
- * | \%A | Weekday from the `names` parameter |
- * | \%b | Month name abbreviation from the `names` parameter |
- * | \%B | Month name from the `names` parameter |
+ * +-----------+-----------------------------------------------------------------------------------------+
+ * | Specifier | Description |
+ * +===========+=========================================================================================+
+ * | ``%d`` | Day of the month: 01-31 |
+ * | ``%m`` | Month of the year: 01-12 |
+ * | ``%y`` | Year without century: 00-99. [0,68] maps to [2000,2068] and [69,99] maps to [1969,1999] |
+ * | ``%Y`` | Year with century: 0001-9999 |
+ * | ``%H`` | 24-hour of the day: 00-23 |
+ * | ``%I`` | 12-hour of the day: 01-12 |
+ * | ``%M`` | Minute of the hour: 00-59 |
+ * | ``%S`` | Second of the minute: 00-59. Leap second is not supported. |
+ * | ``%f`` | 6-digit microsecond: 000000-999999 |
+ * | ``%z`` | Always outputs "+0000" |
+ * | ``%Z`` | Always outputs "UTC" |
+ * | ``%j`` | Day of the year: 001-366 |
+ * | ``%u`` | ISO weekday where Monday is 1 and Sunday is 7 |
+ * | ``%w`` | Weekday where Sunday is 0 and Saturday is 6 |
+ * | ``%U`` | Week of the year with Sunday as the first day: 00-53 |
+ * | ``%W`` | Week of the year with Monday as the first day: 00-53 |
+ * | ``%V`` | Week of the year per ISO-8601 format: 01-53 |
+ * | ``%G`` | Year based on the ISO-8601 weeks: 0000-9999 |
+ * | ``%p`` | AM/PM from `timestamp_names::am_str/pm_str` |
+ * | ``%a`` | Weekday abbreviation from the `names` parameter |
+ * | ``%A`` | Weekday from the `names` parameter |
+ * | ``%b`` | Month name abbreviation from the `names` parameter |
+ * | ``%B`` | Month name from the `names` parameter |
+ * +-----------+-----------------------------------------------------------------------------------------+
*
* Additional descriptions can be found here:
* https://en.cppreference.com/w/cpp/chrono/system_clock/formatter
@@ -244,6 +255,7 @@ std::unique_ptr<column> is_timestamp(
* @param mr Device memory resource used to allocate the returned column's device memory
* @return New strings column with formatted timestamps
*/
+// clang-format on
std::unique_ptr<column> from_timestamps(
column_view const& timestamps,
std::string_view format = "%Y-%m-%dT%H:%M:%SZ",
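
Reading aid for the `%y` row in the parsing tables of this diff: a minimal sketch of the two-digit-year pivot that the documentation describes ([0,68] maps to [2000,2068] and [69,99] maps to [1969,1999]). The names `expand_two_digit_year` and `check_year_pivot` are hypothetical and exist only for illustration; this is not cudf code and makes no claim about how the library implements the mapping.

```cpp
#include <cassert>

// Hypothetical helper, not part of cudf: expands a two-digit %y value using
// the pivot stated in the table ([0,68] -> 2000-2068, [69,99] -> 1969-1999).
// Assumes the caller has already validated that yy is in [0,99].
constexpr int expand_two_digit_year(int yy)
{
  return yy <= 68 ? 2000 + yy : 1900 + yy;
}

// Checks the boundary values on both sides of the pivot.
inline void check_year_pivot()
{
  assert(expand_two_digit_year(0) == 2000);
  assert(expand_two_digit_year(68) == 2068);
  assert(expand_two_digit_year(69) == 1969);
  assert(expand_two_digit_year(99) == 1999);
}
```

Under this reading, a string such as "68" parsed with `%y` would land in 2068, while "69" would land in 1969.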