Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make timezone table independent from ORC #12805

Merged
merged 48 commits into from
Mar 22, 2023

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Feb 18, 2023

Description

Modifies the creation of timezone transition tables to make it usable outside of the ORC context:

  • Remove the ORC epoch offset from the transition table.
  • Use timestamp_s and duration_s instead of integral values in the table.
  • Return table as cudf::table instead of the special timezone_table struct.
  • Move the implementation/header out of cudf::io::orc.
  • Split the header into C++ and CUDA parts. C++ API has public and detail counterparts.

Other:
Adapt ORC reader to changes to the timezone transition table.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule added feature request New feature or request cuIO cuIO issue labels Feb 18, 2023
@vuule vuule self-assigned this Feb 18, 2023
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Feb 18, 2023
@vuule vuule added the non-breaking Non-breaking change label Feb 18, 2023
@ttnghia
Copy link
Contributor

ttnghia commented Feb 23, 2023

I see that the timezone stuff is in the cudf/orc/ folder? I believe that it is very independent from ORC file format, so please consider refactoring it out into somewhere else not in cudf/orc/, like cudf/utilities/. This would be useful for implementing other timezone-aware operations.

@vuule
Copy link
Contributor Author

vuule commented Feb 28, 2023

I see that the timezone stuff is in the cudf/orc/ folder? I believe that it is very independent from ORC file format, so please consider refactoring it out into somewhere else not in cudf/orc/, like cudf/utilities/. This would be useful for implementing other timezone-aware operations.

This PR was a first draft just to unblock @shwina on the Python side. I've since split the headers to separate C++ and CUDA code and moved them to /include/detail. The namespace was also changed to cudf::detail.

@github-actions github-actions bot added the conda label Feb 28, 2023
@vuule vuule marked this pull request as ready for review March 1, 2023 01:08
Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from Cython side!

Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Thank you for addressing my questions.

cpp/src/io/orc/timezone.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. One minor question about the far, far future.

cpp/include/cudf/timezone.hpp Show resolved Hide resolved
@github-actions github-actions bot added the CMake CMake build issue label Mar 21, 2023
@vuule vuule requested a review from bdice March 21, 2023 20:40
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo fix. Otherwise LGTM!

cpp/include/cudf/timezone.hpp Outdated Show resolved Hide resolved
Co-authored-by: Bradley Dice <[email protected]>
@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Mar 21, 2023
@vuule
Copy link
Contributor Author

vuule commented Mar 22, 2023

/merge

@rapids-bot rapids-bot bot merged commit bf18cea into rapidsai:branch-23.04 Mar 22, 2023
@vuule vuule deleted the timezone-refactor branch March 22, 2023 05:16
rapids-bot bot pushed a commit that referenced this pull request Apr 17, 2023
This PR adds bindings to the TZiF reader that was added in the libcudf API in #12805.

No tests are being added as these bindings are just for internal-use. In follow-up PRs, I will add a timezone-aware datetime type and timezone-aware operations to the public API, along with tests for those operations.

The bindings can be used as follows:

```python
>>> transition_times, offsets = make_timezone_transition_table("/usr/share/zoneinfo", "America/New_York")
                                            
>>> transition_times
<cudf.core.column.datetime.DatetimeColumn object at 0x7f95cd6ac840>
[
  1883-11-18 17:00:00,
  1883-11-18 17:00:00,
  1918-03-31 07:00:00,
  1918-10-27 06:00:00,
  1919-03-30 07:00:00,
  1919-10-26 06:00:00,
  1920-03-28 07:00:00,
  1920-10-31 06:00:00,
  1921-04-24 07:00:00,
  1921-09-25 06:00:00,
  ...
  2365-03-14 07:00:00,
  2365-11-07 06:00:00,
  2366-03-13 07:00:00,
  2366-11-06 06:00:00,
  2367-03-12 07:00:00,
  2367-11-05 06:00:00,
  2368-03-10 07:00:00,
  2368-11-03 06:00:00,
  2369-03-09 07:00:00,
  2369-11-02 06:00:00
]
dtype: datetime64[s]

>>> offsets
<cudf.core.column.timedelta.TimeDeltaColumn object at 0x7f94e69bad40>
[
  -18000,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  ...
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000
]
dtype: timedelta64[s]
```

Authors:
  - Ashwin Srinath (https://github.com/shwina)
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #12826
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge CMake CMake build issue cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants