Support nanosecond timestamps in parquet #10063
Looks good. Only one blocking concern about statistics.
CMake changes LGTM
@vuule requesting your review as well.
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #10063 +/- ##
================================================
- Coverage 86.13% 86.01% -0.13%
================================================
Files 139 139
Lines 22438 22435 -3
================================================
- Hits 19328 19298 -30
- Misses 3110 3137 +27
Looks great, especially the changes to the compact protocol reader/writer.
A couple of nitpicks and a suspected bug:
@gpucibot merge
…0483)

Includes `<cstddef>` for `ptrdiff_t` in `parquet/compact_protocol_reader.hpp`. Compilation fails on GCC 11 without this include. Targeting 22.04 since this was broken yesterday in #10063.

Error output:

```
cudf/cpp/src/io/parquet/compact_protocol_reader.hpp:51:17: error: 'ptrdiff_t' does not name a type
   51 |   [[nodiscard]] ptrdiff_t bytecount() const noexcept { return m_cur - m_base; }
      |
cudf/cpp/src/io/parquet/compact_protocol_reader.hpp:22:1: note: 'ptrdiff_t' is defined in header '<cstddef>'; did you forget to '#include <cstddef>'?
```

Also includes `<optional>` in `cpp/include/cudf/table/experimental/row_operators.cuh`, which was broken by #10164.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Conor Hoekstra (https://github.com/codereport)
- Yunsong Wang (https://github.com/PointKernel)

URL: #10483
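A minimal sketch of the fix, for illustration only: a header that names `ptrdiff_t` should include `<cstddef>` itself instead of relying on some other header to pull it in transitively. The class `compact_protocol_reader_sketch` and its `skip_bytes` method are hypothetical; only `bytecount()`, `m_base`, and `m_cur` are taken from the error output above, and this is not cudf's actual reader.

```cpp
#include <cassert>  // assert, used for the constructor precondition
#include <cstddef>  // std::ptrdiff_t, std::size_t: the include the fix adds
#include <cstdint>  // std::uint8_t

// Hypothetical, simplified stand-in for a compact protocol reader.
// Only bytecount(), m_base, and m_cur mirror the error output; the rest
// is illustrative.
class compact_protocol_reader_sketch {
 public:
  compact_protocol_reader_sketch(std::uint8_t const* base, std::size_t len)
    : m_base(base), m_cur(base), m_end(base + len)
  {
    assert(base != nullptr || len == 0);
  }

  // Advances the cursor by n bytes, clamped to the end of the buffer.
  void skip_bytes(std::size_t n)
  {
    auto const remaining = static_cast<std::size_t>(m_end - m_cur);
    m_cur += (n < remaining) ? n : remaining;
  }

  // Number of bytes consumed so far. Without the explicit <cstddef>
  // include, this declaration depends on a transitive include, which is
  // what failed to compile on GCC 11.
  [[nodiscard]] std::ptrdiff_t bytecount() const noexcept { return m_cur - m_base; }

 private:
  std::uint8_t const* m_base;
  std::uint8_t const* m_cur;
  std::uint8_t const* m_end;
};
```

Spelling the type as `std::ptrdiff_t` is the portable choice, because `<cstddef>` guarantees only the `std::` name.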
Closes #9393
This PR was intended to support nanoseconds for both duration and timestamp types in Parquet. It introduces `LogicalType` handling on both the reader and writer sides. It also includes code cleanups, such as moving `CompactProtocolReader` to its own file. Finally, nanosecond durations remain unchanged because they are not fully supported by `pyarrow`, i.e., nanosecond durations are truncated to microseconds (see here).
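In the Parquet format, the legacy `ConvertedType` annotations only cover `TIMESTAMP_MILLIS` and `TIMESTAMP_MICROS`. A nanosecond timestamp can only be described through the `TIMESTAMP` `LogicalType`, whose `TimeUnit` is one of `MILLIS`, `MICROS`, or `NANOS`. That is why nanosecond support requires `LogicalType` handling. The sketch below is a hypothetical illustration of unit handling, not cudf's implementation. The names `time_unit`, `ticks_per_second`, and `convert_ticks` are invented. It also shows the kind of truncation described above: converting 1,500 ns to microseconds yields 1.

```cpp
#include <cstdint>

// Hypothetical mirror of the Parquet TimeUnit options used by the
// TIMESTAMP logical type; these are NOT cudf's actual type names.
enum class time_unit { MILLIS, MICROS, NANOS };

// Ticks per second for each unit.
constexpr std::int64_t ticks_per_second(time_unit u)
{
  switch (u) {
    case time_unit::MILLIS: return 1'000;
    case time_unit::MICROS: return 1'000'000;
    case time_unit::NANOS: return 1'000'000'000;
  }
  return 0;  // unreachable for valid enumerators
}

// Converts a stored tick count from one unit to another. Converting to a
// coarser unit uses integer division, which truncates toward zero in C++,
// so sub-microsecond detail is lost when going from NANOS to MICROS.
// Overflow of the multiplication is not checked in this sketch.
constexpr std::int64_t convert_ticks(std::int64_t ticks, time_unit from, time_unit to)
{
  std::int64_t const f = ticks_per_second(from);
  std::int64_t const t = ticks_per_second(to);
  return (t >= f) ? ticks * (t / f) : ticks / (f / t);
}
```

Because both functions are `constexpr`, the conversions can be checked at compile time with `static_assert`, for example `convert_ticks(-1'500, time_unit::NANOS, time_unit::MICROS) == -1`.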