Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize ORC reader into multiple files and perform some small fixes to cuIO code #14665

Merged
merged 47 commits into from
Jan 17, 2024

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Dec 21, 2023

This refactors the ORC reader, moving ORC code around to facilitate the upcoming support for chunked reading of the input files.

No new functionality/implementation is added in this PR. Only the existing code is moving around, except that some small issues of the related ORC/cuIO code are also fixed.

@ttnghia ttnghia added feature request New feature or request 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change Reliability labels Dec 21, 2023
@ttnghia ttnghia self-assigned this Dec 21, 2023
@github-actions github-actions bot added the CMake CMake build issue label Dec 21, 2023
ttnghia and others added 16 commits December 30, 2023 12:50
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Co-authored-by: Vukasin Milovanovic <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that most of the changes are just from moving code. If there are any significant blocks of new code, please mark them with comments before requesting further review.

cpp/src/io/orc/aggregate_orc_metadata.cpp Show resolved Hide resolved
cpp/src/io/orc/reader_impl.hpp Outdated Show resolved Hide resolved
Comment on lines +80 to +83
case orc::STRING:
case orc::BINARY:
case orc::VARCHAR:
case orc::CHAR:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add explicit [[fallthrough]] to all fallthrough cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only use [[fallthrough]] for non-empty cases?
I previously overused it, at least based on
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-break
https://en.cppreference.com/w/cpp/language/attributes/fallthrough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember that @vuule has some argument against using [[fallthrough]]?

Copy link
Contributor

@bdice bdice Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how [[fallthrough]] might feel overused from that example. I'm usually in favor of using it but I will defer to you on choosing whether to use it here.

@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021-2023, NVIDIA CORPORATION.
* Copyright (c) 2021-2024, NVIDIA CORPORATION.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has some small fixes with compiler markup.

@@ -0,0 +1,42 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new files contain extracted code from reader_impl.hpp/cu.

Signed-off-by: Nghia Truong <[email protected]>
@harrism
Copy link
Member

harrism commented Jan 16, 2024

@ttnghia I think this needs a more descriptive PR title.

@ttnghia ttnghia changed the title Refactor ORC reader Refactor ORC reader and some small fixes to cuIO code Jan 16, 2024
@ttnghia
Copy link
Contributor Author

ttnghia commented Jan 16, 2024

@ttnghia I think this needs a more descriptive PR title.

I've changed the title a little bit. Please feel free to recommend any better one.

@harrism
Copy link
Member

harrism commented Jan 16, 2024

@ttnghia I think this needs a more descriptive PR title.

I've changed the title a little bit. Please feel free to recommend any better one.

I guess I meant the "refactor ORC reader" part is a bit vague and is likely to occur again.

@ttnghia ttnghia changed the title Refactor ORC reader and some small fixes to cuIO code Rewrite ORC reader and perform some small fixes to cuIO code Jan 16, 2024
@ttnghia ttnghia changed the title Rewrite ORC reader and perform some small fixes to cuIO code Reorganize ORC reader into multiple files and perform some small fixes to cuIO code Jan 17, 2024
@ttnghia ttnghia requested a review from bdice January 17, 2024 02:02
@ttnghia
Copy link
Contributor Author

ttnghia commented Jan 17, 2024

/merge

@rapids-bot rapids-bot bot merged commit 42e946f into rapidsai:branch-24.02 Jan 17, 2024
67 checks passed
@ttnghia ttnghia deleted the refactor_orc_reader branch January 17, 2024 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants