
Refactor contiguous_split API into contiguous_split.hpp #13186

Merged


abellina
Contributor

@abellina abellina commented Apr 20, 2023

This PR moves the contiguous_split-specific APIs (pack/unpack/metadata/contiguous_split) out of copying.hpp and into a new header, contiguous_split.hpp.

I have built the C++ side, generated the doxygen docs, built the Python side, and run pack_test, but I could use more eyes there.

I've marked this as breaking because the APIs are moving from copying.hpp to contiguous_split.hpp.

@nvdbaranec fyi

@github-actions github-actions bot added the Java (Affects Java cuDF API), Python (Affects Python cuDF API), and libcudf (Affects libcudf (C++/CUDA) code) labels Apr 20, 2023
@abellina abellina added the improvement (Improvement / enhancement to an existing function) and breaking (Breaking change) labels Apr 20, 2023
@abellina abellina mentioned this pull request Apr 20, 2023
4 tasks
@github-actions github-actions bot added the conda label Apr 20, 2023
@abellina abellina marked this pull request as ready for review April 20, 2023 17:54
@abellina abellina requested review from a team as code owners April 20, 2023 17:54
@abellina
Contributor Author

Java tests passed, so I took this out of draft.

Member

@jlowe jlowe left a comment

Java approval

@ttnghia
Contributor

ttnghia commented Apr 20, 2023

Python builds fail for an unrelated, already known reason.

Member

@ajschmidt8 ajschmidt8 left a comment

Approving ops-codeowner file changes

@abellina
Contributor Author

@ttnghia Thanks for the reviews so far. I have addressed what I could, commented on what I couldn't, and am looking for any follow-up steps here.

@shwina FYI, I had to change more of the Python code, so it will need another check, but I would wait until we settle on the C++ types first, since those are all at the interface level.

: metadata_(std::move(md)), gpu_data(std::move(gd))
{
}

std::unique_ptr<metadata> metadata_; ///< Host-side metadata buffer
std::unique_ptr<rmm::device_buffer> gpu_data; ///< Device-side data buffer
std::unique_ptr<std::vector<uint8_t>> metadata_; ///< Host-side metadata buffer
Contributor

Suggested change
- std::unique_ptr<std::vector<uint8_t>> metadata_; ///< Host-side metadata buffer
+ std::unique_ptr<std::vector<uint8_t>> metadata; ///< Host-side metadata buffer

Contributor Author

Good eye, will fix.

Contributor Author

fixed

Comment on lines 43 to 47
packed_columns()
: metadata_(std::make_unique<std::vector<uint8_t>>()),
gpu_data(std::make_unique<rmm::device_buffer>())
{
}
Contributor

Suggested change
- packed_columns()
- : metadata_(std::make_unique<std::vector<uint8_t>>()),
- gpu_data(std::make_unique<rmm::device_buffer>())
- {
- }
+ packed_columns() = default;

Contributor Author

Some code depends on the default constructor to instantiate the unique_ptrs (since we need to keep these for now). If these were direct members of packed_columns I would agree, but as it stands I can't remove it.
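
For readers following this thread, here is a minimal sketch of the difference between the two constructors being discussed. It is not the libcudf code: `packed_eager`, `packed_defaulted`, `fake_device_buffer`, and `metadata_size` are hypothetical names, and `fake_device_buffer` only stands in for `rmm::device_buffer` so the snippet compiles without RMM. The point it tries to show is that `= default` leaves both `unique_ptr` members null, while the hand-written constructor guarantees they own an (empty) object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for rmm::device_buffer (assumption, not the real type).
struct fake_device_buffer {};

// Shape of the constructor under review: both owners are always non-null.
struct packed_eager {
  packed_eager()
    : metadata(std::make_unique<std::vector<std::uint8_t>>()),
      gpu_data(std::make_unique<fake_device_buffer>())
  {
  }
  std::unique_ptr<std::vector<std::uint8_t>> metadata;  // host-side bytes
  std::unique_ptr<fake_device_buffer> gpu_data;         // device-side stand-in
};

// Shape of the suggested `= default`: both unique_ptrs start out null.
struct packed_defaulted {
  packed_defaulted() = default;
  std::unique_ptr<std::vector<std::uint8_t>> metadata;
  std::unique_ptr<fake_device_buffer> gpu_data;
};

// Hypothetical caller that dereferences without a null check, the way code
// relying on the eager constructor might.
template <typename Packed>
std::size_t metadata_size(Packed const& p)
{
  assert(p.metadata != nullptr);  // holds for packed_eager, not for packed_defaulted
  return p.metadata->size();
}
```

With this layout, `metadata_size(packed_eager{})` is safe, whereas the same call on a `packed_defaulted` would dereference a null pointer. If the buffers were held by value rather than through `unique_ptr`, the defaulted constructor would be equivalent, which matches the condition stated in the reply above.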

@abellina
Contributor Author

@shwina I think this is ready for another python pass if you have a chance. I had to change the cython code again, and I don't trust myself.

Contributor

@shwina shwina left a comment

Python changes LGTM

@abellina
Contributor Author

abellina commented Apr 26, 2023

I am noticing a JNI GPU leak with this patch applied when I run with the OOM injection logic we have added. I am working on identifying what is going on here.

@abellina
Contributor Author

I triaged the JNI leak to: #13225

Essentially, the cuDF version I tested without leaks did not include some code, added since then and present in the version this PR builds on, that makes creation of column vectors unsafe when exceptions are thrown. That is unrelated to this change.

@abellina
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 588643f into rapidsai:branch-23.06 Apr 26, 2023
@abellina abellina deleted the refactor_contig_split_headers branch April 26, 2023 16:01
@abellina
Contributor Author

Thanks all for the reviews!

Labels
breaking (Breaking change), improvement (Improvement / enhancement to an existing function), Java (Affects Java cuDF API), libcudf (Affects libcudf (C++/CUDA) code), Python (Affects Python cuDF API)

7 participants