Make ai.rapids.cudf.HostMemoryBuffer#copyFromStream public. #17179

Merged
merged 10 commits into rapidsai:branch-24.12 from ray/kudo-utils
Oct 30, 2024

Conversation

liurenjie1024
Contributor

@liurenjie1024 liurenjie1024 commented Oct 25, 2024

Description

This is the first PR of a larger effort to introduce a new serialization format. It makes ai.rapids.cudf.HostMemoryBuffer#copyFromStream public. For more background, see NVIDIA/spark-rapids-jni#2496

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: liurenjie1024 <[email protected]>
@liurenjie1024 liurenjie1024 requested a review from a team as a code owner October 25, 2024 04:00

copy-pr-bot bot commented Oct 25, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

java/src/main/java/ai/rapids/cudf/SlicedTable.java (outdated, resolved)
java/src/main/java/ai/rapids/cudf/Arms.java (outdated, resolved)
java/src/main/java/ai/rapids/cudf/Arms.java (outdated, resolved)
Contributor

@revans2 revans2 left a comment

In general the utilities look interesting. But I don't see a lot of value in adding them to CUDF unless there is something in CUDF that is going to use them. I can see a lot of places that Arms might be used. But arguably it should not be a part of CUDF until we have code that uses it. We don't want CUDF to become guava or some other utilities library. We want CUDF to provide APIs for processing dataframe data on the GPU, and it is not clear how these APIs facilitate that.

 */
public static <R extends AutoCloseable> void close(Iterator<R> resources) {
  Throwable t = null;
  while (resources.hasNext()) {
Contributor

First off, this is not closing the iterator. It is closing all of the values that the iterator holds. So at a minimum I would like us to change the name to closeAll, or something like that.

Second, an Iterator gives no guarantee that you are looping over values that already exist. It can generate values lazily, or do things like that.

Contributor Author

Fixed. I have changed the method name to closeAll.
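
To make the second review point concrete, here is a minimal sketch of an Iterator that generates its values lazily. The class `LazyIteratorSketch` and the method `lazyResources` are hypothetical names used only for illustration; they do not come from this PR or from cuDF. The sketch suggests that a helper which drains such an Iterator in order to close its values would first cause those values to be created.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in cuDF.
final class LazyIteratorSketch {
  // Returns an iterator that creates a new resource on every call to next().
  // The counter records how many resources have been created so far.
  static Iterator<AutoCloseable> lazyResources(int count, AtomicInteger opened) {
    return new Iterator<AutoCloseable>() {
      private int produced = 0;

      @Override
      public boolean hasNext() {
        return produced < count;
      }

      @Override
      public AutoCloseable next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        produced++;
        opened.incrementAndGet();
        return () -> { }; // closing is a no-op in this sketch
      }
    };
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger opened = new AtomicInteger();
    Iterator<AutoCloseable> it = lazyResources(3, opened);
    System.out.println("created before draining: " + opened.get());
    // Draining the iterator to close its values creates them first.
    while (it.hasNext()) {
      it.next().close();
    }
    System.out.println("created after draining: " + opened.get());
  }
}
```

Nothing exists before the loop runs, so "closing everything the iterator holds" is not a well-defined operation for an iterator like this one.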

/**
 * This method safely closes the resources.
 */
public static <R extends AutoCloseable> void close(Iterable<R> resources) {
Contributor

This has the same problems as the Iterator API, but worse. An Iterable is typically a Collection or something like that. If that collection is itself AutoCloseable, then close is ambiguous: do we want to close the collection itself, or do we want to close the things in that collection? This needs to be called closeAll at least. I would also prefer it if we switched this to Collection instead of Iterable, unless there is a specific use case where we want an Iterable that is not a Collection.

Contributor Author

Fixed. Renamed to closeAll and switched to Collection.
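
For reference, a helper with the shape agreed on in this thread might look roughly like the sketch below. This is not the code from the PR or from spark-rapids-jni. The class name `CloseAllSketch` is hypothetical, and the error-handling policy (rethrow the first failure, attach later ones as suppressed) is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical sketch; not the code that was reviewed or merged.
final class CloseAllSketch {
  /**
   * Closes every element of the collection, even if some close() calls throw.
   * The first failure is rethrown after all elements have been visited, with
   * any later failures attached to it as suppressed exceptions.
   */
  static void closeAll(Collection<? extends AutoCloseable> resources) throws Exception {
    Exception first = null;
    for (AutoCloseable resource : resources) {
      if (resource == null) {
        continue; // nothing to close
      }
      try {
        resource.close();
      } catch (Exception e) {
        if (first == null) {
          first = e;
        } else if (e != first) {
          // addSuppressed rejects self-suppression, hence the identity check.
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  public static void main(String[] args) throws Exception {
    AutoCloseable a = () -> System.out.println("closed a");
    AutoCloseable b = () -> System.out.println("closed b");
    closeAll(Arrays.asList(a, b));
  }
}
```

Taking a Collection and naming the method closeAll makes it clear that the elements are what get closed, not the container.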

@@ -246,7 +246,7 @@ public final void copyFromHostBuffer(long destOffset, HostMemoryBuffer srcData,
* @param in input stream to copy bytes from
* @param byteLength number of bytes to copy
*/
Contributor

Please update the docs so that it is clear that an EOFException is thrown if the InputStream does not include enough bytes to do the copy. Also it would really be nice if we could include a message with the EOFException to indicate what happened.

Contributor Author

Fixed.
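
The behavior requested here can be sketched as follows, assuming a plain byte[] destination instead of a HostMemoryBuffer. `ReadFullySketch.readFully` is a hypothetical helper written for illustration and is not the cuDF implementation of copyFromStream. It only shows the documented contract: read exactly `byteLength` bytes, or throw an EOFException whose message says how many bytes were expected and how many arrived.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; the real method copies into a HostMemoryBuffer.
final class ReadFullySketch {
  /**
   * Copies exactly byteLength bytes from the stream into dest.
   *
   * @param in input stream to copy bytes from
   * @param dest destination array
   * @param destOffset offset in dest at which to start writing
   * @param byteLength number of bytes to copy
   * @throws EOFException if the stream ends before byteLength bytes are read
   */
  static void readFully(InputStream in, byte[] dest, int destOffset, int byteLength)
      throws IOException {
    int copied = 0;
    while (copied < byteLength) {
      int n = in.read(dest, destOffset + copied, byteLength - copied);
      if (n < 0) {
        throw new EOFException("Expected " + byteLength
            + " bytes but the stream ended after " + copied + " bytes");
      }
      copied += n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] dest = new byte[4];
    readFully(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}), dest, 0, 4);
    System.out.println("copied " + dest.length + " bytes");
  }
}
```

A message that includes both counts lets the caller tell a truncated stream apart from a wrong length argument.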

@liurenjie1024
Contributor Author

But I don't see a lot of value in adding them to CUDF unless there is something in CUDF that is going to use them. I can see a lot of places that Arms might be used. But arguably it should not be a part of CUDF until we have code that uses it. We don't want CUDF to become guava or some other utilities library. We want CUDF to provide APIs for processing dataframe data on the GPU, and it is not clear how these APIs facilitate that.

Hi @revans2. As I mentioned in the description, this is part of NVIDIA/spark-rapids-jni#2532. After a discussion with @jlowe, that larger PR is being split into smaller pieces that are easier to review. Please refer to NVIDIA/spark-rapids-jni#2532 (review) for more details.

@revans2
Contributor

revans2 commented Oct 28, 2024

Hi @revans2. As I mentioned in the description, this is part of NVIDIA/spark-rapids-jni#2532. After a discussion with @jlowe, that larger PR is being split into smaller pieces that are easier to review. Please refer to NVIDIA/spark-rapids-jni#2532 (review) for more details.

@liurenjie1024 I get that these utilities are here so that spark-rapids-jni can use them. My problem is that CUDF is a library for doing data frame processing on the GPU. It is not a library for providing general-purpose Java utilities. If the CUDF Java code actually used these APIs, then I could justify it in my mind. But that is not the case. These are here only so that spark-rapids-jni can use them. Even if CUDF used these APIs, I would argue that they should be package private unless they are exposed in the CUDF public APIs. This is because the point of CUDF is to provide data frame processing on the GPU. It is not here to provide general-purpose Java utilities.

If these are for spark-rapids-jni, then let's put them in spark-rapids-jni.

@liurenjie1024
Contributor Author

If these are for spark-rapids-jni, then let's put them in spark-rapids-jni.

I get your point. I'll put these utilities into spark-rapids-jni.

@liurenjie1024
Contributor Author

Moved to NVIDIA/spark-rapids-jni#2542

@liurenjie1024 liurenjie1024 changed the title from "Add some utility methods." to "Make ai.rapids.cudf.HostMemoryBuffer#copyFromStream public." Oct 29, 2024
@jlowe jlowe added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Oct 29, 2024
@liurenjie1024
Contributor Author

build

@liurenjie1024
Contributor Author

/ok to test

@liurenjie1024
Contributor Author

/ok to test

@pxLi
Member

pxLi commented Oct 30, 2024

/ok to test

@liurenjie1024
Contributor Author

/ok to test

@sperlingxx
Contributor

build

@sperlingxx
Contributor

/ok to test

@firestarman
Contributor

/ok to test

@firestarman
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 6328ad6 into rapidsai:branch-24.12 Oct 30, 2024
86 checks passed
@liurenjie1024 liurenjie1024 deleted the ray/kudo-utils branch October 30, 2024 09:25
Labels
improvement (Improvement / enhancement to an existing function), Java (Affects Java cuDF API), non-breaking (Non-breaking change)
7 participants