Java API change for supporting structs #7730
High-level idea seems OK to me as long as we flatten it when calling into JNI.
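For illustration, one way to flatten a nested options tree before crossing JNI is a depth-first walk that emits parallel arrays. This is only a sketch: `ColumnOptions`, `Flattened`, and `flatten` are hypothetical names, not part of the cudf Java API, and the layout actually passed to native code may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a nested column-options node; not the real cudf class.
final class ColumnOptions {
    final String name;
    final boolean nullable;
    final List<ColumnOptions> children = new ArrayList<>();

    ColumnOptions(String name, boolean nullable) {
        this.name = name;
        this.nullable = nullable;
    }

    ColumnOptions withChild(ColumnOptions child) {
        children.add(child);
        return this;
    }
}

// Parallel lists in depth-first pre-order, a shape that is simple to hand to native code.
final class Flattened {
    final List<String> names = new ArrayList<>();
    final List<Boolean> nullable = new ArrayList<>();
    final List<Integer> numChildren = new ArrayList<>();

    static Flattened flatten(ColumnOptions root) {
        Flattened out = new Flattened();
        out.visit(root);
        return out;
    }

    // Record the node first, then recurse into its children in order.
    private void visit(ColumnOptions node) {
        names.add(node.name);
        nullable.add(node.nullable);
        numChildren.add(node.children.size());
        for (ColumnOptions child : node.children) {
            visit(child);
        }
    }
}
```

Storing a child count per node is enough for the receiving side to rebuild the tree from the flat lists.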
I still don't know a good way to get rid of the old withColumnNames API on ParquetWriterOptions.
We'd either need to disconnect it from the old one or have a compatibility mode where it translates the old values to new values if they were set via the old API, and of course the old API won't support structs. Setting options via both APIs would be an error. I'm OK if we want to ditch the old one, but it would be a breaking change.
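A minimal sketch of the compatibility-mode option, assuming a builder that accepts either API but not both. `CompatWriterOptions`, `Column`, and `withColumn` are hypothetical names, and treating legacy columns as nullable is an assumed default, not something taken from the cudf code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a compatibility mode; these names are not the cudf API.
final class CompatWriterOptions {
    // New-style column entry (flat here for brevity; a real one could nest).
    static final class Column {
        final String name;
        final boolean nullable;

        Column(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    private final List<String> legacyNames = new ArrayList<>();
    private final List<Column> columns = new ArrayList<>();

    // Old API: flat names only, so it cannot describe structs.
    CompatWriterOptions withColumnNames(String... names) {
        if (!columns.isEmpty()) {
            throw new IllegalStateException("columns were already set via the new API");
        }
        for (String n : names) {
            legacyNames.add(n);
        }
        return this;
    }

    // New API (hypothetical signature).
    CompatWriterOptions withColumn(String name, boolean nullable) {
        if (!legacyNames.isEmpty()) {
            throw new IllegalStateException("columns were already set via withColumnNames");
        }
        columns.add(new Column(name, nullable));
        return this;
    }

    // Legacy names are translated to new-style entries; nullable=true is an assumed default.
    List<Column> build() {
        List<Column> out = new ArrayList<>(columns);
        for (String n : legacyNames) {
            out.add(new Column(n, true));
        }
        return out;
    }
}
```

Failing fast on mixed use keeps the translation one-directional, at the cost of a runtime check rather than a compile-time one.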
java/src/main/java/ai/rapids/cudf/ParquetColumnWriterOptions.java
A few things about the API.
The current C++ API lets anything go, i.e., setting precision for non-decimal types and adding children to non-nested columns. Perhaps we could have something like.
One of the best ways I have found to do API design is to use the API. Let's create a schema for a table that has a list of timestamps that should be output as int96.
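The snippet that originally followed this comment is not preserved here. As a separate illustration of the constraint being described, the sketch below uses distinct types so that precision exists only on decimal columns and children can only be added to nested columns. All class names are hypothetical and are not the cudf or libcudf API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are the cudf or libcudf API.
abstract class ColumnSpec {
    final String name;

    ColumnSpec(String name) {
        this.name = name;
    }
}

// Only decimal columns carry a precision.
final class DecimalSpec extends ColumnSpec {
    final int precision;

    DecimalSpec(String name, int precision) {
        super(name);
        if (precision <= 0) {
            throw new IllegalArgumentException("precision must be positive");
        }
        this.precision = precision;
    }
}

// Leaf column with no extra settings and no way to add children.
final class LeafSpec extends ColumnSpec {
    LeafSpec(String name) {
        super(name);
    }
}

// Only nested columns expose a method for adding children.
final class StructSpec extends ColumnSpec {
    final List<ColumnSpec> children = new ArrayList<>();

    StructSpec(String name) {
        super(name);
    }

    StructSpec withChild(ColumnSpec child) {
        children.add(child);
        return this;
    }
}
```

With this shape, calling `withChild` on a leaf or setting a precision on a struct does not compile, so the misuse is caught before any native call.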
Is that better than
? The second one is much more verbose, but I think it does a better job of matching up the
68d0d30 to 4907f2b
I'm not sure we should be rushing this. I know you have deadlines, but the current API is not clean, and I'm not that excited about pushing a change at the end of a release knowing that we are likely going to have to make breaking changes to it in the next release.
@devavret for cpp change
@revans2 do you have any more questions?
@revans2 I have addressed your concerns. PTAL
build
rerun tests
}

/**
 * Set column name
I think it would be better to say that it adds a child column with the given name, and that should apply to all of the with methods.
Still not done
/**
 * Set column name
 * @param name
Unless we describe what the name is, marking it as a param does nothing. This applies to all of the other Javadoc comments that have empty params.
return new ListBuilder(name, true);
}

public static ListBuilder listBuilder(String name, boolean isNullable) {
We need Javadocs here. They really need to describe how a ListBuilder is different from a StructBuilder and that the name of the child is ignored.
I still don't see it documented anywhere how a list builder works or that the name for its children is ignored.
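To make the distinction concrete, here is a small sketch under stated assumptions: a struct node keeps the names of any number of children, while a list node holds exactly one element whose name never reaches the output. `SchemaNode`, `LeafNode`, `StructNode`, `ListNode`, and `body` are hypothetical and are not the classes under review.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the list-versus-struct distinction; not the cudf classes.
abstract class SchemaNode {
    final String name;

    SchemaNode(String name) {
        this.name = name;
    }

    // Renders the node without its own name, so the parent decides whether the name matters.
    abstract String body();
}

final class LeafNode extends SchemaNode {
    final String type;

    LeafNode(String name, String type) {
        super(name);
        this.type = type;
    }

    String body() {
        return type;
    }
}

final class StructNode extends SchemaNode {
    private final List<SchemaNode> children = new ArrayList<>();

    StructNode(String name) {
        super(name);
    }

    // A struct may hold any number of children, and each child's name is significant.
    StructNode withChild(SchemaNode child) {
        children.add(child);
        return this;
    }

    String body() {
        StringBuilder sb = new StringBuilder("struct<");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            SchemaNode c = children.get(i);
            sb.append(c.name).append(":").append(c.body());
        }
        return sb.append(">").toString();
    }
}

final class ListNode extends SchemaNode {
    private SchemaNode element;

    ListNode(String name) {
        super(name);
    }

    // A list holds exactly one child.
    ListNode withElement(SchemaNode child) {
        if (element != null) {
            throw new IllegalStateException("a list has exactly one child");
        }
        element = child;
        return this;
    }

    // The element's name is never emitted; only its body is.
    String body() {
        if (element == null) {
            throw new IllegalStateException("list element was not set");
        }
        return "list<" + element.body() + ">";
    }
}
```

A Javadoc for the real list builder could state the same two points in prose: one child only, and the child's name is not used.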
I am just down to nits in the comments.
rerun tests
@gpucibot merge
This reverts commit 3327f7b.

We have to revert this because the dependent project is broken and my system is in a broken state.

Authors:
- Raza Jafri (https://github.com/razajafri)

Approvers:
- Rong Ou (https://github.com/rongou)
- Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
- Robert (Bobby) Evans (https://github.com/revans2)

URL: #7987
This is a very rough draft PR to tie down the interface change to support Structs for Parquet writer. Once we have the interface down, it's just a matter of coding in the rest of the pieces.
Here is how I envision it to be used by the end-user.
I still don't know a good way to get rid of the old withColumnNames API on ParquetWriterOptions. We can't remove it as ORC is still using it. One option could be to just rip out the WriterOptions from the ParquetWriterOptions hierarchy.