JNI: Pass names of children struct columns to native Arrow IPC writer [skip ci] #7598
Pass the names of children struct columns to the native Arrow IPC writer; they are required to build column_metadata. Also add the related unit tests. Signed-off-by: Firestarman <[email protected]>
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7598 +/- ##
===============================================
+ Coverage 81.86% 82.38% +0.52%
===============================================
Files 101 101
Lines 16884 17350 +466
===============================================
+ Hits 13822 14294 +472
+ Misses 3062 3056 -6
This is required by native. Signed-off-by: Firestarman <[email protected]>
rerun tests
Do this to avoid calling back into the JVM. Signed-off-by: Firestarman <[email protected]>
Refactoring this to the flattened-name approach isn't necessary, but the approach used here allows some resource leaks that the flattened approach avoids, and those should be fixed.
I updated the PR to align with the flattened-name approach. It is a good suggestion: it not only reduces the code change, but also hides some column metadata details (e.g. stub meta for list type) from Java.
rerun tests
Thanks for updating @firestarman, this did get a lot cleaner overall. The main thing I see missing now is that the behavior of column names for nested types isn't documented in the Java APIs anywhere. If we're going the flattened names route for all writers then this should be documented on `WriterBuilder`, but if the flattening logic is only going to apply to Arrow IPC then its builder should override the `withColumnNames` method if only to provide documentation on the expected behavior.
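To illustrate the kind of behavior such documentation would need to spell out, here is a minimal sketch of flattening nested column names. It assumes a depth-first, parent-before-children order, which is an assumption of this sketch and not confirmed by the thread. `Field` and `NameFlattener` are hypothetical stand-ins, not cudf classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a schema field; NOT a cudf class.
final class Field {
    final String name;
    final List<Field> children;

    Field(String name, Field... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical helper that flattens nested field names into one list.
final class NameFlattener {
    // Assumed order: each parent name first, then its children, depth-first.
    static List<String> flatten(List<Field> topLevel) {
        List<String> out = new ArrayList<>();
        for (Field f : topLevel) {
            collect(f, out);
        }
        return out;
    }

    private static void collect(Field f, List<String> out) {
        out.add(f.name);
        for (Field child : f.children) {
            collect(child, out);
        }
    }
}
```

Under that assumption, a table with an int column `id` and a struct column `person` holding `name` and a nested struct `address` (`city`, `zip`) would flatten to `id, person, name, address, city, zip`.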
cc: @revans2 for visibility
I think it is only for Arrow IPC now, so I updated its builder to override the two
@gpucibot merge
Thanks Jason, learnt a lot
This PR adds support for running scalar pandas UDFs with array types. It adds array type signatures for the related expressions and plans, and flattens the names of nested struct columns from the schema, which the cudf Arrow IPC writer also requires. This PR depends on rapidsai/cudf#7598. Closes #1912. Signed-off-by: Firestarman <[email protected]>
This PR adds support for building the column metadata structure from the flattened column names according to the table schema, since the children column metadata is required when converting cudf tables to Arrow tables.
Also updating the related unit tests.
closes #7570
Signed-off-by: Firestarman [email protected]
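The idea described above, rebuilding nested metadata from a flat name list by walking the table schema, can be sketched as follows. This is only an illustration in Java under stated assumptions: the PR does this work in native code, the names are assumed to arrive in depth-first order, and `Shape`, `NameMeta`, and `MetadataBuilder` are hypothetical types, not cudf APIs. List columns and their stub metadata are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical: captures only the nesting shape of a column (its children).
final class Shape {
    final List<Shape> children;

    Shape(Shape... children) {
        this.children = Arrays.asList(children);
    }
}

// Hypothetical: nested name metadata, one node per column.
final class NameMeta {
    final String name;
    final List<NameMeta> children = new ArrayList<>();

    NameMeta(String name) {
        this.name = name;
    }
}

// Hypothetical builder: consumes flat names while walking the schema shape.
final class MetadataBuilder {
    private final List<String> names;
    private int next = 0; // index of the next unconsumed flat name

    private MetadataBuilder(List<String> names) {
        this.names = names;
    }

    static List<NameMeta> build(List<Shape> columns, List<String> flatNames) {
        MetadataBuilder b = new MetadataBuilder(flatNames);
        List<NameMeta> out = new ArrayList<>();
        for (Shape column : columns) {
            out.add(b.visit(column));
        }
        if (b.next != flatNames.size()) {
            throw new IllegalArgumentException("too many column names for the schema");
        }
        return out;
    }

    // Takes one name for this column, then recurses into its children in order.
    private NameMeta visit(Shape shape) {
        if (next >= names.size()) {
            throw new IllegalArgumentException("too few column names for the schema");
        }
        NameMeta meta = new NameMeta(names.get(next++));
        for (Shape child : shape.children) {
            meta.children.add(visit(child));
        }
        return meta;
    }
}
```

Validating the name count against the schema in both directions keeps a mismatched name list from silently shifting names onto the wrong columns.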