-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Fail to convert the data to arrow format when there is a child column of struct type. #7570
Comments
It the below a reasonable fix (It works)? Since I see there is a simliar check here https://github.com/rapidsai/cudf/blob/branch-0.19/cpp/src/interop/to_arrow.cpp#L248 for list_view . diff --git a/cpp/src/interop/to_arrow.cpp b/cpp/src/interop/to_arrow.cpp
index 7daffc1a3c..bf269ebfe1 100644
--- a/cpp/src/interop/to_arrow.cpp
+++ b/cpp/src/interop/to_arrow.cpp
@@ -202,20 +202,22 @@ std::shared_ptr<arrow::Array> dispatch_to_arrow::operator()<cudf::struct_view>(
arrow::MemoryPool* ar_mr,
rmm::cuda_stream_view stream)
{
- CUDF_EXPECTS(metadata.children_meta.size() == static_cast<std::size_t>(input.num_children()),
+ auto children_meta =
+ metadata.children_meta.empty() ? std::vector<column_metadata>(input.num_children()) : metadata.children_meta;
+ CUDF_EXPECTS(children_meta.size() == static_cast<std::size_t>(input.num_children()),
"Number of field names and number of children doesn't match\n");
std::unique_ptr<column> tmp_column = nullptr;
if (input.offset() != 0) { tmp_column = std::make_unique<cudf::column>(input); }
column_view input_view = (tmp_column != nullptr) ? tmp_column->view() : input;
- auto child_arrays = fetch_child_array(input_view, metadata.children_meta, ar_mr, stream);
+ auto child_arrays = fetch_child_array(input_view, children_meta, ar_mr, stream);
auto mask = fetch_mask_buffer(input_view, ar_mr, stream);
std::vector<std::shared_ptr<arrow::Field>> fields;
std::transform(child_arrays.cbegin(),
child_arrays.cend(),
- metadata.children_meta.cbegin(),
+ children_meta.cbegin(),
std::back_inserter(fields),
[](auto const array, auto const meta) {
|
@rgsl888prabhu I believe you implemented this originally. Could you take a look? |
@firestarman When you try to read the struct column written with the patch you have provided, what are the names of children/struct members. Would it be empty ? |
@rgsl888prabhu Yes, i believe it is empty here. Are the names necessary for this operation ? Or could it be empty ? If names are required, we may need to fix this from JNI side. |
I add the support in JNI to carry the column names onto native by this change, however seems native @rgsl888prabhu Could you help confirm whether |
cudf/cpp/tests/interop/to_arrow_test.cpp Line 266 in 5fea6ad
For structs, it is better to have name for members which is required for accessing the element, else it would be same as list of list. What error are you getting currently with the changes you mentioned ? |
Thanks, i made the PR #7598 to close this issue. |
…#7598) This PR is to add the support of building the structure of column metadata from the flattened column names according to the table schema. Since the children column metadata is required when converting cudf tables to arrow tables. Also updating the related unit tests. closes #7570 Signed-off-by: Firestarman <[email protected]> Authors: - Liangcai Li (@firestarman) Approvers: - Jason Lowe (@jlowe) URL: #7598
Describe the bug
The cudf is complaining the error as below when writing a child column of struct type to an arrow table.
Steps/Code to reproduce bug
Apply this patch and run the test
testArrowIPCWriteToBufferChunked
.Expected behavior
The test applied the patch above should pass.
The text was updated successfully, but these errors were encountered: