Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix generate_nested_dictionary_case integration test failure #1636

Merged
merged 5 commits into from
May 8, 2022

Conversation

viirya
Copy link
Member

@viirya viirya commented May 1, 2022

Which issue does this PR close?

Closes #1635.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels May 1, 2022
@viirya viirya changed the title Fix ipc nested dict Fix generate_nested_dictionary_case integration test failure for Rust cases May 1, 2022
}
}
// Add (possibly multiple) array refs to the dictionaries array.
dictionaries_by_field.insert(id, dictionary_values.clone());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main difference here. In create_array, we use dictionaries[node_index] to access dictionary array by node_index. But dictionary arrays are indexed by dict id, not node index.

@@ -640,6 +645,7 @@ fn dictionary_array_from_json(
dict_key: &DataType,
dict_value: &DataType,
dictionary: &ArrowJsonDictionaryBatch,
dictionaries: Option<&HashMap<i64, ArrowJsonDictionaryBatch>>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing in the map of dictionaries so nested dictionary can be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the following error comes out when running archery --debug integration --run-flight --with-cpp=false --with-rust=true:

dictionary value type: List(Field { name: "str_dict", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })
Error: JsonError("Unable to find any dictionaries for field Field { name: \"str_dict\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }")

After fixing this, there is index error fixed by #1636 (review).

@codecov-commenter
Copy link

codecov-commenter commented May 1, 2022

Codecov Report

Merging #1636 (7622bc3) into master (22e9f95) will increase coverage by 0.09%.
The diff coverage is 60.00%.

@@            Coverage Diff             @@
##           master    #1636      +/-   ##
==========================================
+ Coverage   83.02%   83.11%   +0.09%     
==========================================
  Files         193      193              
  Lines       55612    55847     +235     
==========================================
+ Hits        46174    46420     +246     
+ Misses       9438     9427      -11     
Impacted Files Coverage Δ
arrow-flight/src/utils.rs 0.00% <0.00%> (ø)
...ng/src/flight_client_scenarios/integration_test.rs 0.00% <0.00%> (ø)
...ng/src/flight_server_scenarios/integration_test.rs 0.00% <0.00%> (ø)
integration-testing/src/lib.rs 0.00% <0.00%> (ø)
arrow/src/ipc/reader.rs 88.80% <88.23%> (-0.23%) ⬇️
arrow/src/array/array.rs 89.67% <0.00%> (-3.04%) ⬇️
arrow/src/ffi.rs 88.51% <0.00%> (ø)
arrow/src/array/mod.rs 100.00% <0.00%> (ø)
arrow/src/array/builder.rs 86.79% <0.00%> (ø)
arrow/src/array/iterator.rs 96.11% <0.00%> (ø)
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22e9f95...7622bc3. Read the comment docs.

@tustvold tustvold added the api-change Changes to the arrow API label May 6, 2022
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, I think we should update the docs and naming to match the change 👍

let value_array = dictionaries[node_index].clone().unwrap();

let value_array =
dictionaries.get(&field.dict_id().unwrap()).unwrap().clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could maybe return an error here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, updated.

@@ -49,7 +50,7 @@ pub fn flight_data_from_arrow_batch(
pub fn flight_data_to_arrow_batch(
data: &FlightData,
schema: SchemaRef,
dictionaries_by_field: &[Option<ArrayRef>],
dictionaries_by_field: &HashMap<i64, ArrayRef>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be renamed to dictionaries_by_id, here and in all the other places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I will update them.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @viirya

BTW thank you so much for working through the integration test failures ❤️ so nice!

data: &[u8],
buffers: &[ipc::Buffer],
dictionaries: &[Option<ArrayRef>],
dictionaries: &HashMap<i64, ArrayRef>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dictionaries: &HashMap<i64, ArrayRef>,
dictionaries_by_id: &HashMap<i64, ArrayRef>,

Maybe this would more consistent with the names used in the rest of this PR

@@ -457,7 +468,7 @@ pub fn read_record_batch(
buf: &[u8],
batch: ipc::RecordBatch,
schema: SchemaRef,
dictionaries: &[Option<ArrayRef>],
dictionaries: &HashMap<i64, ArrayRef>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dictionaries: &HashMap<i64, ArrayRef>,
dictionaries_by_id: &HashMap<i64, ArrayRef>,

@viirya
Copy link
Member Author

viirya commented May 6, 2022

Thank you @alamb. Renamed these names too.

@viirya viirya force-pushed the fix_ipc_nested_dict branch from 0dcf24d to 7622bc3 Compare May 6, 2022 19:13
@viirya viirya merged commit 6ad893c into apache:master May 8, 2022
@alamb alamb changed the title Fix generate_nested_dictionary_case integration test failure for Rust cases Fix generate_nested_dictionary_case integration test failure May 12, 2022
@alamb alamb changed the title Fix generate_nested_dictionary_case integration test failure Fix generate_nested_dictionary_case integration test failure May 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix generate_nested_dictionary_case integration test failure for Rust cases
4 participants