Fix generate_nested_dictionary_case integration test failure #1636
arrow/src/ipc/reader.rs
Outdated
}
}
// Add (possibly multiple) array refs to the dictionaries array.
dictionaries_by_field.insert(id, dictionary_values.clone());
Main difference here. In create_array, we use dictionaries[node_index] to access the dictionary array by node_index. But dictionary arrays are indexed by dict id, not node index.
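To make the distinction concrete, here is a minimal sketch of the two lookup schemes. It does not use the arrow crate: `ValuesRef` is a hypothetical stand-in for `ArrayRef`, and both function names are invented for illustration. The point is only that a positional slot lookup and a dict-id lookup can disagree whenever the node index of a field differs from its dict id.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for arrow's `ArrayRef`, used only for illustration.
type ValuesRef = Arc<Vec<&'static str>>;

/// Positional scheme: one optional slot per position, looked up by node index.
/// Returns `None` when the index is out of range or the slot is empty.
fn lookup_by_position(slots: &[Option<ValuesRef>], node_index: usize) -> Option<ValuesRef> {
    slots.get(node_index).cloned().flatten()
}

/// Keyed scheme: dictionaries stored under the dict id carried by the field.
fn lookup_by_id(by_id: &HashMap<i64, ValuesRef>, dict_id: i64) -> Option<ValuesRef> {
    by_id.get(&dict_id).cloned()
}
```

With the keyed scheme, a dictionary registered under id 7 is found regardless of where the field sits in the node order, which is the property the nested-dictionary case appears to rely on.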
@@ -640,6 +645,7 @@ fn dictionary_array_from_json(
     dict_key: &DataType,
     dict_value: &DataType,
     dictionary: &ArrowJsonDictionaryBatch,
+    dictionaries: Option<&HashMap<i64, ArrowJsonDictionaryBatch>>,
Passing in the map of dictionaries so that nested dictionaries can be used.
This is where the following error comes from when running archery --debug integration --run-flight --with-cpp=false --with-rust=true:
dictionary value type: List(Field { name: "str_dict", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })
Error: JsonError("Unable to find any dictionaries for field Field { name: \"str_dict\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }")
After fixing this, there is an index error, which is fixed by #1636 (review).
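The error above arises when a dictionary's value type itself contains a dictionary-encoded field (here a List whose item is Dictionary(Int8, Utf8)). The following is a rough sketch of why the whole id-keyed map has to be available during the walk. It assumes a toy type model: `ToyType`, `ToyField`, `check_dictionaries`, and `check_type` are all hypothetical and are not part of the arrow crate or of this PR.

```rust
use std::collections::HashMap;

// Toy model of a schema; hypothetical types, not arrow's `DataType`/`Field`.
#[derive(Debug)]
enum ToyType {
    Utf8,
    List(Box<ToyField>),
    Dictionary { dict_id: i64, values: Box<ToyType> },
}

#[derive(Debug)]
struct ToyField {
    name: String,
    data_type: ToyType,
}

/// Checks that every dictionary reachable from `field` has an entry in `by_id`.
fn check_dictionaries(
    field: &ToyField,
    by_id: &HashMap<i64, Vec<String>>,
) -> Result<(), String> {
    check_type(&field.name, &field.data_type, by_id)
}

fn check_type(
    name: &str,
    ty: &ToyType,
    by_id: &HashMap<i64, Vec<String>>,
) -> Result<(), String> {
    match ty {
        ToyType::Utf8 => Ok(()),
        // A list delegates to its child field, which may itself be a dictionary.
        ToyType::List(child) => check_type(&child.name, &child.data_type, by_id),
        ToyType::Dictionary { dict_id, values } => {
            if !by_id.contains_key(dict_id) {
                return Err(format!(
                    "Unable to find any dictionaries for field {}",
                    name
                ));
            }
            // The value type can contain further dictionaries, so keep walking
            // with the same map.
            check_type(name, values, by_id)
        }
    }
}
```

If only the outer dictionary were passed in, the recursive step for the inner str_dict field would have nothing to look up, which matches the shape of the reported failure.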
Codecov Report
@@            Coverage Diff             @@
##           master    #1636      +/-   ##
==========================================
+ Coverage   83.02%   83.11%   +0.09%
==========================================
  Files         193      193
  Lines       55612    55847     +235
==========================================
+ Hits        46174    46420     +246
+ Misses       9438     9427      -11
This makes sense to me, I think we should update the docs and naming to match the change 👍
arrow/src/ipc/reader.rs
Outdated
- let value_array = dictionaries[node_index].clone().unwrap();
+ let value_array =
+     dictionaries.get(&field.dict_id().unwrap()).unwrap().clone();
We could maybe return an error here?
Yea, updated.
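As an illustration of the suggestion above, a minimal sketch of replacing the two unwrap calls with an error return might look like this. The error enum and the function `dictionary_for_field` are hypothetical names chosen for the example; the actual change in the PR may use the crate's own error type and a different shape.

```rust
use std::collections::HashMap;
use std::fmt;

// Hypothetical error type for this sketch; not the crate's own error type.
#[derive(Debug, PartialEq)]
enum DictLookupError {
    MissingDictId { field: String },
    UnknownDictId { field: String, dict_id: i64 },
}

impl fmt::Display for DictLookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DictLookupError::MissingDictId { field } => {
                write!(f, "field {} has no dict id", field)
            }
            DictLookupError::UnknownDictId { field, dict_id } => {
                write!(f, "no dictionary with id {} for field {}", dict_id, field)
            }
        }
    }
}

impl std::error::Error for DictLookupError {}

/// Looks up a dictionary by id and reports a descriptive error instead of
/// panicking when the id is absent or unknown.
fn dictionary_for_field<'a, V>(
    dictionaries_by_id: &'a HashMap<i64, V>,
    field_name: &str,
    dict_id: Option<i64>,
) -> Result<&'a V, DictLookupError> {
    let id = dict_id.ok_or_else(|| DictLookupError::MissingDictId {
        field: field_name.to_string(),
    })?;
    dictionaries_by_id
        .get(&id)
        .ok_or_else(|| DictLookupError::UnknownDictId {
            field: field_name.to_string(),
            dict_id: id,
        })
}
```

Returning a `Result` lets a reader of malformed or incomplete IPC data surface the problem to the caller rather than aborting the process.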
arrow-flight/src/utils.rs
Outdated
@@ -49,7 +50,7 @@ pub fn flight_data_from_arrow_batch(
 pub fn flight_data_to_arrow_batch(
     data: &FlightData,
     schema: SchemaRef,
-    dictionaries_by_field: &[Option<ArrayRef>],
+    dictionaries_by_field: &HashMap<i64, ArrayRef>,
I think this should be renamed to dictionaries_by_id, here and in all the other places?
Sounds good. I will update them.
Looks good to me -- thank you @viirya
BTW thank you so much for working through the integration test failures ❤️ so nice!
arrow/src/ipc/reader.rs
Outdated
     data: &[u8],
     buffers: &[ipc::Buffer],
-    dictionaries: &[Option<ArrayRef>],
+    dictionaries: &HashMap<i64, ArrayRef>,
- dictionaries: &HashMap<i64, ArrayRef>,
+ dictionaries_by_id: &HashMap<i64, ArrayRef>,
Maybe this would be more consistent with the names used in the rest of this PR
arrow/src/ipc/reader.rs
Outdated
@@ -457,7 +468,7 @@ pub fn read_record_batch(
     buf: &[u8],
     batch: ipc::RecordBatch,
     schema: SchemaRef,
-    dictionaries: &[Option<ArrayRef>],
+    dictionaries: &HashMap<i64, ArrayRef>,
- dictionaries: &HashMap<i64, ArrayRef>,
+ dictionaries_by_id: &HashMap<i64, ArrayRef>,
Thank you @alamb. Renamed these too.
Which issue does this PR close?
Closes #1635.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?