Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Fixed empty reader panic for NDJSON type infer #974

Merged

Conversation

Roberto-XY
Copy link
Contributor

Hi there,
I am currently in the process of learning some Rust & what better way then to contribute ;)

  • Now returns an error on empty readers instead of panic in infer. In deserialize as well, to have consistent behavior.
  • json::coerce_data_type fails if there are no lines to be read from reader #911 also suggested handling that as DataType::Null. I decided against that, since an empty string is not valid JSON. But null is, which also panicked—fixed that. Or do you prefer a separate PR?
  • added some explicit tests for null to coerce_data_type

Fixes #911

assert_eq!(
coerce_data_type(&[DataType::Null, DataType::Boolean]),
DataType::Utf8
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended? Should it not be just DataType::Boolean?

Similarly, {"a": null}" is currently infered as DataType::Struct(vec![])and not DataType::Struct(vec![Field::new("a", DataType::Null, true)])

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation. I think it is intended - infer_array does not pass DataType::Null to coerce_data_type since arrow arrays are nullable and thus we handle Null as part before passing to coerce_data_type. The idea is here.

With that said, imo it is not the cleanest design and maybe we should clean it up?

@jorgecarleitao jorgecarleitao added the bug Something isn't working label May 1, 2022
@codecov
Copy link

codecov bot commented May 1, 2022

Codecov Report

Merging #974 (bf3dc2a) into main (fa34796) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #974      +/-   ##
==========================================
+ Coverage   71.50%   71.52%   +0.02%     
==========================================
  Files         356      356              
  Lines       19671    19679       +8     
==========================================
+ Hits        14065    14076      +11     
+ Misses       5606     5603       -3     
Impacted Files Coverage Δ
src/io/json/read/infer_schema.rs 91.46% <100.00%> (+0.21%) ⬆️
src/io/ndjson/read/deserialize.rs 100.00% <100.00%> (ø)
src/io/ndjson/read/file.rs 80.00% <100.00%> (+1.42%) ⬆️
src/io/json/read/deserialize.rs 67.54% <0.00%> (+0.66%) ⬆️
src/array/equal/mod.rs 88.54% <0.00%> (+2.08%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa34796...bf3dc2a. Read the comment docs.

Copy link
Owner

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wonderful. Thanks a lot, very clean and super easy to follow.

Feel welcome to use this project to learn Rust - e.g. via incomplete PRs, PR reviews and/or challenging the design or code. :)

assert_eq!(
coerce_data_type(&[DataType::Null, DataType::Boolean]),
DataType::Utf8
);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation. I think it is intended - infer_array does not pass DataType::Null to coerce_data_type since arrow arrays are nullable and thus we handle Null as part before passing to coerce_data_type. The idea is here.

With that said, imo it is not the cleanest design and maybe we should clean it up?

@jorgecarleitao jorgecarleitao merged commit 3c64e7a into jorgecarleitao:main May 1, 2022
@Roberto-XY Roberto-XY deleted the 911_fix_coerce_data_type branch May 2, 2022 07:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

json::coerce_data_type fails if there are no lines to be read from reader
2 participants