Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix decimal repr in parquet schema printer #721

Merged
merged 1 commit into from
Aug 31, 2021

Conversation

sergiimk
Copy link
Contributor

Which issue does this PR close?

Closes #713.

Rationale for this change

Formatting of DECIMAL in schemas where only ConvertedType is present and not LogicalType was broken.
This is the case for parquet files produced by Spark.

What changes are included in this PR?

I also had to update decimal_length_from_precision function as the math there seemed off.
When used as .with_length(decimal_length_from_precision(19)) it resulted in a panic:

Cannot represent FIXED_LEN_BYTE_ARRAY as DECIMAL with length 8 and precision 19. The max precision can only be 18

Are there any user-facing changes?

print_schema will no longer produce malformed results for decimals.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 26, 2021
@codecov-commenter
Copy link

Codecov Report

Merging #721 (4ab1a14) into master (5a12d97) will increase coverage by 0.01%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #721      +/-   ##
==========================================
+ Coverage   82.50%   82.51%   +0.01%     
==========================================
  Files         168      168              
  Lines       47589    47597       +8     
==========================================
+ Hits        39261    39273      +12     
+ Misses       8328     8324       -4     
Impacted Files Coverage Δ
parquet/src/schema/printer.rs 74.65% <83.33%> (+1.95%) ⬆️
parquet/src/encodings/encoding.rs 94.48% <0.00%> (-0.20%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a12d97...4ab1a14. Read the comment docs.

Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, this is where named argument passing would be really helpful >_<

@alamb alamb requested a review from sunchao August 28, 2021 11:20
@alamb alamb changed the title Fix decimal repr in schema Fix decimal repr in parquet schema printer Aug 28, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @sergiimk ! I can't say I am an expert in this area of the parquet but the test looks good so given CI passes, looks good to me

@sunchao FYI

.with_repetition(Repetition::OPTIONAL)
.build()
.unwrap(),
"OPTIONAL FIXED_LEN_BYTE_ARRAY (9) decimal (DECIMAL(19,4));",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case anyone is interested, I double checked that test fails without the code change in this PR (unsurprisingly):


---- schema::printer::tests::test_print_flba_logical_types stdout ----
thread 'schema::printer::tests::test_print_flba_logical_types' panicked at 'called `Result::unwrap()` on an `Err` value: General("Cannot represent FIXED_LEN_BYTE_ARRAY as DECIMAL with length 8 and precision 19. The max precision can only be 18")', parquet/src/schema/printer.rs:644:18

@alamb alamb merged commit 1a03358 into apache:master Aug 31, 2021
alamb pushed a commit that referenced this pull request Sep 9, 2021
alamb added a commit that referenced this pull request Sep 9, 2021
Fixes #713

Co-authored-by: Sergii Mikhtoniuk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Decimal logical type is formatted incorrectly by print_schema
4 participants