Parquet: schema validation should allow scale == precision for decimal type #1607
Hmm, I think decimal(5, 5) is valid, but it seems a bit odd that the spec is incorrect while the implementations are correct. Should we update the Parquet spec too?
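To make the point about decimal(5, 5) concrete: Parquet stores a decimal as an unscaled integer, and the logical value is unscaled * 10^(-scale). Precision is the total number of digits and scale is the number of digits after the decimal point, so when scale == precision there are simply no integer digits, and decimal(5, 5) covers -0.99999 through 0.99999. The following is a minimal sketch; `format_decimal` and `max_unscaled` are hypothetical helpers (not part of the parquet crate), and it assumes scale <= 18 so that 10^scale fits in a u64.

```rust
/// Hypothetical helper: renders an unscaled integer as a decimal string,
/// i.e. unscaled * 10^(-scale). Assumes scale <= 18.
fn format_decimal(unscaled: i64, scale: u32) -> String {
    let divisor = 10_u64.pow(scale);
    let sign = if unscaled < 0 { "-" } else { "" };
    // unsigned_abs avoids overflow for i64::MIN.
    let abs = unscaled.unsigned_abs();
    let int_part = abs / divisor;
    let frac_part = abs % divisor;
    if scale == 0 {
        format!("{}{}", sign, int_part)
    } else {
        // Zero-pad the fractional part to exactly `scale` digits.
        format!("{}{}.{:0width$}", sign, int_part, frac_part, width = scale as usize)
    }
}

/// Hypothetical helper: largest unscaled magnitude with `precision` digits.
fn max_unscaled(precision: u32) -> i64 {
    10_i64.pow(precision) - 1
}
```

For example, `format_decimal(max_unscaled(5), 5)` yields "0.99999" and `format_decimal(-12345, 5)` yields "-0.12345": every digit sits after the decimal point, which is a perfectly well-defined value range.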
Based on what I can find, this should be correct. I'm not sure why the Parquet spec requires scale to be less than precision, but it looks like the known implementations don't follow that rule.
- The majority of implementations allow for scale == precision. See apache/arrow-rs#1607 for further motivation.
- Add comments to DecimalType in parquet.thrift
Which issue does this PR close?
Closes #1606.
Rationale for this change
For the decimal type, it is valid for scale to equal precision. However, PrimitiveTypeBuilder currently returns an error in that case. The Parquet logical type specification for the decimal type also appears to be incorrect on this point, since it requires scale to be less than precision.
Both the Java and C++ implementations allow scale to equal precision.
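The rule being argued for can be sketched as a stand-alone check. This is an illustration under assumptions, not the parquet crate's actual validation code: `validate_decimal` is a hypothetical function, and it only covers the precision/scale relationship (it ignores the per-physical-type precision limits such as INT32 and INT64).

```rust
/// Hypothetical stand-alone decimal check (not the parquet crate's code).
/// Accepts any precision >= 1 and 0 <= scale <= precision.
fn validate_decimal(precision: i32, scale: i32) -> Result<(), String> {
    if precision < 1 {
        return Err(format!("Invalid DECIMAL precision: {}", precision));
    }
    if scale < 0 {
        return Err(format!("Invalid DECIMAL scale: {}", scale));
    }
    // Allow scale == precision; a strict `scale < precision` rule
    // would wrongly reject types such as Decimal(5, 5).
    if scale > precision {
        return Err(format!(
            "Invalid DECIMAL: scale ({}) cannot be greater than precision ({})",
            scale, precision
        ));
    }
    Ok(())
}
```

With this rule, `validate_decimal(5, 5)` succeeds while `validate_decimal(5, 6)` still fails, which matches the behavior attributed to the Java and C++ implementations.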
What changes are included in this PR?
This PR fixes the schema validation to allow scale to equal precision, and adds a test for this case.
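One way to see how narrow the fix is: compare a strict rule (scale < precision) with the relaxed rule (scale <= precision) over a small grid and collect the (precision, scale) pairs whose outcome differs. This is a minimal sketch; `accepted_before`, `accepted_after`, and `changed_cases` are hypothetical names, and neither predicate is copied from the actual parquet crate source.

```rust
/// Hypothetical strict rule: scale must be less than precision.
fn accepted_before(precision: i32, scale: i32) -> bool {
    precision >= 1 && scale >= 0 && scale < precision
}

/// Hypothetical relaxed rule: scale may equal precision.
fn accepted_after(precision: i32, scale: i32) -> bool {
    precision >= 1 && scale >= 0 && scale <= precision
}

/// Returns every (precision, scale) pair with precision in 1..=max
/// and scale in 0..=max whose acceptance differs between the two rules.
fn changed_cases(max: i32) -> Vec<(i32, i32)> {
    let mut out = Vec::new();
    for p in 1..=max {
        for s in 0..=max {
            if accepted_before(p, s) != accepted_after(p, s) {
                out.push((p, s));
            }
        }
    }
    out
}
```

For `changed_cases(3)` the result is `[(1, 1), (2, 2), (3, 3)]`: only the scale == precision boundary changes, so every previously accepted schema is still accepted.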
Are there any user-facing changes?
Yes: decimal types whose scale equals their precision, such as Decimal(5, 5), are now accepted.