Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IntLogicalTypeAnnotation in parquet writer #18301

Merged
merged 1 commit into from
Jul 17, 2023

Conversation

raunaqmorarka
Copy link
Member

@raunaqmorarka raunaqmorarka commented Jul 14, 2023

Description

Allows reader implementations to potentially produce smaller in-memory representations
when reading data or skip bound checks from reading INT32 as smaller types.
Also useful for schema discovery tools to infer byte/short types from parquet INT32 types with annotation.

Additional context and related issues

https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#signed-integers

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Allows reader implementations to potentially produce
smaller in-memory representations when reading data
or skip bound checks from reading INT32 as smaller types
if (INTEGER.equals(type) || SMALLINT.equals(type) || TINYINT.equals(type)) {
return Types.primitive(PrimitiveType.PrimitiveTypeName.INT32, repetition).named(name);
// https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#signed-integers
// INT(32, true) and INT(64, true) are implied by the int32 and int64 primitive types if no other annotation is present.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to test is against databricks or hive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestHiveCompatibility and TestHiveSparkCompatibility test hive/spark reading files written by trino.
Worst case other readers should ignore this annotation and behave as before.

@raunaqmorarka raunaqmorarka merged commit 30e116a into trinodb:master Jul 17, 2023
@raunaqmorarka raunaqmorarka deleted the pqw-int-logical branch July 17, 2023 11:24
@github-actions github-actions bot added this to the 423 milestone Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants