-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for stats on timestamp type in Delta Lake #22159
Conversation
.../src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeParquetStatisticsUtils.java
Show resolved
Hide resolved
.../src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeParquetStatisticsUtils.java
Show resolved
Hide resolved
...-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/TransactionLogParser.java
Outdated
Show resolved
Hide resolved
91d5e77
to
65e1c3a
Compare
@findepi Addressed comments. |
.../src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeParquetStatisticsUtils.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeParquetStatisticsUtils.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeParquetStatisticsUtils.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeParquetStatisticsUtils.java
Outdated
Show resolved
Hide resolved
.../src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeParquetStatisticsUtils.java
Outdated
Show resolved
Hide resolved
long epochSeconds = floorDiv(epochMicros, MICROSECONDS_PER_SECOND); | ||
int nanoAdjustment = floorMod(epochMicros, MICROSECONDS_PER_SECOND) * NANOSECONDS_PER_MICROSECOND; | ||
Instant ts = Instant.ofEpochSecond(epochSeconds, nanoAdjustment); | ||
Instant truncatedToMillis = ts.truncatedTo(MILLIS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- add a check that
type
has precision at least millis. - add a comment why truncating to millis even for timestamp micros
- however, it wouldn't be hard to pick the truncation unit based on timestamp precision, so maybe let's do that instead of the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting a conversion from TimestampType.TIMESTAMP_MILLIS
to ChronoUnit.MILLIS
? Can you share the code snippet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me add a code comment instead. I didn't come up with a simple code for such a conversion.
Description
Example of stats generated by Spark:
2020-08-26T01:02:03.123
2024-05-29T15:08:00.991Z
2020-08-26T01:02:03.123987
with INT64Both int96 and int64 are allowed in parquet files: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#delta-data-type-to-parquet-type-mappings
Fixes #21878
Release notes
(x) Release notes are required, with the following suggested text: