-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add statistics collection for added column #16109
Conversation
@@ -2215,9 +2212,7 @@ private TableStatisticsMetadata getStatisticsCollectionMetadata( | |||
.filter(columnMetadata -> analyzeColumnNames.contains(columnMetadata.getName())) | |||
.forEach(columnMetadata -> { | |||
if (!(columnMetadata.getType() instanceof FixedWidthType)) { | |||
if (existingStatistics.isEmpty() || totalSizeStatisticsExists(existingStatistics.get().getColumnStatistics(), columnMetadata.getName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for previous statistics of total size is probably redundant. If column was not selected in ANALYZE WITH query it should be already filtered by .filter(columnMetadata -> analyzeColumnNames.contains(columnMetadata.getName()))
I could not find any other case when those statistics are not present except for case when new column is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, Alex. let's assume this is fine.
What do you think about instead modifying the |
That's an option. But if there is no scenario when totalSizeStatisticsExists is useful removing it will leave code less complex. |
It will also start showing 0 instead of null in case of using ANALYZE x WITH (columns = ...). I'm not sure it's a big issue but it can be misleading. |
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAnalyze.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAnalyze.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexjo2144 PTAL
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAnalyze.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAnalyze.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAnalyze.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAnalyze.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeAnalyze.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Yuya and Piotr are okay with this: #16109 (comment) then this looks good to me.
Rebased to include fix for skipped tests in CI |
Description
Currently when column is added when statistics are present total size is not collected for newly added column.
NDV statistics works fine.
Currently:
After change:
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: