-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read spark generated statistics in hive connector #16120
Conversation
c272e12
to
2e8cd62
Compare
2e8cd62
to
6d60eed
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java
Outdated
Show resolved
Hide resolved
68fa9f2
to
a3a91f6
Compare
...rino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveSparkCompatibility.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreUtil.java
Outdated
Show resolved
Hide resolved
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreConfig.java
Outdated
Show resolved
Hide resolved
Are there potential downsides in adding this fallback functionality? cc: @raunaqmorarka |
60cf4ae
to
8fb947d
Compare
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreConfig.java
Outdated
Show resolved
Hide resolved
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreConfig.java
Outdated
Show resolved
Hide resolved
...trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftSparkMetastoreUtil.java
Outdated
Show resolved
Hide resolved
...trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftSparkMetastoreUtil.java
Outdated
Show resolved
Hide resolved
...rino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveSparkCompatibility.java
Outdated
Show resolved
Hide resolved
...rino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveSparkCompatibility.java
Show resolved
Hide resolved
...rino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveSparkCompatibility.java
Show resolved
Hide resolved
...o-hive/src/test/java/io/trino/plugin/hive/metastore/thrift/TestThriftSparkMetastoreUtil.java
Outdated
Show resolved
Hide resolved
...trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftSparkMetastoreUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreUtil.java
Outdated
Show resolved
Hide resolved
We don't do any extra metadata calls and parsing logic is very simple. The only downside is that the Spark SQL generated stats might be wrong or incomplete, in that case the new flag can be used to disable the fallback or the usual trino/hive statistics can be generated to avoid the fallback. |
Build is red
|
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreConfig.java
Outdated
Show resolved
Hide resolved
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreConfig.java
Outdated
Show resolved
Hide resolved
8fb947d
to
1ec5848
Compare
...trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftSparkMetastoreUtil.java
Outdated
Show resolved
Hide resolved
...trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftSparkMetastoreUtil.java
Outdated
Show resolved
Hide resolved
...trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftSparkMetastoreUtil.java
Outdated
Show resolved
Hide resolved
be70e8a
to
83d0c6d
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreUtil.java
Outdated
Show resolved
Hide resolved
...trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftSparkMetastoreUtil.java
Show resolved
Hide resolved
83d0c6d
to
715c812
Compare
")", tableName1)); | ||
|
||
onSpark().executeQuery(format("INSERT INTO %s VALUES " + | ||
"(120, 32760, 2147483640, 9223372036854775800, 123.340, 234.560, CAST(343.0 AS DECIMAL(10, 0)), CAST(345.670 AS DECIMAL(10, 5)), TIMESTAMP '2015-05-10 12:15:30', DATE '2015-05-08', 'p1 varchar', CAST('varchar10' AS VARCHAR(10)), CAST('p1 char10' AS CHAR(10)), false, CAST('p1 binary' as BINARY))," + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be beneficial to add in this test another row with non-null values.
As can be seen from here io.trino.plugin.hive.metastore.thrift.ThriftSparkMetastoreUtil#fromMetastoreDistinctValuesCount(long, long, long)
the ndv count is set to 1
so a future maintainer of the code may be a bit puzzled to see that the NDV doesn't correspond to the number of actual distinct values.
db6b0d8
to
0c6957b
Compare
79261aa
to
1d4b117
Compare
1d4b117
to
a3af9ee
Compare
...src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreParameterParserUtils.java
Outdated
Show resolved
Hide resolved
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreUtil.java
Outdated
Show resolved
Hide resolved
a3af9ee
to
f93d4a1
Compare
f93d4a1
to
b8556c8
Compare
add8439
to
4d36e4c
Compare
4d36e4c
to
bc66d80
Compare
bc66d80
to
10b9873
Compare
Description
Use spark generated statistics as a fallback to hive statistics. This will help in environments where table statistics are already generated by Apache Spark and users either don't want to re-run statistics generation through ANALYZE in Trino or don't want to give HMS write permissions to Trino.
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: