-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make HiveQueryRunner#main more user friendly #11151
Conversation
@@ -358,6 +400,9 @@ public static void main(String[] args) | |||
.setHiveProperties(ImmutableMap.of()) | |||
.setInitialTables(TpchTable.getTables()) | |||
.setBaseDataDir(baseDataDir) | |||
.setSecurity(ALLOW_ALL) | |||
.setTpchColumnNaming(STANDARD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we converge everywhere to standard, this is useful.
until then, it's a no go to be default in query runner's main.
Query runner's main must allow running test queries from the tests themselves, and tests currently rely on the simplified names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. Yeah, let's keep the defaults. What do you think about commenting out these two lines so the standard naming can be easily enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with commented out lines.
queryRunner.createCatalog("tpch", "tpch", tpchCatalogProperties); | ||
|
||
queryRunner.installPlugin(new TpcdsPlugin()); | ||
queryRunner.createCatalog("tpcds", "tpcds"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we think it's generally useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HiveQueryRunner#main
is a convenient way to start pseudo distributed cluster for testing. Having tpcds
connector deploy allows to run tpcds
queries and play with the tpcds
dataset. Is the concern here that tpcds
will be using unnecessary resources when running integration tests? If that's the case and if we think that the resources used are significant I can make this conditional and only enable it in HiveQueryRunner#main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am more concerned about startup time. but probably it doesn't cost much.
(i almost never use tpcds data set, so i am surprised this is useful outside of benchmarks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are planning to use tpcds dataset for project Tardigrade testing
0591a49
to
8671cd0
Compare
Updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except "Support decimal columns in Tpch statistics provider"
@@ -356,6 +357,9 @@ private static ColumnStatistics toColumnStatistics(ColumnStatisticsData stats, T | |||
if (min.isEmpty() || max.isEmpty()) { | |||
return Optional.empty(); | |||
} | |||
if (columnType instanceof DecimalType) { | |||
return Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a weird change. The commit says "Support decimal columns in Tpch statistics provider", but it looks like removing support for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this line it fails on DecimalType
being not supported. Perhaps a better solution would be to try to return values, but that requires a bigger refactor. I need this change just to unblock running tpch queries with decimal columns enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a better solution would be to try to return values
agreed
but that requires a bigger refactor.
i hoped not. The actual ranges should be the same, and io.trino.spi.statistics.ColumnStatistics
is already "normalized" into doubles, so we should be able use the min/max values we have both for doubles and decimals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you are probably right, let me take a closer look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually yeah, the stats for decimal columns are available and it was super easy to propagate them. Thanks for the suggestion. Updated.
It is useful to have tpcds connector enabled when trying to debug tpcds queries locally
To avoid dealing with roles and permissions when debugging
To allow running unmodified tpch queries when debugging
8671cd0
to
69a6841
Compare
Description
Change
HiveQueryRunner#main
configurationtpcds
connector by defaulttpch
schema to allow running unmodifiedtpch
queriesImprovement
Tests
Non end user visible change
Related issues, pull requests, and links
-
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: