Fix the explain (TYPE IO) Exception when table is hive partition tabl… #12349
Please follow the guidelines for Git commits: make the commit message header much shorter, wrap the message at 70-80 characters, and concentrate on making it more descriptive for the other contributors working on your changes.
@@ -2926,9 +2926,12 @@ public ConnectorTableHandle makeCompatiblePartitioning(ConnectorSession session,
 @VisibleForTesting
 static TupleDomain<ColumnHandle> createPredicate(List<ColumnHandle> partitionColumns, List<HivePartition> partitions)
 {
-    if (partitions.isEmpty()) {
+    if (partitionColumns.isEmpty() && partitions.isEmpty()) {
Suggested change:
-    if (partitionColumns.isEmpty() && partitions.isEmpty()) {
+    if (partitions.isEmpty()) {
+        return partitionColumns.isEmpty() ? TupleDomain.none() : TupleDomain.all();
+    }
@albericgenius The partitions.isEmpty() check is common to both if statements. Please consider applying the suggested change.
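A minimal sketch of the suggested simplification, assuming heavily simplified types: TupleDomain is replaced by a hypothetical two-value enum (Domain.ALL / Domain.NONE), and partitions and partition columns are plain strings. The real createPredicate in HiveMetadata builds per-column domains from HivePartition values; this sketch only shows how the single partitions.isEmpty() check absorbs both branches.

```java
import java.util.List;

final class CreatePredicateSketch
{
    // Hypothetical stand-in for TupleDomain<ColumnHandle>; only ALL and NONE matter here.
    enum Domain { ALL, NONE }

    private CreatePredicateSketch() {}

    // Mirrors the suggested change: one partitions.isEmpty() check, with the
    // partition-column check folded into the returned value.
    static Domain createPredicate(List<String> partitionColumns, List<String> partitions)
    {
        if (partitions.isEmpty()) {
            // No partition columns and no partitions -> NONE.
            // Partition columns but no partitions (an empty partitioned table) -> ALL in this revision.
            return partitionColumns.isEmpty() ? Domain.NONE : Domain.ALL;
        }
        // The real method derives per-column domains from the partition values;
        // this sketch simply reports "unconstrained" for any non-empty partition list.
        return Domain.ALL;
    }
}
```

Whether ALL is the right answer for an empty partitioned table is exactly what the rest of the review discusses; the sketch only captures the shape of this revision.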
Thanks, updated. I can see the improvement from your comments.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
public void testCreatePredicateWithEmptyPartition()
{
    ImmutableList.Builder<HivePartition> partitions = ImmutableList.builder();
    Domain domain = createPredicate(ImmutableList.of(TEST_COLUMN_HANDLE), partitions.build())
Use ImmutableList.of() instead of defining the extra variable partitions, to keep the code smaller.
fixed
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveMetadata.java
Please add to the description:
Please be more brief about implementation details in the commit message.
Thanks for the notes, I added it :)
@findepi and @findinpath
@albericgenius Generally, it may take a while (a few hours or a few days) until a maintainer merges the commit. Good job! @bitsondatadev we should probably document the PR process in https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md to avoid confusion.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
@@ -2925,7 +2925,7 @@ public ConnectorTableHandle makeCompatiblePartitioning(ConnectorSession session,
 static TupleDomain<ColumnHandle> createPredicate(List<ColumnHandle> partitionColumns, List<HivePartition> partitions)
 {
     if (partitions.isEmpty()) {
-        return TupleDomain.none();
+        return partitionColumns.isEmpty() ? TupleDomain.none() : TupleDomain.all();
if (partitionColumns.isEmpty()) {
    // not a partitioned table
    checkArgument(partitions.size() == 1 && UNPARTITIONED_ID.equals(getOnlyElement(partitions).getPartitionId()), "Unexpected partitions for a non-partitioned table: %s", partitions);
    return TupleDomain.all();
}
But then what remains would be (as it used to be):
if (partitions.isEmpty()) {
    return TupleDomain.none();
}
and you return the opposite value. I don't understand why.
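The reviewer's alternative is easier to compare in code. Below is a minimal sketch, assuming the same simplified types as before: a hypothetical Domain enum instead of TupleDomain, partition IDs as strings, and a hypothetical UNPARTITIONED_ID constant standing in for HivePartition.UNPARTITIONED_ID. Guava's checkArgument is replaced by an explicit IllegalArgumentException so the example has no dependencies.

```java
import java.util.List;

final class ReviewerPredicateSketch
{
    // Hypothetical stand-in for TupleDomain<ColumnHandle>.
    enum Domain { ALL, NONE }

    // Hypothetical stand-in for the ID of the single synthetic partition of a non-partitioned table.
    static final String UNPARTITIONED_ID = "<UNPARTITIONED>";

    private ReviewerPredicateSketch() {}

    static Domain createPredicate(List<String> partitionColumns, List<String> partitionIds)
    {
        if (partitionColumns.isEmpty()) {
            // Not a partitioned table: expect exactly one "unpartitioned" partition.
            if (partitionIds.size() != 1 || !UNPARTITIONED_ID.equals(partitionIds.get(0))) {
                throw new IllegalArgumentException("Unexpected partitions for a non-partitioned table: " + partitionIds);
            }
            return Domain.ALL;
        }
        if (partitionIds.isEmpty()) {
            // Partitioned table with no partitions: no rows can match (the original behavior).
            return Domain.NONE;
        }
        // Simplification: the real method derives per-column domains here.
        return Domain.ALL;
    }
}
```

With this structure an empty partitioned table yields NONE, the opposite of the revision under review, which is the discrepancy the comment points out.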
- Originally, we returned TupleDomain.none() when partitions.isEmpty().
- But when a partitioned table has no data, IoPlanPrinter.parseConstraints throws IllegalArgumentException, because the constraint is none.
- For a partitioned table with no data, partitions is empty and partitionColumns is not empty.
- I could be wrong; please point it out and I will update ASAP.
Thanks for the explanation.
"but in the case of no data of the PartitionedTable, we will throw IllegalArgumentException in IoPlanPrinter.parseConstraints, because the constraint is none."
Does that mean we should fix that method instead?
Force-pushed from e5d2a9f to e0a5f1b.
        Optional.empty(),
        estimate));

assertUpdate("DROP TABLE test_io_explain");
Rename test_io_explain to test_io_explain_with_empty_partitioned_table: using the same table name as the other test will make the tests fail when they run concurrently.
- Updated the table name.
- IoPlanPrinter.parseConstraints checks whether the constraint is none. That logic is correct.
- I think that if a partitioned table does not have any data, we should return TupleDomain.all() as the constraint. The only case where partitions is empty and partitionColumns is not is when there is no data.
- Only when both partitionColumns and partitions are empty should we return TupleDomain.none().
- What do you think?
"I think if a partitioned table do not have any data, we should return TupleDomain.all() as constraint."
TupleDomain.none() is also correct ("no rows match the filter"), and may be more useful ("more correct"), as it can allow pruning other parts of the query.
- This implementation does not change the "no rows match the filter" outcome: because there is no data (partitions is empty and partitionColumns is not), even if we return TupleDomain.all(), no data matches the filter, so the result is the same.
- I agree with you that it is better to fix this inside IoPlanPrinter.parseConstraints, but I still do not know how to get the partition information during planning. I will continue to think about it tomorrow. If you have time, please give me some suggestions.
Thanks for your time,
Alberic
"even we return TupleDomain.all(), there is no data match the filter, the result is same."
The predicate returned by io.trino.spi.connector.ConnectorMetadata#getTableProperties (io.trino.spi.connector.ConnectorTableProperties#predicate) is used to inform the planner and allows deriving filters for other tables. For example, the query SELECT * FROM some_table JOIN empty_partitioned_table ON ... should be reduced to SELECT ... WHERE false, because the planner realizes that empty_partitioned_table has no rows (TupleDomain.none()).
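To illustrate why a NONE predicate is more useful to the planner than ALL, here is a toy sketch. It is not Trino's optimizer: the node types (Scan, EmptyValues, InnerJoin) and the simplify method are hypothetical, and predicateIsNone stands in for getTableProperties(...).getPredicate().isNone(). A scan whose predicate is NONE becomes an empty values node, and an inner join with an empty input becomes empty as well.

```java
final class PruneEmptyScanSketch
{
    // Hypothetical, heavily simplified plan nodes; not Trino's PlanNode classes.
    interface Node {}

    // predicateIsNone plays the role of the table properties predicate being NONE.
    record Scan(String table, boolean predicateIsNone) implements Node {}

    // Stands in for a values node that produces no rows.
    record EmptyValues() implements Node {}

    record InnerJoin(Node left, Node right) implements Node {}

    private PruneEmptyScanSketch() {}

    static Node simplify(Node node)
    {
        if (node instanceof Scan scan) {
            // A scan whose predicate is NONE cannot produce rows.
            return scan.predicateIsNone() ? new EmptyValues() : scan;
        }
        if (node instanceof InnerJoin join) {
            Node left = simplify(join.left());
            Node right = simplify(join.right());
            // An inner join with an empty input is itself empty.
            if (left instanceof EmptyValues || right instanceof EmptyValues) {
                return new EmptyValues();
            }
            return new InnerJoin(left, right);
        }
        return node;
    }
}
```

If the empty table reported ALL instead, the scan would stay in the plan and the join could not be pruned, even though both choices are semantically correct.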
Force-pushed from 035afed to 9a4b2d1.
@@ -717,7 +716,9 @@ private EstimatedStatsAndCost getEstimatedStatsAndCost(TableScanNode node)

 private Set<ColumnConstraint> parseConstraints(TableHandle tableHandle, TupleDomain<ColumnHandle> constraint)
 {
-    checkArgument(!constraint.isNone());
+    if (constraint.isNone()) {
+        return ImmutableSet.of();
That's exactly what we would return if the predicate is all, so it's probably not the right return value for the none case.
IoPlan.TableColumnInfo cannot currently represent a table with a NONE constraint.
In general, such a table should be eliminated from the query plan. This happens, e.g., here:
Lines 255 to 258 in 14b901d
TableProperties newTableProperties = plannerContext.getMetadata().getTableProperties(session, newTable);
Optional<TablePartitioning> newTablePartitioning = newTableProperties.getTablePartitioning();
if (newTableProperties.getPredicate().isNone()) {
    return Optional.of(new ValuesNode(node.getId(), node.getOutputSymbols(), ImmutableList.of()));
We could fix the optimizer so that this also happens in the SELECT * FROM empty_partitioned_table case, and we probably should. However, that wouldn't remove the need to keep EXPLAIN (TYPE IO) from failing in such a case, since it is a possible situation in general. That's also why we handle it at the page source level instead of failing there:
return new EmptyPageSource();
So, back to our problem. We need IoPlan.TableColumnInfo to be able to represent a table scan without data, i.e. with a none constraint.
I would suggest replacing the Set<ColumnConstraint> columnConstraints field with:
class Constraint {
    boolean isNone;
    Set<ColumnConstraint> columnConstraints;
}
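A minimal sketch of such a value type, under stated assumptions: ColumnConstraint is replaced by String, the Jackson annotations used in Trino are omitted, and a Java record is used for brevity rather than the Jackson-annotated class the PR ends up with. The point is that an empty column set no longer has to mean both "all" and "none", because isNone disambiguates them. The defensive copy via Set.copyOf is the plain-Java analogue of Guava's ImmutableSet.copyOf.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the proposed Constraint; not the merged Trino class.
record IoConstraint(boolean isNone, Set<String> columnConstraints)
{
    IoConstraint
    {
        // Defensive, immutable copy: later changes to the caller's set cannot leak in.
        // Set.copyOf also rejects null elements with NullPointerException.
        columnConstraints = Set.copyOf(Objects.requireNonNull(columnConstraints, "columnConstraints is null"));
    }

    // A table scan that cannot produce any rows.
    static IoConstraint none()
    {
        return new IoConstraint(true, Set.of());
    }

    // The ordinary case; an empty set here means "no column constraints", i.e. ALL.
    static IoConstraint of(Set<String> columnConstraints)
    {
        return new IoConstraint(false, columnConstraints);
    }
}
```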
@albericgenius please don't rebase unless necessary.
Force-pushed from 6fc32ec to 030f789.
@findepi Thanks for your help and time. I am not sure whether my implementation matches your idea. It currently affects some JDBC test cases because of this new rule.
@albericgenius
However, I also see that we could choose NOT to fix the plan printer, and consider a plan with a redundant TableScan left in it "bogus" or "invalid". That is surely undesirable. @martint, thoughts?
Force-pushed from d90cbf1 to 5254b78.
A TableScan that produces no data is a perfectly valid plan, so this should be fixed in the plan printer. It might be undesirable from a performance perspective, but that's just an optimization.
public static class Constraint
{
    private final boolean isNone;

        @JsonProperty("columnConstraints") Set<ColumnConstraint> columnConstraints)
{
    this.isNone = isNone;
    this.columnConstraints = columnConstraints;
Make a defensive copy to ensure that Constraint is immutable.
Suggested change:
-    this.columnConstraints = columnConstraints;
+    this.columnConstraints = ImmutableSet.copyOf(requireNonNull(columnConstraints, "columnConstraints is null"));
@@ -697,13 +754,14 @@ private void addInputTableConstraints(TupleDomain<ColumnHandle> filterDomain, Ta
 TableMetadata tableMetadata = plannerContext.getMetadata().getTableMetadata(session, table);
 TupleDomain<ColumnHandle> predicateDomain = plannerContext.getMetadata().getTableProperties(session, table).getPredicate();
 EstimatedStatsAndCost estimatedStatsAndCost = getEstimatedStatsAndCost(tableScan);
+boolean withoutData = estimatedStatsAndCost.getOutputRowCount() == 0d ? true : false;
The logic here should not depend on stats.
Stats are estimates; they can be inaccurate or way off.
{ | ||
checkArgument(!constraint.isNone()); | ||
checkArgument(!constraint.isNone() || withoutData); |
I think we need to replace this check with an if:
if (constraint.isNone()) {
    return new Constraint(true, ImmutableSet.of());
}
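A minimal sketch of parseConstraints with the check replaced by an if, assuming simplified inputs: the constraint is passed as an Optional map that imitates TupleDomain.getDomains() (Optional.empty() for NONE, an empty map for ALL, otherwise column name to a textual domain), and ColumnConstraint is replaced by a formatted String. The Constraint record here is a hypothetical stand-in for the proposed IoPlanPrinter.Constraint.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

final class ParseConstraintsSketch
{
    // Same shape as the proposed Constraint, with String in place of ColumnConstraint.
    record Constraint(boolean isNone, Set<String> columnConstraints)
    {
        Constraint
        {
            columnConstraints = Set.copyOf(Objects.requireNonNull(columnConstraints, "columnConstraints is null"));
        }
    }

    private ParseConstraintsSketch() {}

    // 'domains' imitates TupleDomain.getDomains(): Optional.empty() means NONE,
    // an empty map means ALL, otherwise it maps column name -> textual domain.
    static Constraint parseConstraints(Optional<Map<String, String>> domains)
    {
        if (domains.isEmpty()) {
            // Instead of failing checkArgument(!constraint.isNone()), report a NONE constraint.
            return new Constraint(true, Set.of());
        }
        Set<String> columnConstraints = domains.get().entrySet().stream()
                .map(entry -> entry.getKey() + " " + entry.getValue())
                .collect(Collectors.toSet());
        return new Constraint(false, columnConstraints);
    }
}
```

NONE and ALL both yield an empty column set, but isNone keeps them apart, which addresses the concern that an empty set alone is ambiguous.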
new IoPlanPrinter.Constraint(withoutData,
        ImmutableSet.of(
The indentation here is off and will change as soon as someone reformats the code.
Suggested change:
- new IoPlanPrinter.Constraint(withoutData,
-         ImmutableSet.of(
+ new IoPlanPrinter.Constraint(withoutData, ImmutableSet.of(
@@ -1043,13 +1044,15 @@ public void testIoExplain()
 computeActual("CREATE TABLE test_io_explain WITH (partitioned_by = ARRAY['orderkey', 'processing']) AS SELECT custkey, orderkey, orderstatus = 'P' processing FROM orders WHERE orderkey < 3");

 EstimatedStatsAndCost estimate = new EstimatedStatsAndCost(2.0, 40.0, 40.0, 0.0, 0.0);
+boolean withoutData = estimate.getOutputRowCount() == 0d ? true : false;
We know the expected result (the constraint here shouldn't be none), so we don't need a variable and conditional initialization for that.
@@ -1091,64 +1094,70 @@ public void testIoExplain()
 computeActual("CREATE TABLE test_io_explain WITH (partitioned_by = ARRAY['orderkey']) AS SELECT custkey, orderkey FROM orders WHERE orderkey < 200");

 estimate = new EstimatedStatsAndCost(55.0, 990.0, 990.0, 0.0, 0.0);
+withoutData = estimate.getOutputRowCount() == 0d ? true : false;
As above: replace it with a constant and inline it.
Force-pushed from 5254b78 to dd8ac8c.
Currently, this can happen, e.g., when a Hive partitioned table is empty. Ideally, such a table scan should be eliminated from the plan, but plan printing should not rely on that; the elimination would be just an optimization.
I applied some cosmetic changes (to the code and the commit message) myself. Will merge once the build passes. Thank you for your contribution!
Thanks for your time and your coaching.
Description
Fix the EXPLAIN (TYPE IO) exception when the table is an empty Hive partitioned table
Related issues, pull requests, and links
#10398
Documentation
(+) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(+) No release notes entries required.
( ) Release notes entries required with the following suggested text: