Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the explain (TYPE IO) Exception when table is hive partition tabl… #12349

Merged
merged 1 commit into from
Sep 20, 2022

Conversation

albericgenius
Copy link
Contributor

Description

Fix the explain (TYPE IO) Exception when table is hive partition table which is empty

Is this change a fix, improvement, new feature, refactoring, or other?
Yes, a bug fix
Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
This a change for HIVE connector
How would you describe this change to a non-technical end user or system administrator?
Fix the explain (TYPE IO) Exception when table is hive partition table which is empty

Related issues, pull requests, and links

#10398

Documentation

(+) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(+) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`10398`)

@cla-bot cla-bot bot added the cla-signed label May 12, 2022
@findepi findepi requested review from homar and findinpath May 12, 2022 19:36
@findinpath
Copy link
Contributor

Please do follow the guideline for the Git commits

https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages

Make the git commit message header much smaller, wrap the message to 70-80 chars, and concentrate on make it more descriptive for the other fellows working on your changes.

@@ -2926,9 +2926,12 @@ public ConnectorTableHandle makeCompatiblePartitioning(ConnectorSession session,
@VisibleForTesting
static TupleDomain<ColumnHandle> createPredicate(List<ColumnHandle> partitionColumns, List<HivePartition> partitions)
{
if (partitions.isEmpty()) {
if (partitionColumns.isEmpty() && partitions.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (partitionColumns.isEmpty() && partitions.isEmpty()) {
if (partitions.isEmpty()){
return partitionColumns.isEmpty() ? TupleDomain.none() : TupleDomain.all();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albericgenius partitions.isEmpty() check is common in both if statements.
Please consider applying the suggested change.

Copy link
Contributor Author

@albericgenius albericgenius May 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and updated, I can see the improvement from your comments.

@albericgenius
Copy link
Contributor Author

  • Thanks for sharing, it is really helpful
  • Updated the commit message header to smaller.

@findepi findepi requested review from findinpath and homar and removed request for homar May 17, 2022 14:42
public void testCreatePredicateWithEmptyPartition()
{
ImmutableList.Builder<HivePartition> partitions = ImmutableList.builder();
Domain domain = createPredicate(ImmutableList.of(TEST_COLUMN_HANDLE), partitions.build())
Copy link
Contributor

@findinpath findinpath May 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ImmutableList.of() instead of defining extra variable partitions for keeping the code smaller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@findinpath
Copy link
Contributor

Please add in the description:

Fixes #10398

@findinpath
Copy link
Contributor

Please be more brief with implementation details in the commit message.
Think of explaining what you are doing and not every implementation detail how you did it.

@albericgenius
Copy link
Contributor Author

Please add in the description:

Fixes #10398

Thanks for notes, I added :)

@albericgenius
Copy link
Contributor Author

Yes, I have updated, English is not my first language, I try my best.
Thanks for your help and time.

@albericgenius
Copy link
Contributor Author

@findepi and @findinpath
This is my first approved commit, I am not sure the process.
do I need the second approval here, or you can help me to merge into master branch?
Thanks
Alberic

@findinpath
Copy link
Contributor

@albericgenius generally it may take a while (a few hours/ a few days) until a maintainer merges the commit.
In any case feel free to take a stab on another open issue in the meanwhile.

Good job!

@bitsondatadev we should probably document the PR process on https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md to avoid confusion.

@@ -2925,7 +2925,7 @@ public ConnectorTableHandle makeCompatiblePartitioning(ConnectorSession session,
static TupleDomain<ColumnHandle> createPredicate(List<ColumnHandle> partitionColumns, List<HivePartition> partitions)
{
if (partitions.isEmpty()) {
return TupleDomain.none();
return partitionColumns.isEmpty() ? TupleDomain.none() : TupleDomain.all();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (partitionColumns.isEmpty()) {
  // not a partitioned table
  checkArgument(partitions.size()==1 && UNPARTITIONED_ID.equals(getOnlyElement(partitions).getPartitionId()), "Unexpected partitions for a non-partitioned table: %s", partitions);
  return TupleDomain.all();

but then what remains would be (as it used to be)

if (partitions.isEmpty()) {
  return TupleDomain.none();
}

you return the opposite value. I don't understand why.

Copy link
Contributor Author

@albericgenius albericgenius May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • originally we return TupleDomain.none() when partitions.isEmpty()
  • but in the case of no data of the PartitionedTable, we will throw IllegalArgumentException in IoPlanPrinter.parseConstraints. because the constraint is none.
  • for no data of the PartitionedTable case, the partitions is empty, and partitionColumns is not empty
  • I could be wrong, please help to point out, I will update asap.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation

  • but in the case of no data of the PartitionedTable, we will throw IllegalArgumentException in IoPlanPrinter.parseConstraints. because the constraint is none.

does it mean we should fix that method instead?

@albericgenius albericgenius force-pushed the mainline branch 4 times, most recently from e5d2a9f to e0a5f1b Compare May 19, 2022 10:24
Optional.empty(),
estimate));

assertUpdate("DROP TABLE test_io_explain");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_io_explain -> test_io_explain_with_empty_partitioned_table

using same table name as in the other test will make tests fail when run concurrently

Copy link
Contributor Author

@albericgenius albericgenius May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#12349 (comment)

  • updated the table name.
  • IoPlanPrinter.parseConstraints will check the constraint is none or not. The logic is correct.
  • I think if a partitioned table do not have any data, we should return TupleDomain.all() as constraint. there is only one case that partitions is empty and partitionColumns is not because of no data.
  • Only partitionColumns and partitions are empty, we should return TupleDomain.none().
  • What is your thought?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if a partitioned table do not have any data, we should return TupleDomain.all() as constraint.

TupleDomain.none() is also correct ("no rows match the filter"), and may be more useful ("more correct"), as can allow pruning other parts of the query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This implement will not take effect "no rows match the filter". because there is no data(partitions is empty and partitionColumns is not empty), even we return TupleDomain.all(), there is no data match the filter, the result is same.
  • I agree with you, this is better to fix inside IoPlanPrinter.parseConstraints, but i still do not know how to get partitions informations in plan process. i will continue to think about it tomorrow. if you have free time, please help to give me some suggestion.

Thanks for your time
Alberic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even we return TupleDomain.all(), there is no data match the filter, the result is same.

io.trino.spi.connector.ConnectorMetadata#getTableProperties's io.trino.spi.connector.ConnectorTableProperties#predicate is used to inform the planner and allow deriving filters for other tables.

for example, a query

SELECT * FROM some_table JOIN empty_partitioned_table ON ...

should be reduced to SELECT .. WHERE false because the planner realizes empty_partitioned_table has no rows (TupleDomain.none()).

@@ -2925,7 +2925,7 @@ public ConnectorTableHandle makeCompatiblePartitioning(ConnectorSession session,
static TupleDomain<ColumnHandle> createPredicate(List<ColumnHandle> partitionColumns, List<HivePartition> partitions)
{
if (partitions.isEmpty()) {
return TupleDomain.none();
return partitionColumns.isEmpty() ? TupleDomain.none() : TupleDomain.all();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation

  • but in the case of no data of the PartitionedTable, we will throw IllegalArgumentException in IoPlanPrinter.parseConstraints. because the constraint is none.

does it mean we should fix that method instead?

@albericgenius albericgenius force-pushed the mainline branch 2 times, most recently from 035afed to 9a4b2d1 Compare May 20, 2022 17:04
@findinpath findinpath self-requested a review May 23, 2022 10:20
@@ -717,7 +716,9 @@ private EstimatedStatsAndCost getEstimatedStatsAndCost(TableScanNode node)

private Set<ColumnConstraint> parseConstraints(TableHandle tableHandle, TupleDomain<ColumnHandle> constraint)
{
checkArgument(!constraint.isNone());
if (constraint.isNone()) {
return ImmutableSet.of();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what we would return if predicate is all, so it's probably not the right return value for the none case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IoPlan.TableColumnInfo cannot currently represent a table with NONE constraint.

In general, such table should be eliminated from the query plan, this happens e.g. here

TableProperties newTableProperties = plannerContext.getMetadata().getTableProperties(session, newTable);
Optional<TablePartitioning> newTablePartitioning = newTableProperties.getTablePartitioning();
if (newTableProperties.getPredicate().isNone()) {
return Optional.of(new ValuesNode(node.getId(), node.getOutputSymbols(), ImmutableList.of()));

We could fix the optimizer so that it happens as well in the SELECT * FROM empty_partitioned_table case and we probably should. However, this wouldn't eliminate the need to fix EXPLAIN (TYPE IO) from failing in such case, since it's generally a possible situation. That's why we deal with it also on page source level instead of failing there

So, back to our problem. We need to make IoPlan.TableColumnInfo be able to represent table scan without data, with none constraint.
I would suggest replacing Set<ColumnConstraint> columnConstraints field with

class Constraint {
  boolean isNone;
  Set<ColumnConstraint> columnConstraints;
}

cc @martint @sopel39 @kokosing @losipiuk @JamesRTaylor

@findepi
Copy link
Member

findepi commented May 24, 2022

@albericgenius albericgenius force-pushed the mainline branch from 9a4b2d1 to 26ef697 17 hours ago

the diff is https://github.com/trinodb/trino/compare/9a4b2d177ee4d2ebd0c992c8385bc00fbab7c5ac..26ef697107add3b991a11775fe66764dfbce5837

@albericgenius please don't rebase, unless necessary.
it makes reviewing harder. Github should be able to show a diff that of changes that matter, but it's apparently not implemented yet.

@albericgenius
Copy link
Contributor Author

albericgenius commented Sep 15, 2022

@findepi Thanks for your help and time.
I try to follow your idea to add a Rule: RemoveEmptyTableScan. if the table is empty and partition is empty, i will replace the scan table node with a value node. Then the IoPlanPrinter issue is resolved.

I am not sure my implement way is match your idea or not? now it affect some JDBC test cases because of this new Rule.

  • If it is matched, i will continue to fix the JDBC test cases.
  • Otherwise still need your expertise and help to provide any comment. I will response as soon as possible.

@albericgenius albericgenius reopened this Sep 15, 2022
@findepi
Copy link
Member

findepi commented Sep 16, 2022

@albericgenius
in #12349 (comment) i indicated we need to fix the IO plan printer

So, back to our problem. We need to make IoPlan.TableColumnInfo be able to represent table scan without data, with none constraint.
I would suggest replacing Set columnConstraints field with ...

However, i also see that we could choose NOT to fix the plan printer, and consider a plan with redundant TableScan left as "bogus" or "invalid". It's surely undesirable.

@martint thoughts?

@albericgenius albericgenius force-pushed the mainline branch 3 times, most recently from d90cbf1 to 5254b78 Compare September 19, 2022 15:41
@martint
Copy link
Member

martint commented Sep 19, 2022

However, i also see that we could choose NOT to fix the plan printer, and consider a plan with redundant TableScan left as "bogus" or "invalid". It's surely undesirable.

A TableScan that produces no data is a perfectly valid plan, so that should be fixed in the plan printer. It might be undesirable from a performance perspective, but that's just an optimization.

public static class Constraint
{
private final boolean isNone;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@JsonProperty("columnConstraints") Set<ColumnConstraint> columnConstraints)
{
this.isNone = isNone;
this.columnConstraints = columnConstraints;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a defensive copy to ensure that Constraint is immutable

Suggested change
this.columnConstraints = columnConstraints;
this.columnConstraints = ImmutableSet.copyIf(requireNonNull(columnConstraints, "columnConstraints is null"));

@@ -697,13 +754,14 @@ private void addInputTableConstraints(TupleDomain<ColumnHandle> filterDomain, Ta
TableMetadata tableMetadata = plannerContext.getMetadata().getTableMetadata(session, table);
TupleDomain<ColumnHandle> predicateDomain = plannerContext.getMetadata().getTableProperties(session, table).getPredicate();
EstimatedStatsAndCost estimatedStatsAndCost = getEstimatedStatsAndCost(tableScan);
boolean withoutData = estimatedStatsAndCost.getOutputRowCount() == 0d ? true : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here should not depend on stats.
Stats are estimates, can be inaccurate and off.

{
checkArgument(!constraint.isNone());
checkArgument(!constraint.isNone() || withoutData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need to replace this check with an if

if (constraint.isNone()) {
  return new Constraint(true, ImmutableSet.of());
}

Comment on lines 1055 to 1056
new IoPlanPrinter.Constraint(withoutData,
ImmutableSet.of(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the indentation here is off and will change as soon as someone invokes reformatting

Suggested change
new IoPlanPrinter.Constraint(withoutData,
ImmutableSet.of(
new IoPlanPrinter.Constraint(withoutData, ImmutableSet.of(

@@ -1043,13 +1044,15 @@ public void testIoExplain()
computeActual("CREATE TABLE test_io_explain WITH (partitioned_by = ARRAY['orderkey', 'processing']) AS SELECT custkey, orderkey, orderstatus = 'P' processing FROM orders WHERE orderkey < 3");

EstimatedStatsAndCost estimate = new EstimatedStatsAndCost(2.0, 40.0, 40.0, 0.0, 0.0);
boolean withoutData = estimate.getOutputRowCount() == 0d ? true : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now the expected result (constraint here shouldn't be none)
we don't need a variable and conditional initialization for that

@@ -1091,64 +1094,70 @@ public void testIoExplain()
computeActual("CREATE TABLE test_io_explain WITH (partitioned_by = ARRAY['orderkey']) AS SELECT custkey, orderkey FROM orders WHERE orderkey < 200");

estimate = new EstimatedStatsAndCost(55.0, 990.0, 990.0, 0.0, 0.0);
withoutData = estimate.getOutputRowCount() == 0d ? true : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above -- replace with constant and inline

Currently, this can happen e.g. when a hive partitioned table is empty.
Ideally, such a table scan should be eliminated from the plan, but plan
printing should not rely on that, this would be just an optimization.
@findepi
Copy link
Member

findepi commented Sep 20, 2022

I applied some cosmetic changes (to the code & commit message) myself. Will merge once the build passes.

Thank you for your contribution!

@albericgenius
Copy link
Contributor Author

I applied some cosmetic changes (to the code & commit message) myself. Will merge once the build passes.

Thank you for your contribution!

Thanks for your time and your coaching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants