Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support automatic type coercion for table creation #13994

Merged
merged 5 commits into from
Sep 4, 2023

Conversation

mdesmet
Copy link
Contributor

@mdesmet mdesmet commented Sep 5, 2022

Allow connectors to specify the actual Trino types that will be persisted and let the engine applies coercions as necessary.

Description

The behaviour around type coercion and casts in a CREATE TABLE is different from the CTAS. For example: the user created a table with TIMESTAMP(6). On insert the table type is compared to the query type and casts or coercions from the same type are applied upon the supplied values (resulting in rounding if necessary)

create table catalog.default.test (a TIMESTAMP(6));

insert into catalog.default.test values TIMESTAMP '9999-12-31 23:59:59.999999999';
-- no error

select a from catalog.default.test;
-- value is rounded

In this PR similar logic is applied upon CTAS statements through distinguishing between the target table type and the query type. By default the query and table type are the same, however a connector can specify overrides through a Connector Metadata method.

   /**
     * Coerces the input type to the effective Trino type supported by the connector
     * Return the effective {@link io.trino.spi.type.Type} that is supported by the connector for the given type, otherwise return {@link Optional#empty()}.
     */
    default Optional<Type> coerceNewTableColumn(ConnectorSession session, Type type)

Given this information coercions and casts can be performed by the engine instead of the connector itself, resulting in simpler connectors.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Change to core query engine and SPI

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# General
* Support automatic type coercion in engine for table creation.

@losipiuk
Copy link
Member

@findepi, @martint I vaguely recall there was a discussion around this specific matter, not sure if conclusions were made. Is your memory more crispy?

@hashhar
Copy link
Member

hashhar commented Sep 12, 2022

@losipiuk See #13981 (comment) - I guess that's what you are thinking of.

@losipiuk
Copy link
Member

@losipiuk See #13981 (comment) - I guess that's what you are thinking of.

I meant something older - but this is a good summary. Thank you :)


String catalogName = destination.getCatalogName();
CatalogHandle catalogHandle = metadata.getCatalogHandle(session, catalogName)
.orElseThrow(() -> semanticException(CATALOG_NOT_FOUND, query, "Destination catalog '%s' does not exist", catalogName));

Assignments.Builder assignments = Assignments.builder();
ImmutableList.Builder<ColumnMetadata> insertedColumnsBuilder = ImmutableList.builder();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: createdColumnsBuilder/finalColumnsBuilder?

@@ -530,6 +543,29 @@ private RelationPlan getInsertPlan(
statisticsMetadata);
}

private RelationPlan createPlanWithCoercions(RelationPlan plan, Assignments.Builder assignments, List<ColumnMetadata> insertedColumns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would rather not pass builder but built assignemtns as arguemnt

ProjectNode projectNode = new ProjectNode(idAllocator.getNextId(), plan.getRoot(), assignments.build());
Scope scope = Scope.builder().withRelationType(RelationId.anonymous(), new RelationType(fields)).build();
plan = new RelationPlan(projectNode, scope, projectNode.getOutputSymbols(), Optional.empty());
return plan;
Copy link
Member

@losipiuk losipiuk Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline plan

@@ -92,6 +92,30 @@ public boolean isTypeOnlyCoercion(Type source, Type result)
return sameDecimalSubtype && sameScale && sourcePrecisionIsLessOrEqualToResultPrecision;
}

if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: commit message Enable coercion for Time types -> Enable coercion for temporal types

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the commit about "Enabling a coercion" or recognizing more coercions as "type-only"?

@@ -92,6 +92,30 @@ public boolean isTypeOnlyCoercion(Type source, Type result)
return sameDecimalSubtype && sameScale && sourcePrecisionIsLessOrEqualToResultPrecision;
}

if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) {
boolean sameTimestampWithTimeZoneType = sourceTimestampWithTimeZone.isShort() && targetTimestampWithTimeZone.isShort();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if both are long?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what about just using:

boolean sameTimestampType = sourceTimestamp.isShort() == targetTimestamp.isShort();

if (source instanceof TimestampType sourceTimestamp && result instanceof TimestampType targetTimestamp) {
boolean sameTimestampType = (sourceTimestamp.isShort() && targetTimestamp.isShort())
|| (!sourceTimestamp.isShort() && !targetTimestamp.isShort());
boolean sourcePrecisionIsLessOrEqualToResultPrecision = sourceTimestamp.getPrecision() <= targetTimestamp.getPrecision();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

targetMorePercise?
precisionNotLost?

CatalogHandle catalogHandle = catalogMetadata.getCatalogHandle(session, table);
ConnectorMetadata metadata = catalogMetadata.getMetadataFor(session, catalogHandle);
ConnectorSession connectorSession = session.toConnectorSession(catalogHandle);
return metadata.getNewTableMetadata(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check that number of columns returned by metadata.getNewTableMetadata matches columns from arguments?

@losipiuk losipiuk requested a review from kasiafi September 12, 2022 10:19
@mdesmet mdesmet force-pushed the feature/ctas_table_type branch from 7e76ee9 to b7a2a61 Compare September 12, 2022 20:23
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend spitting "Enable coercion for Time types" to a separate PR.
I hope it's orthogonal, but please correct me if i am wrong

return sameDecimalSubtype && sameScale && noPrecisionLoss;
}

if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sourceTimestampWithTimeZone -> sourceType
targetTimestampWithTimeZone -> targetType

These long variables names do not add to readability

@@ -92,6 +92,30 @@ public boolean isTypeOnlyCoercion(Type source, Type result)
return sameDecimalSubtype && sameScale && sourcePrecisionIsLessOrEqualToResultPrecision;
}

if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the commit about "Enabling a coercion" or recognizing more coercions as "type-only"?

}

if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) {
boolean sameTimestampWithTimeZoneType = sourceTimestampWithTimeZone.isShort() == targetTimestampWithTimeZone.isShort();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sameTimestampWithTimeZoneType -> sameRepresentation

columns,
properties,
comment);
checkState(newTableMetadata.getColumns().size() == columns.size(), "Table and query column count doesn't match");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What query is the message talking about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this terminology from the INSERT vs table but maybe this is slightly flawed in the case of a CTAS.

So the query is the CTAS query, the table is the effective storage table.

Comment on lines 413 to 473
if (queryType.equals(tableType) || typeCoercion.isTypeOnlyCoercion(queryType, tableType)) {
expression = fieldMapping.toSymbolReference();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not OK to omit type-only coercions in the plan.
The PlanNodes are strictly typed and all types must match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same code is there for INSERT. isn't the point of coercion that the types match?

@@ -446,6 +446,14 @@ default Optional<ConnectorTableLayout> getNewTableLayout(ConnectorSession sessio
return Optional.empty();
}

/**
* Get the physical metadata for a new table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

physical -> effective

Also, a method name could be changed to reflect that.
Maybe getNewTableEffectiveMetadata

@@ -154,6 +154,11 @@ default void setTableProperties(ConnectorSession session, JdbcTableHandle handle
throw new TrinoException(NOT_SUPPORTED, "This connector does not support setting table properties");
}

default ConnectorTableMetadata getNewTableMetadata(ConnectorSession connectorSession, SchemaTableName schemaTableName, List<ColumnMetadata> columns, Map<String, Object> properties, Optional<String> comment)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JdbcClient interface should not need to operate on ConnectorTableMetadata. It should be granular, on per type basis.

Actually, we'd want to change or replace existing operation: io.trino.plugin.jdbc.JdbcClient#toWriteMapping.
The JdbcClient would want to return replacement Type that would go into the effective new table metadata.

@mdesmet
Copy link
Contributor Author

mdesmet commented Sep 18, 2022

I'd recommend spitting "Enable coercion for Time types" to a separate PR. I hope it's orthogonal, but please correct me if i am wrong

It is totally independent. I will move that to a separate PR.

@mdesmet mdesmet force-pushed the feature/ctas_table_type branch from b7a2a61 to c1a653b Compare October 2, 2022 17:56
@@ -70,6 +87,11 @@ public WriteFunction getWriteFunction()
return writeFunction;
}

public WriteMapping withEffectiveType(Type effectiveType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that you can set the effectiveType arbitrarily, but it must be consistent with what toColumnMapping will return upon read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot verify that, because until we create the table, we don't have the input to feed into the toColumnMapping.

however, this also uncovers a problem with my earlier suggestion (#13994 (comment))
i suggested to change toWriteMapping, so that it can optionally return "effective type"
This is OK for callers like io.trino.plugin.jdbc.DefaultJdbcMetadata#getNewTableEffectiveMetadata and maybe for io.trino.plugin.jdbc.BaseJdbcClient#getColumnDefinitionSql.
But it's not OK for JdbcPageSink::JdbcPageSink -- the JdbcPageSink is too late to apply any "effective type".

Now as I am thinking about this, let's not put that into toWriteMapping method, but let's add a new method (that will be invoked from DefaultJdbcMetadata#getNewTableEffectiveMetadata for each column separately)

@@ -910,7 +910,7 @@ protected Scope visitCreateTableAsSelect(CreateTableAsSelect node, Optional<Scop
}

// create target table metadata
ConnectorTableMetadata tableMetadata = new ConnectorTableMetadata(targetTable.asSchemaTableName(), columns.build(), properties, node.getComment());
ConnectorTableMetadata tableMetadata = metadata.getNewTableEffectiveMetadata(session, catalogName, targetTable, columns.build(), properties, node.getComment());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should analyze the old and new types pairwise and sanity-check if there exists the appropriate cast.

@@ -910,7 +910,7 @@ protected Scope visitCreateTableAsSelect(CreateTableAsSelect node, Optional<Scop
}

// create target table metadata
ConnectorTableMetadata tableMetadata = new ConnectorTableMetadata(targetTable.asSchemaTableName(), columns.build(), properties, node.getComment());
ConnectorTableMetadata tableMetadata = metadata.getNewTableEffectiveMetadata(session, catalogName, targetTable, columns.build(), properties, node.getComment());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we must decide on the level of flexibility: I think the minimum plan would be to support implicit coercions only, but I'm not sure if it covers the targeted cases. And if we go beyond that minimum, do we want to fail on string truncation, etc.?
When we have the decision, then we need to verify if the new type returned by the connector satisfies our criteria.

The method typesMatchForInsert() can serve as a reference, but please note that it reflects (with some limitations) what the SQL standard says about inserts. The CTAS case is different, and it is not covered by the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we must decide on the level of flexibility: I think the minimum plan would be to support implicit coercions only, but I'm not sure if it covers the targeted cases.

Today, eg PostgreSQL (and co.) connector accepts any timestamp(n) and coerces it to timestamp(6).
This all happens inside the connector (in every such connector), and is an unnecessary complexity, since the coercion logic is effectively spread out.

For n <= 6 an implicit coercion would do, but for n > 6 we need to allow non-implicit coercions.

/**
* Returns the table metadata with the effective Trino types specific for the connector
*/
ConnectorTableMetadata getNewTableEffectiveMetadata(Session session, String catalogName, QualifiedObjectName tableName, List<ColumnMetadata> columns, Map<String, Object> properties, Optional<String> comment);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's meant to be used (and is used) only for CTAS and not for regular CT.
The method name should reflect that.

cc @martint for any naming suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getConnectorTableMetadataForCTAS
getEffectiveConnectorTableMetadataForCTAS
getConnectorTableMetadataForCreateTableAsSelect
getCreateTableAsSelectConnectorTableMetadata

Out of those my preference would be getConnectorTableMetadataForCreateTableAsSelect.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of passing properties and comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or can we introduce a API which could take in a Type and provide a supported Type as Output ?

@@ -70,6 +87,11 @@ public WriteFunction getWriteFunction()
return writeFunction;
}

public WriteMapping withEffectiveType(Type effectiveType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot verify that, because until we create the table, we don't have the input to feed into the toColumnMapping.

however, this also uncovers a problem with my earlier suggestion (#13994 (comment))
i suggested to change toWriteMapping, so that it can optionally return "effective type"
This is OK for callers like io.trino.plugin.jdbc.DefaultJdbcMetadata#getNewTableEffectiveMetadata and maybe for io.trino.plugin.jdbc.BaseJdbcClient#getColumnDefinitionSql.
But it's not OK for JdbcPageSink::JdbcPageSink -- the JdbcPageSink is too late to apply any "effective type".

Now as I am thinking about this, let's not put that into toWriteMapping method, but let's add a new method (that will be invoked from DefaultJdbcMetadata#getNewTableEffectiveMetadata for each column separately)

@mdesmet mdesmet force-pushed the feature/ctas_table_type branch from c1a653b to 266ef0c Compare October 26, 2022 21:09
@mdesmet
Copy link
Contributor Author

mdesmet commented Oct 27, 2022

@kasiafi, @findepi : PTAL

Outside of the method name change I've addressed all other comments.

@mdesmet mdesmet requested a review from martint November 9, 2022 15:23
@mdesmet mdesmet requested a review from ebyhr January 2, 2023 13:35
@mdesmet
Copy link
Contributor Author

mdesmet commented Jan 2, 2023

@ebyhr: Can you have a look at this?

@ebyhr
Copy link
Member

ebyhr commented Jan 13, 2023

Sorry for my late response. Let me take a look when I have the time.

@mdesmet mdesmet force-pushed the feature/ctas_table_type branch from fc16a30 to 5984f4e Compare July 23, 2023 21:50
@mdesmet mdesmet force-pushed the feature/ctas_table_type branch from 5984f4e to 42dcd1e Compare July 27, 2023 07:36
Comment on lines 466 to 473
forEachPair(tableMetadata.getColumns().stream(), visibleFieldMappings.stream(), (column, fieldMapping) -> {
Expression expression;
Type tableType = column.getType();
Type queryType = symbolAllocator.getTypes().get(fieldMapping);
if (queryType.equals(tableType) || typeCoercion.isTypeOnlyCoercion(queryType, tableType)) {
expression = fieldMapping.toSymbolReference();
}
else {
expression = noTruncationCast(fieldMapping.toSymbolReference(), queryType, tableType);
}
Symbol output = symbolAllocator.newSymbol(column.getName(), column.getType());
assignments.put(output, expression);
finalColumnsBuilder.add(column);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this comment has been addressed.

@mdesmet mdesmet force-pushed the feature/ctas_table_type branch from 1b42df9 to 9559aac Compare September 3, 2023 18:27
@mdesmet
Copy link
Contributor Author

mdesmet commented Sep 3, 2023

@martint : As discussed offline, PTAL.

@findinpath
Copy link
Contributor

findinpath commented Sep 4, 2023

There are a few unrelated failures on the build on elasticsearch and kafka connectors.
Consider pushing an empty commit to rebuild.

@Praveen2112
Copy link
Member

@mdesmet Thanks for working on this. Can you update the release notes and we could merge this PR.

@mdesmet
Copy link
Contributor Author

mdesmet commented Sep 4, 2023

@Praveen2112 : I have updated the issue description. PTAL.

@Praveen2112 Praveen2112 merged commit d904a6a into trinodb:master Sep 4, 2023
@Praveen2112
Copy link
Member

Thanks for working on this !!

@github-actions github-actions bot added this to the 426 milestone Sep 4, 2023
@kokosing
Copy link
Member

kokosing commented Sep 4, 2023

Well done! Thank you all involved! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.