Support automatic type coercion for table creation #13994
Conversation
@losipiuk See #13981 (comment) - I guess that's what you are thinking of.
I meant something older - but this is a good summary. Thank you :)
String catalogName = destination.getCatalogName();
CatalogHandle catalogHandle = metadata.getCatalogHandle(session, catalogName)
        .orElseThrow(() -> semanticException(CATALOG_NOT_FOUND, query, "Destination catalog '%s' does not exist", catalogName));

Assignments.Builder assignments = Assignments.builder();
ImmutableList.Builder<ColumnMetadata> insertedColumnsBuilder = ImmutableList.builder();
nit: createdColumnsBuilder/finalColumnsBuilder?
@@ -530,6 +543,29 @@ private RelationPlan getInsertPlan(
        statisticsMetadata);
    }
private RelationPlan createPlanWithCoercions(RelationPlan plan, Assignments.Builder assignments, List<ColumnMetadata> insertedColumns)
nit: I would rather not pass a builder but the built assignments as an argument
core/trino-main/src/main/java/io/trino/sql/planner/LogicalPlanner.java (resolved, outdated)
ProjectNode projectNode = new ProjectNode(idAllocator.getNextId(), plan.getRoot(), assignments.build());
Scope scope = Scope.builder().withRelationType(RelationId.anonymous(), new RelationType(fields)).build();
plan = new RelationPlan(projectNode, scope, projectNode.getOutputSymbols(), Optional.empty());
return plan;
inline plan
@@ -92,6 +92,30 @@ public boolean isTypeOnlyCoercion(Type source, Type result)
        return sameDecimalSubtype && sameScale && sourcePrecisionIsLessOrEqualToResultPrecision;
    }
if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) { |
nit: commit message "Enable coercion for Time types" -> "Enable coercion for temporal types"
Is the commit about "Enabling a coercion" or recognizing more coercions as "type-only"?
@@ -92,6 +92,30 @@ public boolean isTypeOnlyCoercion(Type source, Type result)
        return sameDecimalSubtype && sameScale && sourcePrecisionIsLessOrEqualToResultPrecision;
    }

    if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) {
        boolean sameTimestampWithTimeZoneType = sourceTimestampWithTimeZone.isShort() && targetTimestampWithTimeZone.isShort();
what if both are long?
Also what about just using:
boolean sameTimestampType = sourceTimestamp.isShort() == targetTimestamp.isShort();
if (source instanceof TimestampType sourceTimestamp && result instanceof TimestampType targetTimestamp) {
    boolean sameTimestampType = (sourceTimestamp.isShort() && targetTimestamp.isShort())
            || (!sourceTimestamp.isShort() && !targetTimestamp.isShort());
    boolean sourcePrecisionIsLessOrEqualToResultPrecision = sourceTimestamp.getPrecision() <= targetTimestamp.getPrecision();
targetMorePrecise? precisionNotLost?
CatalogHandle catalogHandle = catalogMetadata.getCatalogHandle(session, table);
ConnectorMetadata metadata = catalogMetadata.getMetadataFor(session, catalogHandle);
ConnectorSession connectorSession = session.toConnectorSession(catalogHandle);
return metadata.getNewTableMetadata(
Should we check that the number of columns returned by metadata.getNewTableMetadata matches the columns from the arguments?
Force-pushed from 7e76ee9 to b7a2a61
I'd recommend splitting "Enable coercion for Time types" into a separate PR.
I hope it's orthogonal, but please correct me if I am wrong.
        return sameDecimalSubtype && sameScale && noPrecisionLoss;
    }
if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) { |
sourceTimestampWithTimeZone -> sourceType
targetTimestampWithTimeZone -> targetType
These long variable names do not add to readability.
    }

    if (source instanceof TimestampWithTimeZoneType sourceTimestampWithTimeZone && result instanceof TimestampWithTimeZoneType targetTimestampWithTimeZone) {
        boolean sameTimestampWithTimeZoneType = sourceTimestampWithTimeZone.isShort() == targetTimestampWithTimeZone.isShort();
sameTimestampWithTimeZoneType -> sameRepresentation
        columns,
        properties,
        comment);
checkState(newTableMetadata.getColumns().size() == columns.size(), "Table and query column count doesn't match");
What query is the message talking about?
I took this terminology from the INSERT-vs-table case, but maybe it is slightly flawed in the case of a CTAS. So the query is the CTAS query, and the table is the effective storage table.
if (queryType.equals(tableType) || typeCoercion.isTypeOnlyCoercion(queryType, tableType)) {
    expression = fieldMapping.toSymbolReference();
It's not OK to omit type-only coercions in the plan.
The PlanNodes are strictly typed and all types must match.
The same code is there for INSERT. Isn't the point of coercion that the types match?
@@ -446,6 +446,14 @@ default Optional<ConnectorTableLayout> getNewTableLayout(ConnectorSession sessio
        return Optional.empty();
    }

    /**
     * Get the physical metadata for a new table.
physical -> effective
Also, the method name could be changed to reflect that. Maybe getNewTableEffectiveMetadata?
@@ -154,6 +154,11 @@ default void setTableProperties(ConnectorSession session, JdbcTableHandle handle
        throw new TrinoException(NOT_SUPPORTED, "This connector does not support setting table properties");
    }
default ConnectorTableMetadata getNewTableMetadata(ConnectorSession connectorSession, SchemaTableName schemaTableName, List<ColumnMetadata> columns, Map<String, Object> properties, Optional<String> comment) |
The JdbcClient interface should not need to operate on ConnectorTableMetadata. It should be granular, on a per-type basis. Actually, we'd want to change or replace the existing operation io.trino.plugin.jdbc.JdbcClient#toWriteMapping: the JdbcClient would want to return a replacement Type that would go into the effective new table metadata.
It is totally independent. I will move that to a separate PR.
Force-pushed from b7a2a61 to c1a653b
@@ -70,6 +87,11 @@ public WriteFunction getWriteFunction()
        return writeFunction;
    }
public WriteMapping withEffectiveType(Type effectiveType) |
It seems that you can set the effectiveType arbitrarily, but it must be consistent with what toColumnMapping will return upon read.
We cannot verify that, because until we create the table, we don't have the input to feed into toColumnMapping.
However, this also uncovers a problem with my earlier suggestion (#13994 (comment)): I suggested changing toWriteMapping so that it can optionally return an "effective type". This is OK for callers like io.trino.plugin.jdbc.DefaultJdbcMetadata#getNewTableEffectiveMetadata and maybe for io.trino.plugin.jdbc.BaseJdbcClient#getColumnDefinitionSql. But it's not OK for JdbcPageSink::JdbcPageSink -- the JdbcPageSink is too late to apply any "effective type".
Now as I am thinking about this, let's not put that into the toWriteMapping method, but let's add a new method (that will be invoked from DefaultJdbcMetadata#getNewTableEffectiveMetadata for each column separately).
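Roughly what that could look like from the call site (a sketch only; the helper class and the columnTypeOverride function stand in for the suggested new JdbcClient method and are not names from this PR):

import static com.google.common.collect.ImmutableList.toImmutableList;

import io.trino.spi.connector.ColumnMetadata;
import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.type.Type;

import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

final class EffectiveColumnsSketch
{
    private EffectiveColumnsSketch() {}

    // columnTypeOverride stands in for the suggested new JdbcClient method: given a declared
    // type, it returns the type the connector will actually store, or empty to keep it as-is.
    static List<ColumnMetadata> withEffectiveTypes(
            ConnectorSession session,
            List<ColumnMetadata> columns,
            BiFunction<ConnectorSession, Type, Optional<Type>> columnTypeOverride)
    {
        return columns.stream()
                .map(column -> columnTypeOverride.apply(session, column.getType())
                        .map(effectiveType -> ColumnMetadata.builderFrom(column).setType(effectiveType).build())
                        .orElse(column))
                .collect(toImmutableList());
    }
}

DefaultJdbcMetadata#getNewTableEffectiveMetadata could then build the effective ConnectorTableMetadata from the rewritten columns.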
@@ -910,7 +910,7 @@ protected Scope visitCreateTableAsSelect(CreateTableAsSelect node, Optional<Scop
    }

    // create target table metadata
-   ConnectorTableMetadata tableMetadata = new ConnectorTableMetadata(targetTable.asSchemaTableName(), columns.build(), properties, node.getComment());
+   ConnectorTableMetadata tableMetadata = metadata.getNewTableEffectiveMetadata(session, catalogName, targetTable, columns.build(), properties, node.getComment());
We should analyze the old and new types pairwise and sanity-check that an appropriate cast exists.
@@ -910,7 +910,7 @@ protected Scope visitCreateTableAsSelect(CreateTableAsSelect node, Optional<Scop
    }

    // create target table metadata
-   ConnectorTableMetadata tableMetadata = new ConnectorTableMetadata(targetTable.asSchemaTableName(), columns.build(), properties, node.getComment());
+   ConnectorTableMetadata tableMetadata = metadata.getNewTableEffectiveMetadata(session, catalogName, targetTable, columns.build(), properties, node.getComment());
Also, we must decide on the level of flexibility: I think the minimum plan would be to support implicit coercions only, but I'm not sure if it covers the targeted cases. And if we go beyond that minimum, do we want to fail on string truncation, etc.?
When we have the decision, then we need to verify if the new type returned by the connector satisfies our criteria.
The method typesMatchForInsert() can serve as a reference, but please note that it reflects (with some limitations) what the SQL standard says about inserts. The CTAS case is different, and it is not covered by the spec.
Also, we must decide on the level of flexibility: I think the minimum plan would be to support implicit coercions only, but I'm not sure if it covers the targeted cases.
Today, e.g. the PostgreSQL (and co.) connector accepts any timestamp(n) and coerces it to timestamp(6). This all happens inside the connector (in every such connector), and is unnecessary complexity, since the coercion logic is effectively spread out.
For n <= 6 an implicit coercion would do, but for n > 6 we need to allow non-implicit coercions.
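To make the distinction concrete, a minimal sketch assuming a connector whose storage precision is 6 (as with PostgreSQL); the class and method names are illustrative only, not code from this PR:

import io.trino.spi.type.TimestampType;

final class TimestampPrecisionSketch
{
    private static final int STORAGE_PRECISION = 6; // assumed connector storage precision

    private TimestampPrecisionSketch() {}

    // timestamp(n) with n <= 6 widens losslessly, so an implicit coercion suffices;
    // n > 6 rounds away sub-microsecond digits, so a non-implicit coercion
    // (an explicit cast added by the engine) would be needed.
    static boolean requiresExplicitCast(TimestampType queryType)
    {
        return queryType.getPrecision() > STORAGE_PRECISION;
    }
}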
/**
 * Returns the table metadata with the effective Trino types specific for the connector
 */
ConnectorTableMetadata getNewTableEffectiveMetadata(Session session, String catalogName, QualifiedObjectName tableName, List<ColumnMetadata> columns, Map<String, Object> properties, Optional<String> comment);
That's meant to be used (and is used) only for CTAS and not for regular CT.
The method name should reflect that.
cc @martint for any naming suggestion
getConnectorTableMetadataForCTAS
getEffectiveConnectorTableMetadataForCTAS
getConnectorTableMetadataForCreateTableAsSelect
getCreateTableAsSelectConnectorTableMetadata
Out of those my preference would be getConnectorTableMetadataForCreateTableAsSelect. WDYT?
What is the point of passing properties and comment?
Or can we introduce an API which could take in a Type and provide a supported Type as output?
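Perhaps something with this shape (the interface and method name are hypothetical; this only illustrates the suggested Type-in/Type-out API, it is not an interface from this PR):

import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.type.Type;

import java.util.Optional;

public interface SupportedTypeMapping
{
    // Returns the type the connector will actually persist for the given declared type,
    // or empty if the declared type is stored unchanged.
    default Optional<Type> getSupportedType(ConnectorSession session, Type declaredType)
    {
        return Optional.empty();
    }
}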
Force-pushed from c1a653b to 266ef0c
@ebyhr: Can you have a look at this?
Sorry for my late response. Let me take a look when I have the time.
Force-pushed from fc16a30 to 5984f4e
Force-pushed from 5984f4e to 42dcd1e
...n/trino-postgresql/src/test/java/io/trino/plugin/postgresql/TestPostgreSqlConnectorTest.java (resolved, outdated)
...n/trino-postgresql/src/test/java/io/trino/plugin/postgresql/TestPostgreSqlConnectorTest.java (resolved, outdated)
core/trino-main/src/main/java/io/trino/execution/CreateTableTask.java (resolved, outdated)
...n/trino-postgresql/src/test/java/io/trino/plugin/postgresql/TestPostgreSqlConnectorTest.java (resolved)
core/trino-main/src/test/java/io/trino/execution/TestCreateTableColumnTypeCoercion.java (resolved)
core/trino-main/src/test/java/io/trino/execution/TestCreateTableTask.java (resolved)
core/trino-main/src/test/java/io/trino/execution/TestCreateTableTask.java (resolved)
Force-pushed from 42dcd1e to e20e6d7
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java (resolved, outdated)
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java (resolved, outdated)
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java (resolved, outdated)
core/trino-main/src/main/java/io/trino/metadata/MetadataManager.java (resolved, outdated)
forEachPair(tableMetadata.getColumns().stream(), visibleFieldMappings.stream(), (column, fieldMapping) -> {
    Expression expression;
    Type tableType = column.getType();
    Type queryType = symbolAllocator.getTypes().get(fieldMapping);
    // Pass the symbol through when the types match (or differ only in representation);
    // otherwise cast, failing on truncation.
    if (queryType.equals(tableType) || typeCoercion.isTypeOnlyCoercion(queryType, tableType)) {
        expression = fieldMapping.toSymbolReference();
    }
    else {
        expression = noTruncationCast(fieldMapping.toSymbolReference(), queryType, tableType);
    }
    Symbol output = symbolAllocator.newSymbol(column.getName(), column.getType());
    assignments.put(output, expression);
    finalColumnsBuilder.add(column);
});
I don't believe this comment has been addressed.
Force-pushed from e20e6d7 to 1b42df9
...n/trino-postgresql/src/test/java/io/trino/plugin/postgresql/TestPostgreSqlConnectorTest.java (resolved)
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java (resolved, outdated)
Force-pushed from 1b42df9 to 9559aac
@martint: As discussed offline, PTAL.
There are a few unrelated build failures in the Elasticsearch and Kafka connectors.
@mdesmet Thanks for working on this. Can you update the release notes so we can merge this PR?
@Praveen2112: I have updated the issue description. PTAL.
Thanks for working on this!!
Well done! Thank you all involved! 🎉
Allow connectors to specify the actual Trino types that will be persisted and let the engine apply coercions as necessary.
Description
The behaviour around type coercion and casts in a CREATE TABLE is different from CTAS. For example, the user created a table with TIMESTAMP(6). On INSERT, the table type is compared to the query type, and casts or coercions are applied to the supplied values (resulting in rounding if necessary).
In this PR, similar logic is applied to CTAS statements by distinguishing between the target table type and the query type. By default the query and table types are the same; however, a connector can specify overrides through a ConnectorMetadata method.
Given this information, coercions and casts can be performed by the engine instead of by the connector itself, resulting in simpler connectors.
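For illustration only (a sketch against the draft SPI discussed in the review, not code from the PR): a connector whose storage supports at most microsecond precision could report effective metadata that downgrades higher-precision timestamps, and the engine would then add the corresponding casts on the inserted values.

import static com.google.common.collect.ImmutableList.toImmutableList;

import io.trino.spi.connector.ColumnMetadata;
import io.trino.spi.connector.ConnectorTableMetadata;
import io.trino.spi.connector.SchemaTableName;
import io.trino.spi.type.TimestampType;

import java.util.List;
import java.util.Map;
import java.util.Optional;

final class EffectiveMetadataSketch
{
    private EffectiveMetadataSketch() {}

    // Rewrites timestamp(n) with n > 6 to timestamp(6) in the metadata reported for a CTAS
    // target table; all other columns keep their declared types.
    static ConnectorTableMetadata effectiveMetadata(
            SchemaTableName tableName,
            List<ColumnMetadata> columns,
            Map<String, Object> properties,
            Optional<String> comment)
    {
        List<ColumnMetadata> effectiveColumns = columns.stream()
                .map(column -> column.getType() instanceof TimestampType timestampType && timestampType.getPrecision() > 6
                        ? ColumnMetadata.builderFrom(column).setType(TimestampType.createTimestampType(6)).build()
                        : column)
                .collect(toImmutableList());
        return new ConnectorTableMetadata(tableName, effectiveColumns, properties, comment);
    }
}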
Improvement
Change to core query engine and SPI
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: