Correct numeric type mapping of Snowflake Connector #21755

Merged 2 commits on May 17, 2024
181 changes: 157 additions & 24 deletions docs/src/main/sphinx/connector/snowflake.md
@@ -49,30 +49,163 @@ multiple instances of the Snowflake connector.

## Type mapping

Trino supports the following Snowflake data types:

| Snowflake Type | Trino Type |
| -------------- | -------------- |
| `boolean` | `boolean` |
| `tinyint` | `bigint` |
| `smallint` | `bigint` |
| `byteint` | `bigint` |
| `int` | `bigint` |
| `integer` | `bigint` |
| `bigint` | `bigint` |
| `float` | `real` |
| `real` | `real` |
| `double` | `double` |
| `decimal` | `decimal(P,S)` |
| `varchar(n)` | `varchar(n)` |
| `char(n)` | `varchar(n)` |
| `binary(n)` | `varbinary` |
| `varbinary` | `varbinary` |
| `date` | `date` |
| `time(n)` | `time(n)` |
| `timestampntz` | `timestamp` |

Complete list of [Snowflake data types](https://docs.snowflake.com/en/sql-reference/intro-summary-data-types.html).
Because Trino and Snowflake each support types that the other does not, this
connector {ref}`modifies some types <type-mapping-overview>` when reading or
writing data. Data types may not map the same way in both directions between
Trino and the data source. Refer to the following sections for type mapping in
each direction.

List of [Snowflake data types](https://docs.snowflake.com/en/sql-reference/intro-summary-data-types.html).

### Snowflake type to Trino type mapping

The connector maps Snowflake types to the corresponding Trino types following
this table:

:::{list-table} Snowflake type to Trino type mapping
:widths: 30, 30, 40
:header-rows: 1

* - Snowflake type
- Trino type
- Notes
* - `BOOLEAN`
- `BOOLEAN`
-
* - `INT`, `INTEGER`, `BIGINT`, `SMALLINT`, `TINYINT`, `BYTEINT`
- `DECIMAL(38,0)`
- Synonymous with `NUMBER(38,0)`. See Snowflake
[data types for fixed point numbers](https://docs.snowflake.com/en/sql-reference/data-types-numeric#data-types-for-fixed-point-numbers)
for more information.
* - `FLOAT`, `FLOAT4`, `FLOAT8`
- `DOUBLE`
- The names `FLOAT`, `FLOAT4`, and `FLOAT8` are for compatibility with other systems; Snowflake treats all three as
64-bit floating-point numbers. See Snowflake
[data types for floating point numbers](https://docs.snowflake.com/en/sql-reference/data-types-numeric#data-types-for-floating-point-numbers)
for more information.
* - `DOUBLE`, `DOUBLE PRECISION`, `REAL`
- `DOUBLE`
- Synonymous with `FLOAT`. See Snowflake
[data types for floating point numbers](https://docs.snowflake.com/en/sql-reference/data-types-numeric#data-types-for-floating-point-numbers)
for more information.
* - `NUMBER`
- `DECIMAL`
- Default precision and scale are (38,0).
* - `DECIMAL`, `NUMERIC`
- `DECIMAL`
- Synonymous with `NUMBER`. See Snowflake
[data types for fixed point numbers](https://docs.snowflake.com/en/sql-reference/data-types-numeric#data-types-for-fixed-point-numbers)
for more information.
* - `VARCHAR`
- `VARCHAR`
-
* - `CHAR`, `CHARACTER`
- `VARCHAR`
- Synonymous with `VARCHAR` except default length is `VARCHAR(1)`. See Snowflake
[String & Binary Data Types](https://docs.snowflake.com/en/sql-reference/data-types-text)
for more information.
* - `STRING`, `TEXT`
- `VARCHAR`
- Synonymous with `VARCHAR`. See Snowflake
[String & Binary Data Types](https://docs.snowflake.com/en/sql-reference/data-types-text)
for more information.
* - `BINARY`
- `VARBINARY`
-
* - `VARBINARY`
- `VARBINARY`
- Synonymous with `BINARY`. See Snowflake
[String & Binary Data Types](https://docs.snowflake.com/en/sql-reference/data-types-text)
for more information.
* - `DATE`
- `DATE`
-
* - `TIME`
- `TIME`
-
* - `TIMESTAMP_NTZ`
- `TIMESTAMP`
- TIMESTAMP with no time zone; time zone, if provided, is not stored. See Snowflake
[Date & Time Data Types](https://docs.snowflake.com/en/sql-reference/data-types-datetime)
for more information.
* - `DATETIME`
- `TIMESTAMP`
- Alias for `TIMESTAMP_NTZ`. See Snowflake
[Date & Time Data Types](https://docs.snowflake.com/en/sql-reference/data-types-datetime)
for more information.
* - `TIMESTAMP`
- `TIMESTAMP`
- Alias for one of the `TIMESTAMP` variations (`TIMESTAMP_NTZ` by default). The connector always maps `TIMESTAMP` as `TIMESTAMP_NTZ`.
* - `TIMESTAMP_TZ`
- `TIMESTAMP WITH TIME ZONE`
- TIMESTAMP with time zone.
:::

No other types are supported.
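The reason fixed-point integer types land on `DECIMAL(38,0)` rather than `BIGINT` is that Snowflake stores `INT`, `INTEGER`, and friends as `NUMBER(38,0)` synonyms, whose range exceeds a 64-bit long. A minimal sketch of the range difference (plain Java for illustration, not connector code):

```java
import java.math.BigDecimal;

public class FixedPointRange {
    public static void main(String[] args) {
        // Largest value a Snowflake NUMBER(38,0) column can hold: 38 nines
        BigDecimal maxNumber38 = new BigDecimal("9".repeat(38));
        // Largest value a Trino BIGINT (64-bit long) can hold
        BigDecimal maxBigint = BigDecimal.valueOf(Long.MAX_VALUE);

        // NUMBER(38,0) exceeds the bigint range, so mapping INT/INTEGER/BIGINT
        // to bigint could silently overflow; DECIMAL(38,0) is lossless
        System.out.println(maxNumber38.compareTo(maxBigint) > 0); // prints true
    }
}
```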

### Trino type to Snowflake type mapping

The connector maps Trino types to the corresponding Snowflake types following
this table:

:::{list-table} Trino type to Snowflake type mapping
:widths: 30, 30, 40
:header-rows: 1

* - Trino type
- Snowflake type
- Notes
* - `BOOLEAN`
- `BOOLEAN`
-
* - `TINYINT`
- `NUMBER(3, 0)`
-
* - `SMALLINT`
- `NUMBER(5, 0)`
-
* - `INTEGER`
- `NUMBER(10, 0)`
-
* - `BIGINT`
- `NUMBER(19, 0)`
-
* - `REAL`
- `DOUBLE`
-
* - `DOUBLE`
- `DOUBLE`
-
* - `DECIMAL`
- `NUMBER`
-
* - `VARCHAR`
- `VARCHAR`
-
* - `CHAR`
- `VARCHAR`
-
* - `VARBINARY`
- `BINARY`
-
* - `DATE`
- `DATE`
-
* - `TIME`
- `TIME`
-
* - `TIMESTAMP`
- `TIMESTAMP_NTZ`
-
* - `TIMESTAMP WITH TIME ZONE`
- `TIMESTAMP_TZ`
-
:::

No other types are supported.
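The precision chosen for each Trino integer type in the table above matches the number of decimal digits needed to represent the type's full range. A quick illustrative sketch (not connector code):

```java
public class IntegerPrecision {
    // Number of decimal digits in the type's maximum value
    static int digits(long maxValue) {
        return Long.toString(maxValue).length();
    }

    public static void main(String[] args) {
        System.out.println("TINYINT  -> NUMBER(" + digits(Byte.MAX_VALUE) + ", 0)");    // NUMBER(3, 0)
        System.out.println("SMALLINT -> NUMBER(" + digits(Short.MAX_VALUE) + ", 0)");   // NUMBER(5, 0)
        System.out.println("INTEGER  -> NUMBER(" + digits(Integer.MAX_VALUE) + ", 0)"); // NUMBER(10, 0)
        System.out.println("BIGINT   -> NUMBER(" + digits(Long.MAX_VALUE) + ", 0)");    // NUMBER(19, 0)
    }
}
```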

```{include} jdbc-type-mapping.fragment
```
@@ -113,15 +113,11 @@
import static io.trino.plugin.jdbc.StandardColumnMappings.decimalColumnMapping;
import static io.trino.plugin.jdbc.StandardColumnMappings.doubleColumnMapping;
import static io.trino.plugin.jdbc.StandardColumnMappings.doubleWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.integerColumnMapping;
import static io.trino.plugin.jdbc.StandardColumnMappings.integerWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.longDecimalWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.realColumnMapping;
import static io.trino.plugin.jdbc.StandardColumnMappings.realWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.shortDecimalWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.smallintColumnMapping;
import static io.trino.plugin.jdbc.StandardColumnMappings.smallintWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.tinyintColumnMapping;
import static io.trino.plugin.jdbc.StandardColumnMappings.tinyintWriteFunction;
import static io.trino.plugin.jdbc.StandardColumnMappings.toTrinoTimestamp;
import static io.trino.plugin.jdbc.StandardColumnMappings.varbinaryColumnMapping;
@@ -220,19 +216,13 @@ public Optional<ColumnMapping> toColumnMapping(ConnectorSession session, Connect
switch (type) {
case Types.BOOLEAN:
return Optional.of(booleanColumnMapping());
case Types.TINYINT:
return Optional.of(tinyintColumnMapping());
case Types.SMALLINT:
return Optional.of(smallintColumnMapping());
case Types.INTEGER:
return Optional.of(integerColumnMapping());
// This is kept for synthetic columns generated by count() aggregation pushdown
case Types.BIGINT:
return Optional.of(bigintColumnMapping());
case Types.REAL:
return Optional.of(realColumnMapping());
case Types.FLOAT:
case Types.DOUBLE:
return Optional.of(doubleColumnMapping());

// In Snowflake, all fixed-point numeric types are decimals. Snowflake always returns a DECIMAL type when JDBC_TREAT_DECIMAL_AS_INT is set to FALSE.
case Types.NUMERIC:
case Types.DECIMAL: {
int precision = typeHandle.requiredColumnSize();
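After reading the column's precision and scale, the mapping picks a Trino decimal representation. A simplified, hypothetical sketch of that decision (the real connector code differs; `trinoTypeFor` is illustrative, though Trino's 18-digit cutoff for 64-bit "short" decimals is real):

```java
public class DecimalMappingSketch {
    // Hypothetical helper: picks a Trino decimal representation from JDBC metadata.
    // Precision <= 18 fits in a 64-bit "short" decimal; larger precisions need
    // a 128-bit "long" decimal; anything beyond 38 digits is unsupported.
    static String trinoTypeFor(int precision, int scale) {
        if (precision < 1 || precision > 38) {
            throw new IllegalArgumentException("unsupported precision: " + precision);
        }
        String representation = precision <= 18 ? "short" : "long";
        return "decimal(" + precision + "," + scale + ") [" + representation + "]";
    }

    public static void main(String[] args) {
        System.out.println(trinoTypeFor(10, 0)); // decimal(10,0) [short]
        System.out.println(trinoTypeFor(38, 0)); // decimal(38,0) [long]
    }
}
```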
@@ -64,12 +64,7 @@ public ConnectionFactory getConnectionFactory(BaseJdbcConfig baseJdbcConfig, Sno
snowflakeConfig.getRole().ifPresent(role -> properties.setProperty("role", role));
snowflakeConfig.getWarehouse().ifPresent(warehouse -> properties.setProperty("warehouse", warehouse));

// Set the expected date/time formatting we expect for our plugin to parse
properties.setProperty("TIMESTAMP_OUTPUT_FORMAT", "YYYY-MM-DD\"T\"HH24:MI:SS.FF9TZH:TZM");
properties.setProperty("TIMESTAMP_NTZ_OUTPUT_FORMAT", "YYYY-MM-DD\"T\"HH24:MI:SS.FF9TZH:TZM");
properties.setProperty("TIMESTAMP_TZ_OUTPUT_FORMAT", "YYYY-MM-DD\"T\"HH24:MI:SS.FF9TZH:TZM");
properties.setProperty("TIMESTAMP_LTZ_OUTPUT_FORMAT", "YYYY-MM-DD\"T\"HH24:MI:SS.FF9TZH:TZM");
properties.setProperty("TIME_OUTPUT_FORMAT", "HH24:MI:SS.FF9");
setOutputProperties(properties);

// Support for Corporate proxies
if (snowflakeConfig.getHttpProxy().isPresent()) {
@@ -100,4 +95,16 @@ public ConnectionFactory getConnectionFactory(BaseJdbcConfig baseJdbcConfig, Sno
.setOpenTelemetry(openTelemetry)
.build();
}

protected static void setOutputProperties(Properties properties)
{
// Set the expected date/time formatting we expect for our plugin to parse
properties.setProperty("TIMESTAMP_OUTPUT_FORMAT", "YYYY-MM-DD\"T\"HH24:MI:SS.FF9TZH:TZM");
properties.setProperty("TIMESTAMP_NTZ_OUTPUT_FORMAT", "YYYY-MM-DD\"T\"HH24:MI:SS.FF9TZH:TZM");
properties.setProperty("TIMESTAMP_TZ_OUTPUT_FORMAT", "YYYY-MM-DD\"T\"HH24:MI:SS.FF9TZH:TZM");
properties.setProperty("TIMESTAMP_LTZ_OUTPUT_FORMAT", "YYYY-MM-DD\"T\"HH24:MI:SS.FF9TZH:TZM");
properties.setProperty("TIME_OUTPUT_FORMAT", "HH24:MI:SS.FF9");
// Don't treat decimals as bigints as they may overflow
properties.setProperty("JDBC_TREAT_DECIMAL_AS_INT", "FALSE");
}
}
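The output format `YYYY-MM-DD"T"HH24:MI:SS.FF9TZH:TZM` set above yields ISO-8601-style text with nanosecond precision and a zone offset, which `java.time` can parse directly. A sketch of such a parse (illustrative only; the pattern mapping to `java.time` is an assumption, not connector code):

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampFormatSketch {
    public static void main(String[] args) {
        // Snowflake's YYYY-MM-DD"T"HH24:MI:SS.FF9TZH:TZM roughly corresponds to
        // this java.time pattern: nine fractional digits plus a +HH:MM offset
        DateTimeFormatter formatter =
                DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSSxxx");
        OffsetDateTime parsed =
                OffsetDateTime.parse("2024-05-17T12:34:56.123456789+02:00", formatter);
        System.out.println(parsed.getNano()); // prints 123456789
    }
}
```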
@@ -15,6 +15,8 @@

import com.google.common.collect.ImmutableMap;
import io.trino.plugin.jdbc.BaseJdbcConnectorTest;
import io.trino.sql.planner.plan.AggregationNode;
import io.trino.sql.planner.plan.ProjectNode;
import io.trino.testing.MaterializedResult;
import io.trino.testing.QueryRunner;
import io.trino.testing.TestingConnectorBehavior;
@@ -130,16 +132,16 @@ protected boolean isColumnNameRejected(Exception exception, String columnName, b
@Override
protected MaterializedResult getDescribeOrdersResult()
{
// Override this test because the type of row "shippriority" should be bigint rather than integer for snowflake case
// Override this test because the type of columns "orderkey", "custkey" and "shippriority" should be decimal rather than integer for snowflake case
return resultBuilder(getSession(), VARCHAR, VARCHAR, VARCHAR, VARCHAR)
.row("orderkey", "bigint", "", "")
.row("custkey", "bigint", "", "")
.row("orderkey", "decimal(19,0)", "", "")
.row("custkey", "decimal(19,0)", "", "")
.row("orderstatus", "varchar(1)", "", "")
.row("totalprice", "double", "", "")
.row("orderdate", "date", "", "")
.row("orderpriority", "varchar(15)", "", "")
.row("clerk", "varchar(15)", "", "")
.row("shippriority", "bigint", "", "")
.row("shippriority", "decimal(10,0)", "", "")
.row("comment", "varchar(79)", "", "")
.build();
}
@@ -164,17 +166,17 @@ public void testViews()
@Override
public void testShowCreateTable()
{
// Override this test because the type of row "shippriority" should be bigint rather than integer for snowflake case
// Override this test because the type of columns "orderkey", "custkey" and "shippriority" should be decimal rather than integer for snowflake case
assertThat(computeActual("SHOW CREATE TABLE orders").getOnlyValue())
.isEqualTo("CREATE TABLE snowflake.tpch.orders (\n" +
" orderkey bigint,\n" +
" custkey bigint,\n" +
" orderkey decimal(19, 0),\n" +
" custkey decimal(19, 0),\n" +
" orderstatus varchar(1),\n" +
" totalprice double,\n" +
" orderdate date,\n" +
" orderpriority varchar(15),\n" +
" clerk varchar(15),\n" +
" shippriority bigint,\n" +
" shippriority decimal(10, 0),\n" +
" comment varchar(79)\n" +
")");
}
@@ -366,7 +368,7 @@ public void testInformationSchemaFiltering()
"SELECT table_name FROM information_schema.tables WHERE table_schema = 'tpch' AND table_name = 'orders' LIMIT 1",
"SELECT 'orders' table_name");
assertQuery(
"SELECT table_name FROM information_schema.columns WHERE data_type = 'bigint' AND table_schema = 'tpch' AND table_name = 'nation' and column_name = 'nationkey' LIMIT 1",
"SELECT table_name FROM information_schema.columns WHERE data_type = 'decimal(19,0)' AND table_schema = 'tpch' AND table_name = 'nation' and column_name = 'nationkey' LIMIT 1",
"SELECT 'nation' table_name");
}

@@ -377,4 +379,28 @@ public void testSelectInformationSchemaColumns()
{
// TODO https://github.com/trinodb/trino/issues/21157 Enable this test after fixing the timeout issue
}

@Test
@Override // Override because for approx_set(nationkey) a ProjectNode is present above the TableScanNode. It's used to project decimals to doubles.
public void testAggregationWithUnsupportedResultType()
{
// TODO array_agg returns array, so it could be supported
assertThat(query("SELECT array_agg(nationkey) FROM nation"))
.skipResultsCorrectnessCheckForPushdown() // array_agg doesn't have a deterministic order of elements in result array
.isNotFullyPushedDown(AggregationNode.class);
// histogram returns map, which is not supported
assertThat(query("SELECT histogram(regionkey) FROM nation")).isNotFullyPushedDown(AggregationNode.class);
// multimap_agg returns multimap, which is not supported
assertThat(query("SELECT multimap_agg(regionkey, nationkey) FROM nation"))
.skipResultsCorrectnessCheckForPushdown() // multimap_agg doesn't have a deterministic order of values for a key
.isNotFullyPushedDown(AggregationNode.class);
// approx_set returns HyperLogLog, which is not supported
assertThat(query("SELECT approx_set(nationkey) FROM nation")).isNotFullyPushedDown(AggregationNode.class, ProjectNode.class);
}

@Override // Override because integers are represented as decimals in Snowflake Connector.
protected String sumDistinctAggregationPushdownExpectedResult()
{
return "VALUES (BIGINT '4', DECIMAL '8')";
}
}
@@ -41,7 +41,6 @@
import static io.trino.plugin.jdbc.UnsupportedTypeHandling.CONVERT_TO_VARCHAR;
import static io.trino.plugin.jdbc.UnsupportedTypeHandling.IGNORE;
import static io.trino.plugin.snowflake.SnowflakeQueryRunner.createSnowflakeQueryRunner;
import static io.trino.spi.type.BigintType.BIGINT;
import static io.trino.spi.type.BooleanType.BOOLEAN;
import static io.trino.spi.type.DateType.DATE;
import static io.trino.spi.type.DecimalType.createDecimalType;
@@ -112,21 +111,23 @@ public void testInteger()
private void testInteger(String inputType)
{
SqlDataTypeTest.create()
.addRoundTrip(inputType, "-9223372036854775808", BIGINT, "-9223372036854775808")
.addRoundTrip(inputType, "9223372036854775807", BIGINT, "9223372036854775807")
.addRoundTrip(inputType, "0", BIGINT, "CAST(0 AS BIGINT)")
.addRoundTrip(inputType, "NULL", BIGINT, "CAST(NULL AS BIGINT)")
.addRoundTrip(inputType, "'-9223372036854775808'", createDecimalType(38, 0), "CAST('-9223372036854775808' AS decimal(38, 0))")
.addRoundTrip(inputType, "'9223372036854775807'", createDecimalType(38, 0), "CAST('9223372036854775807' AS decimal(38, 0))")
.addRoundTrip(inputType, "'-99999999999999999999999999999999999999'", createDecimalType(38, 0), "CAST('-99999999999999999999999999999999999999' AS decimal(38, 0))")
.addRoundTrip(inputType, "'99999999999999999999999999999999999999'", createDecimalType(38, 0), "CAST('99999999999999999999999999999999999999' AS decimal(38, 0))")
.addRoundTrip(inputType, "0", createDecimalType(38, 0), "CAST(0 AS decimal(38, 0))")
.addRoundTrip(inputType, "NULL", createDecimalType(38, 0), "CAST(NULL AS decimal(38, 0))")
.execute(getQueryRunner(), snowflakeCreateAndInsert("tpch.integer"));
}

@Test
public void testDecimal()
{
SqlDataTypeTest.create()
.addRoundTrip("decimal(3, 0)", "NULL", BIGINT, "CAST(NULL AS BIGINT)")
.addRoundTrip("decimal(3, 0)", "CAST('193' AS decimal(3, 0))", BIGINT, "CAST('193' AS BIGINT)")
.addRoundTrip("decimal(3, 0)", "CAST('19' AS decimal(3, 0))", BIGINT, "CAST('19' AS BIGINT)")
.addRoundTrip("decimal(3, 0)", "CAST('-193' AS decimal(3, 0))", BIGINT, "CAST('-193' AS BIGINT)")
.addRoundTrip("decimal(3, 0)", "NULL", createDecimalType(3, 0), "CAST(NULL AS decimal(3, 0))")
.addRoundTrip("decimal(3, 0)", "CAST('193' AS decimal(3, 0))", createDecimalType(3, 0), "CAST('193' AS decimal(3, 0))")
.addRoundTrip("decimal(3, 0)", "CAST('19' AS decimal(3, 0))", createDecimalType(3, 0), "CAST('19' AS decimal(3, 0))")
.addRoundTrip("decimal(3, 0)", "CAST('-193' AS decimal(3, 0))", createDecimalType(3, 0), "CAST('-193' AS decimal(3, 0))")
.addRoundTrip("decimal(3, 1)", "CAST('10.0' AS decimal(3, 1))", createDecimalType(3, 1), "CAST('10.0' AS decimal(3, 1))")
.addRoundTrip("decimal(3, 1)", "CAST('10.1' AS decimal(3, 1))", createDecimalType(3, 1), "CAST('10.1' AS decimal(3, 1))")
.addRoundTrip("decimal(3, 1)", "CAST('-10.1' AS decimal(3, 1))", createDecimalType(3, 1), "CAST('-10.1' AS decimal(3, 1))")
@@ -138,7 +139,8 @@ public void testDecimal()
.addRoundTrip("decimal(24, 4)", "CAST('12345678901234567890.31' AS decimal(24, 4))", createDecimalType(24, 4), "CAST('12345678901234567890.31' AS decimal(24, 4))")
.addRoundTrip("decimal(30, 5)", "CAST('3141592653589793238462643.38327' AS decimal(30, 5))", createDecimalType(30, 5), "CAST('3141592653589793238462643.38327' AS decimal(30, 5))")
.addRoundTrip("decimal(30, 5)", "CAST('-3141592653589793238462643.38327' AS decimal(30, 5))", createDecimalType(30, 5), "CAST('-3141592653589793238462643.38327' AS decimal(30, 5))")
.addRoundTrip("decimal(38, 0)", "CAST(NULL AS decimal(38, 0))", BIGINT, "CAST(NULL AS BIGINT)")
.addRoundTrip("decimal(38, 0)", "CAST(NULL AS decimal(38, 0))", createDecimalType(38, 0), "CAST(NULL AS decimal(38, 0))")
.addRoundTrip("decimal(38, 0)", "CAST('99999999999999999999999999999999999999' AS decimal(38, 0))", createDecimalType(38, 0), "CAST('99999999999999999999999999999999999999' AS decimal(38, 0))")
.execute(getQueryRunner(), snowflakeCreateAndInsert("tpch.test_decimal"))
.execute(getQueryRunner(), trinoCreateAsSelect("test_decimal"))
.execute(getQueryRunner(), trinoCreateAndInsert("test_decimal"));