
Improve type mapping for Snowflake Connector #21012

Closed
wants to merge 1 commit into from

Conversation

lxynov
Member

@lxynov lxynov commented Mar 11, 2024

Description

For #20977

Please reference snowflake.md diff for the exact type conversion behavior.

References for sources of truth:

Additional context and related issues

  • TODO: Make it configurable whether SF session property JDBC_TREAT_DECIMAL_AS_INT is set to FALSE

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Snowflake Connector
* Improve type mapping for Snowflake Connector. ({issue}`20977`)

@cla-bot cla-bot bot added the cla-signed label Mar 11, 2024
@github-actions github-actions bot added the docs label Mar 11, 2024
@lxynov lxynov added the snowflake Snowflake connector label Mar 11, 2024
@lxynov lxynov requested a review from ebyhr March 11, 2024 20:09
@lxynov
Member Author

lxynov commented Mar 21, 2024

@mosabua Thanks for the review! Addressed the comment.

@ebyhr Could you also help review the code part?

Member

@mosabua mosabua left a comment

Docs look like a great improvement... just a minor suggestion. For code review I pulled in Erik and Teng.

docs/src/main/sphinx/connector/snowflake.md (resolved review threads)
ColumnMapping columnMap = standardColumnMappings.get(type);
if (columnMap != null) {
    return Optional.of(columnMap);
}
switch (type) {
Member

Please change the case orders as PostgreSqlClient. Same for toWriteMapping method.

Member Author

I can do that. But would it be a better idea to let the order be same as what's in https://docs.snowflake.com/en/sql-reference/intro-summary-data-types?

Member

@ebyhr ebyhr Mar 21, 2024

Please do. Consistency with other Trino connectors matters more than matching Snowflake's ordering.

Comment on lines +369 to +376
.addRoundTrip("TIMESTAMP_TZ(0)", "'2020-09-27 12:34:56 -08:00'::TIMESTAMP_TZ", createTimestampWithTimeZoneType(0), "TIMESTAMP '2020-09-27 12:34:56 -08:00'")
.addRoundTrip("TIMESTAMP_TZ(3)", "'2020-09-27 12:34:56.333 -08:00'::TIMESTAMP_TZ", createTimestampWithTimeZoneType(3), "TIMESTAMP '2020-09-27 12:34:56.333 -08:00'")
.addRoundTrip("TIMESTAMP_TZ(9)", "'2020-09-27 12:34:56.999999999 -08:00'::TIMESTAMP_TZ", createTimestampWithTimeZoneType(9), "TIMESTAMP '2020-09-27 12:34:56.999999999 -08:00'")
.execute(getQueryRunner(), session, snowflakeCreateAndInsert("tpch.test_timestamptz_sf_insert"));
SqlDataTypeTest.create()
.addRoundTrip("TIMESTAMP(0) WITH TIME ZONE", "TIMESTAMP '2020-09-27 12:34:56 -08:00'", createTimestampWithTimeZoneType(0), "TIMESTAMP '2020-09-27 12:34:56 -08:00'")
.addRoundTrip("TIMESTAMP(3) WITH TIME ZONE", "TIMESTAMP '2020-09-27 12:34:56.333 -08:00'", createTimestampWithTimeZoneType(3), "TIMESTAMP '2020-09-27 12:34:56.333 -08:00'")
.addRoundTrip("TIMESTAMP(9) WITH TIME ZONE", "TIMESTAMP '2020-09-27 12:34:56.999999999 -08:00'", createTimestampWithTimeZoneType(9), "TIMESTAMP '2020-09-27 12:34:56.999999999 -08:00'")
Member

Please add more test cases, e.g. null, min, max, julian->gregorian calendar switch and etc.
You can refer to TestPostgreSqlTypeMapping class.

Member Author

Ack. I just read the existing test cases in TestSnowflakeTypeMapping::testDate. They test min/max values like 0001-01-01 and 99999-12-31. However, years outside the range 1582–9999 are not recommended by Snowflake: https://docs.snowflake.com/en/sql-reference/data-types-datetime#date. Do we actually want to remove them? Also, I feel it's best to test the Gregorian calendar with leap days like 2024-02-29. I could create a separate PR to clean this up, together with testDate, testTimestamp, and testTimestampWithTimeZone. What do you think?
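As a standalone illustration (plain java.time, not Trino's SqlDataTypeTest harness; the class name and range check below are hypothetical, sketching the edge values under discussion):

```java
import java.time.LocalDate;

public class DateEdgeValues {
    // Snowflake recommends DATE values with years between 1582 and 9999.
    static boolean inSnowflakeRecommendedRange(LocalDate date) {
        return date.getYear() >= 1582 && date.getYear() <= 9999;
    }

    public static void main(String[] args) {
        LocalDate min = LocalDate.of(1, 1, 1);         // 0001-01-01, tested today
        LocalDate max = LocalDate.of(9999, 12, 31);    // upper recommended bound
        LocalDate leapDay = LocalDate.of(2024, 2, 29); // Gregorian leap day
        System.out.println(inSnowflakeRecommendedRange(min)); // false
        System.out.println(inSnowflakeRecommendedRange(max)); // true
        System.out.println(leapDay.isLeapYear());             // true
    }
}
```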

@ebyhr
Member

ebyhr commented Mar 23, 2024

Could you rebase on master to resolve conflicts?

@mosabua
Member

mosabua commented Mar 27, 2024

So sorry @lxynov but it looks like another rebase is necessary since there is a conflict. I hope we can get this reviewed and merged before we get another conflict. Maybe @oneonestar @dprophet or @yuuteng can chime in if necessary

@lxynov
Member Author

lxynov commented Mar 27, 2024

@mosabua No worries at all. Happy to rebase it again; conflicts should be minimal. Thanks for keeping an eye on this. @ebyhr Could you help look at this #21012 (comment)? If it makes sense, I'll go ahead and rebase this PR.

@ebyhr
Member

ebyhr commented Apr 1, 2024

Improve type mapping for Snowflake Connector

The current commit and PR title are unclear. Please clarify what is improved.

@lxynov
Member Author

lxynov commented Apr 1, 2024

The current commit and PR title are unclear. Please clarify what is improved.

Sure

  1. Fixed Add Snowflake connector #17909 (comment)
  2. Fixed Add Snowflake connector #17909 (comment)
  3. Implemented the unsupported_type_handling session property
  4. Removed unused tinyintColumnMapping(), smallintColumnMapping(), etc. from SnowflakeClient::toColumnMapping as they will never be called.
  5. The original implementation builds a map inside SnowflakeClient::toColumnMapping. So the map building will happen every time SnowflakeClient::toColumnMapping is called. The new implementation is more efficient as it inlines the branching/conditioning.
  6. Improved type mapping doc.

cc @ebyhr @mosabua

@ebyhr
Member

ebyhr commented Apr 2, 2024

Thanks for your explanation. I'm not convinced about changing all numeric types to decimal. Please exclude those changes from this PR.

Also, please split into some commits so that we can review easily.

TODO: Make it configurable whether SF session property JDBC_TREAT_DECIMAL_AS_INT is set to FALSE

Why do we want to make it configurable?

@lxynov
Member Author

lxynov commented Apr 2, 2024

Thanks for your explanation. I'm not convinced to changing all numeric types to decimal. Please exclude the changes from this PR.

It's not a design decision to "change all numeric types to decimal". It's just to make things correct. In Snowflake, the fixed-point types INT, INTEGER, BIGINT, SMALLINT, TINYINT, and BYTEINT are synonyms for NUMBER(38,0) (link). So even if you create a TINYINT column in Snowflake, it's NUMBER(38,0) under the hood, and users are able to insert values as large as 99999999999999999999999999999999999999. I gave an example here where the original implementation would fail.

Additionally, even if you create a TINYINT column in Snowflake, it would be read in as either a BIGINT type or a DECIMAL type, depending on the JDBC_TREAT_DECIMAL_AS_INT session property. So the method tinyintColumnMapping() won't be used in any situation. It must be removed.
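The overflow risk can be demonstrated with plain BigInteger (the class and method names below are illustrative, not from the PR):

```java
import java.math.BigInteger;

public class Number38Range {
    private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // In Snowflake, TINYINT/SMALLINT/INT/BIGINT are all synonyms for NUMBER(38,0),
    // so a column can hold up to 38 decimal digits regardless of its declared name.
    static boolean fitsInLong(BigInteger value) {
        return value.abs().compareTo(LONG_MAX) <= 0;
    }

    public static void main(String[] args) {
        BigInteger max38Digits = new BigInteger("9".repeat(38));
        System.out.println(fitsInLong(LONG_MAX));    // true
        System.out.println(fitsInLong(max38Digits)); // false: needs DECIMAL(38, 0)
    }
}
```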

Why do we want to make it configurable?

If the service admin is sure that no decimal columns hold values larger than Long.MAX_VALUE, they can set JDBC_TREAT_DECIMAL_AS_INT to TRUE (the Snowflake default). In that case, scale-0 decimal values are read as BIGINT, which is more efficient than reading them as DECIMAL.

Also, please split into some commits so that we can review easily.

IMO we don't need to split it into multiple commits. Essentially, this PR improves the implementation of SnowflakeClient::toColumnMapping and SnowflakeClient::toWriteMapping. Honestly speaking, the previous implementation is kind of messy in the sense that 1. it builds a static map inside a frequently-called method, 2. it doesn't handle default JDBC connector configs, 3. it has code that will never be called, etc. The whole purpose of this PR is to clean things up.
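A minimal sketch of the difference (illustrative types; the real code maps JDBC type handles to Trino ColumnMappings):

```java
import java.util.Map;
import java.util.Optional;

public class LookupStyles {
    // Old style (sketch): the lookup map is rebuilt on every call.
    static Optional<String> lookupViaPerCallMap(String type) {
        Map<String, String> mappings = Map.of( // allocated on every invocation
                "boolean", "BOOLEAN",
                "varchar", "VARCHAR");
        return Optional.ofNullable(mappings.get(type));
    }

    // New style (sketch): inline branching; no per-call allocation.
    static Optional<String> lookupViaSwitch(String type) {
        switch (type) {
            case "boolean":
                return Optional.of("BOOLEAN");
            case "varchar":
                return Optional.of("VARCHAR");
            default:
                return Optional.empty();
        }
    }
}
```

Both return the same results; the switch form simply avoids rebuilding the map on each lookup.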

CC'ing @hashhar @findepi @martint if you have bandwidth to also take a look.

@hashhar
Member

hashhar commented Apr 3, 2024

The whole purpose of this PR is to clean things up.

Keeping cleanup separate from the type-mapping change would be very helpful for review; otherwise it's hard to see which changes impact logic and which don't.

I'll try and take a look anyway but unlikely to merge in current shape.


This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Apr 25, 2024
@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Apr 26, 2024
@mosabua
Member

mosabua commented Apr 26, 2024

Applied the stale-ignore label... we definitely want this improvement. Please rebase @lxynov and let us know if you need any help

@ebyhr
Member

ebyhr commented May 20, 2024

Superseded by #21755

@ebyhr ebyhr closed this May 20, 2024