Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix serialization for timestamps with precision higher than 3 for MySQL #6893

Merged
merged 2 commits into from
Feb 18, 2021

Conversation

hashhar
Copy link
Member

@hashhar hashhar commented Feb 15, 2021

Fixes #6852.

@cla-bot cla-bot bot added the cla-signed label Feb 15, 2021
@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from 1371a8c to ebce4c3 Compare February 15, 2021 09:11
@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from ebce4c3 to 3cb91be Compare February 16, 2021 07:47
@hashhar
Copy link
Member Author

hashhar commented Feb 16, 2021

I'll need to disable some test-cases because we currently use SqlTimestamp while writing which means that timestamps lying within a gap do not round-trip.

(resultSet, columnIndex) -> {
Timestamp timestamp = resultSet.getTimestamp(columnIndex);
Instant instant = timestamp.toLocalDateTime().atZone(UTC).toInstant();
return toPrestoTimestamp(timestampType, LocalDateTime.ofEpochSecond(instant.getEpochSecond(), Math.toIntExact(round(instant.getNano(), 9 - timestampType.getPrecision())), UTC));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default read function is not correct -- it should either round OR fail, but should not return incorrect data silently as it does today.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it returns incorrect data - a VerifyException is thrown if the value and the precision don't match.

The issue here was that the value returned from MySQL has precision != 3 and hence the verify fails. It's our fault as callers of the code to not provide the correct TimestampType (we pass TIMESTAMP_MILLIS always).

Should be addressed as part of #6910.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it returns incorrect data - a VerifyException is thrown if the value and the precision don't match.

Right. That's a new addition (#6654), and so #6852 reported a different stacktrace than one we would get today.

Still, we could add an opt-in mode where the value is rounded (can be eg boolean round)
This should be used by all connectors which use TIMESTAMP_MILLIS (not only MySQL, but also the legacy default mappings (#6607) and maybe some others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we need to propagate the boolean through legacyToPrestoType -> timestampColumnMapping -> toPrestoTimestamp.

I think it's more worthwhile to add the boolean (or a method with more meaningful name) to toPrestoTimestamp itself and add explicit column mapping for timestamps to connectors which are using legacyToPrestoType.

I would like to do this in a separate PR though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we need to propagate the boolean through legacyToPrestoType -> timestampColumnMapping -> toPrestoTimestamp.

legacyToPrestoType should not take this boolean. This is the place where we make decision to map to TS_MILLIS.
But yes, you'd need to pass a boolean to timestampColumnMappingUsingSqlTimestamp and on.

Comment on lines 590 to 592
tests.execute(getQueryRunner(), session, trinoCreateAsSelect(session, "test_timestamp"));
tests.execute(getQueryRunner(), session, trinoCreateAsSelect(getSession(), "test_timestamp"));
tests.execute(getQueryRunner(), session, trinoCreateAndInsert(session, "test_timestamp"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add atest when data is written with MySQL and read with Trino -- the existing tests should not exercise the read path's rounding, as the data should be stored pre-rounded on the Trino side.

@findepi
Copy link
Member

findepi commented Feb 16, 2021

I'll need to disable some test-cases because we currently use SqlTimestamp while writing which means that timestamps lying within a gap do not round-trip.

Please do not disable them completely. just use a different input and expected output literals, with a clear TODO comment.
This will ensure we update the test when we fix underlying problem, while the comment ensures no-one will get confused about that.

cc @nineinchnick

@hashhar
Copy link
Member Author

hashhar commented Feb 16, 2021

I'll need to disable some test-cases because we currently use SqlTimestamp while writing which means that timestamps lying within a gap do not round-trip.

Please do not disable them completely. just use a different input and expected output literals, with a clear TODO comment.

Sorry for the misleading comment, there are no tests today - I meant disabling the ones I was trying to add. Instead I'll add the full set of tests (and fixes) as part of #6910.

@findepi
Copy link
Member

findepi commented Feb 16, 2021

Sorry for the misleading comment, there are no tests today - I meant disabling the ones I was trying to add.

That's exactly what i understood.

@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from 3cb91be to aa8b651 Compare February 16, 2021 14:01
@hashhar hashhar marked this pull request as ready for review February 16, 2021 14:07
@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from aa8b651 to 575186f Compare February 16, 2021 14:08
@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch 2 times, most recently from c99c8ff to 6937ac4 Compare February 16, 2021 16:01
Copy link
Member

@jirassimok jirassimok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good. My most important comments are the last two.

// max supported precision
.addRoundTrip("TIMESTAMP '2020-09-27 12:34:56.123456'", "TIMESTAMP '2020-09-27 12:34:56.123'")

// round down
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tests with rounding, I think it's better to put the actual and expected values on their own lines, so it's easier to see what is expected to change.

                 .addRoundTrip(
                         "TIMESTAMP '1970-01-01 00:00:00.12345671'",
                         "TIMESTAMP '1970-01-01 01:00:00.123'")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also works for addRoundTrip(String, String, Type, String), even if the type makes it look a little less nice.

.addRoundTrip(
        "datetime(6)",
        "TIMESTAMP '1970-01-01 00:00:01.1234569'",
        TIMESTAMP_MILLIS,
        "TIMESTAMP '1970-01-01 01:00:01.123'")

@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch 2 times, most recently from 94df59b to 94a2100 Compare February 17, 2021 07:13
@hashhar
Copy link
Member Author

hashhar commented Feb 17, 2021

AC @findepi @jirassimok

Thanks @jirassimok for catching the rounding/overflow bug.

@hashhar
Copy link
Member Author

hashhar commented Feb 17, 2021

CI hit #5758.

LocalDateTime roundedTimestamp = timestamp
.withNano(roundedNanos % toIntExact(NANOSECONDS_PER_SECOND))
.plusSeconds(roundedNanos / NANOSECONDS_PER_SECOND);
return toPrestoTimestamp(timestampType, roundedTimestamp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for rounding applies to MemSQL, Druid, Redshift connectors as well -- i.e. caller of

return Optional.of(timestampColumnMappingUsingSqlTimestamp(TIMESTAMP_MILLIS));
who do not handle timestamps on their own.

That flow used to round until be2b97a, but it went unnoticed until we added more verifications in other places.

MySQL is not a special case here, and we have other connectors suffering -- those who're not smart enough to handle their mapping by themsleves. We should fix the mapping returned from io.trino.plugin.jdbc.StandardColumnMappings#jdbcTypeToPrestoType by rounding to millis. This should be an opt-in, ie it must not be applied to timestampColumnMappingUsingSqlTimestamp or timestampColumnMapping when someone calls them directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continued @ #6893 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll change the mapping returned from legacyToPrestoType while leaving the type-level mappings (e.g. timestampColumnMapping as it is.

Copy link
Member Author

@hashhar hashhar Feb 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With rounding enabled all verifyPredicate calls fail for values where we round during reads - most likely because to predicate generated in verifyPredicate is being pushed down.

I'll need to disable pushdown for all connectors using the legacyToPrestoType @findepi.

@hashhar
Copy link
Member Author

hashhar commented Feb 17, 2021

@findepi AC but notice that pushdown is now disabled for anyone using the rounding timestamp mapping (because rounding happens on our end and hence any pushed-down operations will use wrong value).

@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from d90cf88 to bd322d1 Compare February 17, 2021 13:45
}

@DataProvider
public Object[][] sessionZonesDataProvider()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this return ZoneId[] to make the method body look a little nicer.

One of IntelliJ's TestNG inspections ("invalid data provider return type") doesn't like it very much, but IntelliJ is wrong here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree IntelliJ is wrong and its inspection needs an update, but i don't want to have to eye-ball & ignore warnings manually.
So unless we decide the inspection is not helpful and we recommend disabing it for everyone (see DEVELOPING.md), we should avoid triggering warnings.

// max supported precision
.addRoundTrip("TIMESTAMP '2020-09-27 12:34:56.123456'", "TIMESTAMP '2020-09-27 12:34:56.123'")

// round down
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also works for addRoundTrip(String, String, Type, String), even if the type makes it look a little less nice.

.addRoundTrip(
        "datetime(6)",
        "TIMESTAMP '1970-01-01 00:00:01.1234569'",
        TIMESTAMP_MILLIS,
        "TIMESTAMP '1970-01-01 01:00:01.123'")

@hashhar
Copy link
Member Author

hashhar commented Feb 18, 2021

AC as fixup.

Note that I changed the existing method timestampColumnMappingSqlTimestamp to perform rounding instead of adding a boolean argument/override because I couldn't find any code which directly called this mapping. i.e. all users of this method are the connectors which haven't implemented their own type mappings.

@@ -368,7 +369,7 @@ public static LongWriteFunction dateWriteFunction()
* {@link #timeColumnMapping} instead.
*/
@Deprecated
public static ColumnMapping timeColumnMappingUsingSqlTime()
private static ColumnMapping timeColumnMappingUsingSqlTime()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default TIME mapping also misses rounding. Let's solve this separately though.
Maybe revert this for now, as unrelated.

@findepi
Copy link
Member

findepi commented Feb 18, 2021

As a followup we should add same tests for MemSQL and Druid.

@hashhar
Copy link
Member Author

hashhar commented Feb 18, 2021

AC as fixups again. Will squash the fixups after approval.

PTAL again @findepi @jirassimok

@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from 35e7288 to e1889ea Compare February 18, 2021 11:37
@hashhar
Copy link
Member Author

hashhar commented Feb 18, 2021

Squashed the fixups. Should be good to merge on a green build.

@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from e1889ea to ff10d4d Compare February 18, 2021 12:00
@findepi
Copy link
Member

findepi commented Feb 18, 2021

Please rebase to force CI run

@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from ff10d4d to 3cffe1b Compare February 18, 2021 13:23
@findepi
Copy link
Member

findepi commented Feb 18, 2021

@hashhar see CI

@hashhar hashhar force-pushed the hashhar/mysql-timestamp branch from 3cffe1b to 7cfd002 Compare February 18, 2021 16:34
@findepi findepi merged commit dc1fb53 into trinodb:master Feb 18, 2021
@findepi findepi added this to the 353 milestone Feb 18, 2021
@findepi findepi mentioned this pull request Feb 18, 2021
10 tasks
@hashhar hashhar deleted the hashhar/mysql-timestamp branch February 19, 2021 04:55
aakashnand added a commit to aakashnand/trino-netezza that referenced this pull request Jul 15, 2021
Older version of trino-base-jdbc dependency gives serialization error for timestamp
column.
Issue: trinodb/trino#6852

This bug was fixed by trinodb/trino#6893

Hence updating version was necessary to work with timestamp with
precision of more than 3 digits for NetezzaClient as well.

Readme is also updated for reference
aakashnand added a commit to aakashnand/trino-netezza that referenced this pull request Jul 15, 2021
Older version of trino-base-jdbc dependency gives serialization error for timestamp
column.
Issue: trinodb/trino#6852

This bug was fixed by trinodb/trino#6893

Hence updating version was necessary to work with timestamp with
precision of more than 3 digits for NetezzaClient as well.

Readme is also updated for reference
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

MySQL connector serialize issue for timestamp with precision higher than 3
4 participants