Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TIMESTAMP type cannot represent seconds representable in Spark #9904

Open
NEUpanning opened this issue May 23, 2024 · 4 comments
Open

TIMESTAMP type cannot represent seconds representable in Spark #9904

NEUpanning opened this issue May 23, 2024 · 4 comments
Labels
bug Something isn't working triage Newly created issue that needs attention.

Comments

@NEUpanning
Copy link
Contributor

Bug description

Presto's Timestamp is stored in one 64-bit signed integer for milliseconds, so TIMESTAMP type limits the range of seconds in [INT64_MIN/1000 - 1, INT64_MAX/1000]. Spark's Timestamp is stored in 64-bit signed integer for seconds, so the range of seconds in [INT64_MIN- 1, INT64_MAX]. Therefore, TIMESTAMP type cannot represent seconds representable in Spark.

System information

Relevant logs

No response

@NEUpanning NEUpanning added bug Something isn't working triage Newly created issue that needs attention. labels May 23, 2024
@rui-mo
Copy link
Collaborator

rui-mo commented May 23, 2024

@NEUpanning Thanks for opening an issue. I believe Spark's timestamp is in microsecond unit, see https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampType.scala#L23. But the issue in the test of from_unixtime is that it creates a timestamp with INT64_MAX. As for Spark, the check should be enough as Spark stores int64_t value as microseconds.

@rui-mo
Copy link
Collaborator

rui-mo commented May 23, 2024

The maximum seconds of a valid timestamp in Spark is INT64_MAX / 10^6, which is under the limit in Velox for maximum seconds (INT64_MAX / 10^3).

fromUnixTime(std::numeric_limits<int64_t>::max(), "yyyy-MM-dd HH:mm:ss")

But in the function from_unixtime, timestamp could be created with a larger number of seconds which can cause check failure in debug mode. I wonder if we need to fix that in the function from_unixtime by avoiding creating timestamp if the input exceeds INT64_MAX / 10^6, instead of removing the check.
@mbasmanova How do you think? Thanks!

@NEUpanning
Copy link
Contributor Author

NEUpanning commented May 23, 2024

@rui-mo

@NEUpanning Thanks for opening an issue. I believe Spark's timestamp is in microsecond unit, see https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/types/TimestampType.scala#L23. But the issue in the test of from_unixtime is that it creates a timestamp with INT64_MAX. As for Spark, the check should be enough as Spark stores int64_t value as microseconds.

You are right. Spark's timestamp isn't exceed the range of seconds in [INT64_MIN/1000 - 1, INT64_MAX/1000]. In the function from_unixtime, argument unix_time can be represented by LongType or DecimalType etc. as seconds, so its range exceeds Velox timestamp type's range limit.

@mbasmanova
Copy link
Contributor

I wonder if we need to fix that in the function from_unixtime by avoiding creating timestamp if the input exceeds INT64_MAX / 10^6, instead of removing the check.

@rui-mo I'm thinking the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Newly created issue that needs attention.
Projects
None yet
Development

No branches or pull requests

3 participants