-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support from_unixtime via Gpu for non-UTC time zone #9814
Conversation
build |
Please only review the last commit: Support from_unixtime via CPU POC |
build |
1 similar comment
build |
@@ -1098,6 +1098,7 @@ abstract class BaseExprMeta[INPUT <: Expression]( | |||
//+------------------------+-------------------+-----------------------------------------+ | |||
lazy val needTimezoneCheck: Boolean = { | |||
wrapped match { | |||
case _: FromUnixTime => false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we still need FromUnixTime
given it's a sub-class of TimeZoneAwareExpression
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, now use new APIs.
tsVector.asStrings(strfFormat) | ||
withResource(lhs.getBase.asTimestampSeconds) { secondsVector => | ||
withResource(secondsVector.asTimestampMicroseconds) { tsVector => | ||
if (zoneId.normalized().equals(ZoneId.of("UTC").normalized())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TimeZoneDB
utils instead of hard coding UTC here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Premerge failed, case is: convert large InternalRow iterator to cached batch single col *** FAILED ***
|
build |
afb6ac8
to
172de1d
Compare
Signed-off-by: Chong Gao <[email protected]>
172de1d
to
7625e16
Compare
@@ -1130,7 +1130,7 @@ abstract class BaseExprMeta[INPUT <: Expression]( | |||
if (!isTimeZoneSupported) return checkUTCTimezone(this) | |||
|
|||
// Level 4 check | |||
if (TimeZoneDB.isSupportedTimezone(getZoneId())) { | |||
if (!TimeZoneDB.isSupportedTimezone(getZoneId())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch to GpuTimeZoneDB.isSupportedTimezone
. It's the same logic but in a class that will not be removed in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
build |
build |
@@ -0,0 +1,630 @@ | |||
# Copyright (c) 2021-2022, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be 2023.
will update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
build |
# limitations under the License. | ||
|
||
# get from Java: | ||
# ZoneId.getAvailableZoneIds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The problem is that the backend on java is pluggable. This can change from one version of java to another. The TimeZoneProvider can also change this within a single version of java. Also this does not actually reflect all of the possible time zones. These are the normalized ones. There are deprecated ones that java still kind of supports, but get mapped to these. There are also fixed time zone offsets that are not covered by this and are technically any time range up to 24 hours at second granularity. This is minor because it is not likely to change, but if we could take the time zone and send it to java to check if it is valid/etc and then memorize the result instead that would make me feel better about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can potentially handle as part of #9747
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this issue: #9633
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now forward to Java to check:
jvm.org.apache.spark.sql.rapids.TimeZoneDB.isSupportedTimeZone(tz)
I tested, it works in the Pytest marker.
build |
Tested |
tz = get_test_tz() | ||
jvm = spark_jvm() | ||
ret = jvm.org.apache.spark.sql.rapids.TimeZoneDB.isSupportedTimeZone(tz) | ||
print("my debug: is_supported_time_zone " + str(ret)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this print is useful it should not be "my debug"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, remove the debug print.
""" | ||
tz = get_test_tz() | ||
jvm = spark_jvm() | ||
ret = jvm.org.apache.spark.sql.rapids.TimeZoneDB.isSupportedTimeZone(tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we memorize this? Calling back into the JVM can be expensive. And if this is never going to change I would much rather have us keep the result cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also use the GpuTimeZoneDB
version of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now cached the support info; And updated to GpuTimeZoneDB.
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
build |
build |
closes #9605
Support from_unixtime via Gpu
Signed-off-by: Chong Gao [email protected]