Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback non-UTC TimeZoneAwareExpression with zoneId [databricks] #10996

Merged
merged 2 commits into from
Jun 18, 2024

Conversation

thirtiseven
Copy link
Collaborator

@thirtiseven thirtiseven commented Jun 7, 2024

Close #10995

In some TimeZoneAwareExpression that plugin supports only UTC, the expression will fallback according to GpuOverrides.isUTCTimezone. In some case when we call an expression directly by passing a timezone as it's parameter, like in Spark UT:

checkEvaluation(
      StructsToJson(Map.empty, struct, Option(PST.getId)),
      """{"t":"2015-12-31T16:00:00.000-08:00"}"""
    )

The fallback doesn't work because it didn't change any timezone config.

This PR uses the zoneId in the expr to check the timezone instead of reading the timezone config to check whether to fallback to cpu under non-utc timezone.

@thirtiseven thirtiseven self-assigned this Jun 7, 2024
@thirtiseven
Copy link
Collaborator Author

build

@razajafri razajafri changed the title Fallback non-UTC TimeZoneAwareExpression with zoneId Fallback non-UTC TimeZoneAwareExpression with zoneId [databricks] Jun 7, 2024
@razajafri
Copy link
Collaborator

build

@@ -1123,7 +1123,7 @@ abstract class BaseExprMeta[INPUT <: Expression](
if (!needTimeZoneCheck) return

// Level 2 check
if (!isTimeZoneSupported) return checkUTCTimezone(this)
if (!isTimeZoneSupported) return checkUTCTimezone(this, getZoneId())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. do we still need to check SQLConf.get.sessionLocalTimeZone? what if user set SQLConf.get.sessionLocalTimeZone to a non UTC timezone and did not set timezone in expression?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. there're still many invokation of isUTCTimezone() in the code, do we need to go through all of them and keep them consistent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. do we still need to check SQLConf.get.sessionLocalTimeZone? what if user set SQLConf.get.sessionLocalTimeZone to a non UTC timezone and did not set timezone in expression?

I believed that the expression's timezone is from the config and have higher priority, will test this behavior to see if it is correct. thanks.

  1. there're still many invokation of isUTCTimezone() in the code, do we need to go through all of them and keep them consistent?

Will check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this behavior via spark ut:
expression will follow zoneId first after this pr, under both cpu and gpu.

package org.apache.spark.sql.rapids.suites

import java.util.{Calendar, TimeZone}

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.{PST, UTC, UTC_OPT}
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.rapids.utils.{RapidsJsonConfTrait, RapidsTestsTrait}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._

class RapidsJsonExpressionsSuite
  extends JsonExpressionsSuite with RapidsTestsTrait with RapidsJsonConfTrait {

  test("to_json with timestamp 2") {
    val schema = StructType(StructField("t", TimestampType) :: Nil)
    val c = Calendar.getInstance(TimeZone.getTimeZone(UTC))
    c.set(2016, 0, 1, 0, 0, 0)
    c.set(Calendar.MILLISECOND, 0)
    val struct = Literal.create(create_row(c.getTimeInMillis * 1000L), schema)

    checkEvaluation(
      StructsToJson(Map.empty, struct, UTC_OPT),
      """{"t":"2016-01-01T00:00:00.000Z"}"""
    )

    SQLConf.get.setConfString("spark.sql.session.timeZone", "UTC")
    // SQLConf.get.setConfString("spark.rapids.sql.enabled", "false")
    checkEvaluation(
      StructsToJson(Map.empty, struct, Option(PST.getId)),
      """{"t":"2015-12-31T16:00:00.000-08:00"}"""
    )
  }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. there're still many invokation of isUTCTimezone() in the code, do we need to go through all of them and keep them consistent?

Verified that these use cases (mostly related to scan and file format) should read the config. There is no zoneId there.

Copy link
Collaborator

@binmahone binmahone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thirtiseven thirtiseven merged commit 0952dea into NVIDIA:branch-24.08 Jun 18, 2024
45 checks passed
@thirtiseven thirtiseven deleted the utc_fallback_expr branch June 18, 2024 06:24
@sameerz sameerz added the bug Something isn't working label Jun 25, 2024
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this pull request Jul 12, 2024
…IDIA#10996)

* Fallback non-UTC TimeZoneAwareExpression with zoneId instead of timeZone config

Signed-off-by: Haoyang Li <[email protected]>

* clean up

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fallback TimeZoneAwareExpression that only support UTC with zoneId instead of timeZone config
4 participants