Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback non-UTC TimeZoneAwareExpression with zoneId [databricks] #10996

Merged
merged 2 commits into from
Jun 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2023, NVIDIA CORPORATION.
* Copyright (c) 2019-2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -1123,7 +1123,7 @@ abstract class BaseExprMeta[INPUT <: Expression](
if (!needTimeZoneCheck) return

// Level 2 check
if (!isTimeZoneSupported) return checkUTCTimezone(this)
if (!isTimeZoneSupported) return checkUTCTimezone(this, getZoneId())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. do we still need to check SQLConf.get.sessionLocalTimeZone? what if user set SQLConf.get.sessionLocalTimeZone to a non UTC timezone and did not set timezone in expression?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. there're still many invokation of isUTCTimezone() in the code, do we need to go through all of them and keep them consistent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. do we still need to check SQLConf.get.sessionLocalTimeZone? what if user set SQLConf.get.sessionLocalTimeZone to a non UTC timezone and did not set timezone in expression?

I believed that the expression's timezone is from the config and have higher priority, will test this behavior to see if it is correct. thanks.

  1. there're still many invokation of isUTCTimezone() in the code, do we need to go through all of them and keep them consistent?

Will check.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this behavior via spark ut:
expression will follow zoneId first after this pr, under both cpu and gpu.

package org.apache.spark.sql.rapids.suites

import java.util.{Calendar, TimeZone}

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.{PST, UTC, UTC_OPT}
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.rapids.utils.{RapidsJsonConfTrait, RapidsTestsTrait}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._

class RapidsJsonExpressionsSuite
  extends JsonExpressionsSuite with RapidsTestsTrait with RapidsJsonConfTrait {

  test("to_json with timestamp 2") {
    val schema = StructType(StructField("t", TimestampType) :: Nil)
    val c = Calendar.getInstance(TimeZone.getTimeZone(UTC))
    c.set(2016, 0, 1, 0, 0, 0)
    c.set(Calendar.MILLISECOND, 0)
    val struct = Literal.create(create_row(c.getTimeInMillis * 1000L), schema)

    checkEvaluation(
      StructsToJson(Map.empty, struct, UTC_OPT),
      """{"t":"2016-01-01T00:00:00.000Z"}"""
    )

    SQLConf.get.setConfString("spark.sql.session.timeZone", "UTC")
    // SQLConf.get.setConfString("spark.rapids.sql.enabled", "false")
    checkEvaluation(
      StructsToJson(Map.empty, struct, Option(PST.getId)),
      """{"t":"2015-12-31T16:00:00.000-08:00"}"""
    )
  }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. there're still many invokation of isUTCTimezone() in the code, do we need to go through all of them and keep them consistent?

Verified that these use cases (mostly related to scan and file format) should read the config. There is no zoneId there.


// Level 3 check
val zoneId = getZoneId()
Expand Down Expand Up @@ -1203,8 +1203,8 @@ abstract class BaseExprMeta[INPUT <: Expression](
*
* @param meta to check whether it's UTC
*/
def checkUTCTimezone(meta: RapidsMeta[_, _, _]): Unit = {
if (!GpuOverrides.isUTCTimezone()) {
def checkUTCTimezone(meta: RapidsMeta[_, _, _], zoneId: ZoneId): Unit = {
if (!GpuOverrides.isUTCTimezone(zoneId)) {
meta.willNotWorkOnGpu(
TimeZoneDB.nonUTCTimezoneNotSupportedStr(meta.wrapped.getClass.toString))
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,11 +46,9 @@ class RapidsTestSettings extends BackendTestSettings {
.exclude("from_json - input=empty array, schema=struct, output=single row with null", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("from_json - input=empty object, schema=struct, output=single row with null", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("SPARK-20549: from_json bad UTF-8", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("from_json with timestamp", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("to_json - array", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("to_json - array with single empty row", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("to_json - empty array", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("to_json with timestamp", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("SPARK-21513: to_json support map[string, struct] to json", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("SPARK-21513: to_json support map[struct, struct] to json", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
.exclude("SPARK-21513: to_json support map[string, integer] to json", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/10849"))
Expand Down