Add time zone config to set non-UTC [databricks] #9652
Conversation
Signed-off-by: Chong Gao <[email protected]>
build
@@ -77,6 +77,15 @@ def is_emr_runtime():
def is_dataproc_runtime():
    return runtime_env() == "dataproc"

def get_test_tz():
Q: Where would all these funcs be used? Also, having the pytest conf rely on the run_pyspark script seems weird.
Can you at least try os.environ.get('PYSP_TEST_spark_sql_session_timeZone', 'UTC')
to make sure it has a default?
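For illustration, a minimal sketch of what that helper could look like with the suggested default (the helper names follow the diff; the fallback to 'UTC' is the suggestion above, not the patch's original code):

```python
import os

def get_test_tz():
    # Fall back to 'UTC' when run_pyspark does not export the variable
    return os.environ.get('PYSP_TEST_spark_sql_session_timeZone', 'UTC')

def is_utc():
    # True when the test session time zone is (or defaults to) UTC
    return get_test_tz() == 'UTC'
```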
These funcs will be used in pytest xfail. For an existing test case, before the operator supports non-UTC, add xfail(is_non_utc()).

> also pytest conf would rely on run_pyspark script seems weird

I tested that we can get this config in the conftest.
conftest is an internal file of the IT, so it's safe to get the env variable from run_pyspark.
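A sketch of the xfail usage described above (the helper `is_non_utc` and the test body are illustrative assumptions, not the patch's actual tests):

```python
import os
import pytest

def is_non_utc():
    # Illustrative helper: True when the test session TZ is not UTC
    return os.environ.get('PYSP_TEST_spark_sql_session_timeZone', 'UTC') != 'UTC'

# Mark an existing test as expected-to-fail until the operator supports non-UTC
@pytest.mark.xfail(condition=is_non_utc(), reason='operator does not support non-UTC time zones yet')
def test_some_timestamp_operator():
    assert True  # placeholder for the real test body
```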
> conftest is internal file of IT, so it's safe to get Env variable from run_pyspark.

Hmm, OK. In my opinion the pytest code itself should at least provide the defaults here.
I would like to hear more feedback from other developers.
If this config is not set, an error is thrown:
>>> os.environ["PYSP_TEST_spark_sql_session_timeZone"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen os>", line 679, in __getitem__
KeyError: 'PYSP_TEST_spark_sql_session_timeZone'
This error will force us to set this config.
I just threw the code together so a get with a default looks like a great addition for safety/robustness.
> can not get the TZ env variable in conftest.py.

Can you elaborate on this? I'm confused how we can export one variable and read it in conftest but somehow can't do the same for another. Is something in the shell startup environment clobbering the TZ variable?
@res-life Wouldn't wrapping TZ with PYSP_TEST_spark_sql_session_timeZone be the same as the original way we did it in is_tz_utc?
The Java systemDefault will respect the TZ environment variable.
import java.time.ZoneId;

public class Test {
    public static void main(String[] args) {
        System.out.println("time zone is " + ZoneId.systemDefault());
    }
}
$ export TZ="UTC" && java Test
time zone is UTC
$ export TZ="Asia/Shanghai" && java Test
time zone is Asia/Shanghai
Discussed offline with @res-life; let's add comments noting that we need to get the UTC time zone before the Spark session starts.
We cannot use the following code before the Spark session is started:
jvm = spark.sparkContext._jvm
utc = jvm.java.time.ZoneId.of('UTC').normalized()
sys_tz = jvm.java.time.ZoneId.systemDefault().normalized()
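Before a session exists, one possible pure-Python workaround is to read the process time zone directly from the TZ environment variable, which the JVM also respects (a sketch under that assumption, not the patch's actual code):

```python
import os

def system_tz_is_utc():
    # Assumes the test scripts export TZ; default to 'UTC' when unset,
    # matching the JVM's behavior only when no other zone is configured.
    return os.environ.get('TZ', 'UTC') in ('UTC', 'Etc/UTC')
```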
> can not get the TZ env variable in conftest.py.

Because TZ is rewritten to UTC in data_gen.py:
os.environ['TZ'] = 'UTC'
time.tzset()
I removed the above code; see my last commit.
I use the env variable TZ now,
and I updated TimestampGen to avoid generating out-of-range timestamps.
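To illustrate why that rewrite hid the real setting: once TZ is overwritten and time.tzset() is called, the process-local time zone (and anything reading it afterwards) sees UTC regardless of what the CI exported. A minimal demonstration (Unix-only, since time.tzset is not available on Windows):

```python
import os
import time

os.environ['TZ'] = 'Asia/Shanghai'
time.tzset()                  # process now uses Asia/Shanghai
print(time.tzname)

os.environ['TZ'] = 'UTC'      # the rewrite that was removed from data_gen.py
time.tzset()
print(time.tzname)            # now reports ('UTC', 'UTC'), masking the original TZ
```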
I saw the skip logic in
build
I cherry-picked this PR into #9482
Just some nits really.
def get_test_tz():
    return os.environ["PYSP_TEST_spark_sql_session_timeZone"]

def is_utc():
I didn't realize is_tz_utc existed. I think we should probably only keep one way of checking if the timezone is UTC or not. is_tz_utc has the problem that we need a Spark session to make it work. That is fine, but it also makes it difficult to use it to skip a test unless it happens after the test starts to run. This patch removes all uses of is_tz_utc, so perhaps we should also delete the implementation too.
Done in #9482:
- default value for PYSP_TEST_spark_sql_session_timeZone
- remove the unused is_tz_utc
build
Building was blocked by #9681
build
build
Premerge error:
Got the result in 1s but a timeout exception occurred. Seems it's a random failure. Refer to: #9671
build
@revans2 We need to merge this first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I would like others to review/approve too before merging because I wrote a lot of this code.
build
build
@res-life As discussed, let's file a ticket tracking the DB failure if you can reproduce it with the seed ID from the last round.
build
@jlowe @NVnavkumar Please help review, thanks.
@winningsix Did not reproduce it in the last premerge. |
LGTM
@winningsix Also, DB 330 did not reproduce the previous error. Anyway, recording the previous error here:
Contributes #9627
Add time zone config for CI to set and test non-UTC time zone
Signed-off-by: Chong Gao [email protected]