Support unix_timestamp on GPU for subset of formats #1113
Signed-off-by: Andy Grove <[email protected]>
build
…Ops to be set Signed-off-by: Andy Grove <[email protected]>
Thanks for the review @revans2. I have now addressed the remaining issues.
looks great
build
The PySpark tests failed. I will either need to modify the config to enable incompatible formats or add more formats to the list of compatible formats.
There are only two formats being tested, 'yyyy/MM' and 'yyyy/MM/dd'. One of the tests is marked as xfail because we allow invalid dates to be inserted. The tests are already marked with
…mestamp Signed-off-by: Andy Grove <[email protected]>
@pytest.mark.parametrize('data_gen,date_form', str_date_and_format_gen, ids=idfn)
def test_string_to_unix_timestamp(data_gen, date_form):
    print("date: " + date_form)
    conf = {"spark.rapids.sql.improvedTimeOps.enabled": "true"}
The test fails if I don't add this config and I don't understand why yet. I am debugging this still.
build
@revans2 This is ready for review now. I had misunderstood the correct behavior for
build
It looks like the behavior has changed in Spark 3.1.0 and
The code looks fine, but if we know that there are failures in 3.1.0, then we should at least file a follow-on issue for them and either skip the tests for Spark 3.1.0+ or mark them as xfail.
build
* Support unix_timestamp on GPU for subset of formats
* close scalar value
* compatible formats will now run on GPU without requiring incompatibleOps to be set
* code cleanup and address more review comments
* add specific config option for enabling incompatible date formats on GPU
* update documentation
* improve docs
* use constants for special dates
* Add support for more date formats and remove incompat from to_unix_timestamp
* remove debug print
* Revert unnecessary change
* Make ToUnixTimestamp consistent with UnixTimestamp
* refactor to remove duplicate code
* fix resource leaks and fix regressions in python tests
* scalstyle
* update docs
* fix error in handling of legacyTimeParserPolicy=EXCEPTION
* fix test failures against Spark 3.1.0

Signed-off-by: Andy Grove <[email protected]>
…IDIA#1113) Signed-off-by: spark-rapids automation <[email protected]>
Signed-off-by: Andy Grove [email protected]

This PR makes improvements to `unix_timestamp` so that a small number of date/timestamp formats can now be supported on GPU without requiring that `incompatibleOps=true`, because they are now compatible with Spark (with the exception of #1117).

Specific changes in this PR:

* `unix_timestamp` and `to_unix_timestamp` no longer return bad data for invalid inputs when using one of the compatible formats. The contents of the column are first validated using `is_timestamp`, and invalid inputs either cause an exception to be thrown or are converted to null, depending on the value specified for `legacyTimeParserPolicy`.
* The config option `spark.rapids.sql.incompatibleDateFormats.enabled` has been added.
* `unix_timestamp` now supports the special dates `epoch`, `now`, `today`, `yesterday`, and `tomorrow`.
* `unix_timestamp` will fall back to CPU if `legacyTimeParserPolicy` is `LEGACY`, or if the supplied format is not supported on GPU and `spark.rapids.sql.incompatibleDateFormats.enabled=false`.
* `to_date` is now supported on GPU because it gets translated to an expression that is a combination of `CAST` and `unix_timestamp`, using one of the compatible formats.

This PR closes #1098, #1094, and largely addresses #50. I filed a new issue #1111 for the remainder of the work to support `legacyTimeParserPolicy`.
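
For readers who want the fallback and validation rules above in executable form, here is a rough plain-Python sketch. It is not the plugin's Scala code. The names `COMPATIBLE_FORMATS`, `runs_on_gpu`, and `to_unix_timestamp` are hypothetical, and the format table is a placeholder holding only the two formats mentioned in the test discussion, not the plugin's actual list. It also assumes that `EXCEPTION` is the policy value that throws and that any other non-`LEGACY` value yields null, and it treats all inputs as UTC.

```python
from datetime import datetime, timezone
from typing import Optional

# Placeholder table (assumption): Spark-style pattern -> Python strptime pattern.
# The plugin's real list of compatible formats is not reproduced here.
COMPATIBLE_FORMATS = {
    "yyyy/MM/dd": "%Y/%m/%d",
    "yyyy/MM": "%Y/%m",
}


def runs_on_gpu(fmt: str, legacy_time_parser_policy: str,
                incompatible_date_formats_enabled: bool) -> bool:
    """Mirror the fallback rule from the description (hypothetical helper)."""
    # LEGACY always falls back to CPU.
    if legacy_time_parser_policy.upper() == "LEGACY":
        return False
    # Compatible formats run on GPU; other formats need the opt-in config.
    return fmt in COMPATIBLE_FORMATS or incompatible_date_formats_enabled


def to_unix_timestamp(value: str, fmt: str,
                      legacy_time_parser_policy: str) -> Optional[int]:
    """Validate first, then convert; invalid input raises or becomes None."""
    try:
        parsed = datetime.strptime(value, COMPATIBLE_FORMATS[fmt])
    except ValueError:
        # Assumption: EXCEPTION rethrows, any other policy converts to null.
        if legacy_time_parser_policy.upper() == "EXCEPTION":
            raise
        return None
    # Assumption: interpret the parsed value as UTC.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())
```

Python's `strptime` does not accept exactly the same inputs as Spark's parser, so this only illustrates the control flow (validate, then throw or null), not byte-for-byte compatibility.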