Add in support for months_between #11737
Signed-off-by: Robert (Bobby) Evans <[email protected]>
build
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/datetimeExpressions.scala
Signed-off-by: Robert (Bobby) Evans <[email protected]>
I ran some local benchmarks to measure the performance improvement.
An A6000 GPU with 16 CPU cores completes this in about 16 seconds (after it warms up). A Threadripper PRO 5975WX (32 cores) finishes in about 325 seconds when run with all 32 cores (no hyperthreading). That is about a 20x speedup.
LGTM
Seq(ParamCheck("timestamp1", TypeSig.TIMESTAMP, TypeSig.TIMESTAMP),
  ParamCheck("timestamp2", TypeSig.TIMESTAMP, TypeSig.TIMESTAMP),
  ParamCheck("round", TypeSig.lit(TypeEnum.BOOLEAN), TypeSig.BOOLEAN))),
(a, conf, p, r) => new MonthsBetweenExprMeta(a, conf, p, r)
nit: If there are more comments to address, it would be nice to follow the pattern used by GpuMapFromArraysMeta to avoid manual parameter passing.
override def convertToGpu(): GpuExpression = {
  val gpuChildren = childExprs.map(_.convertToGpu())
  assert(gpuChildren.length == 3)
  GpuMonthsBetween(gpuChildren(0), gpuChildren(1), gpuChildren(2), expr.timeZoneId)
nit: either unpack or use named params for readability
- GpuMonthsBetween(gpuChildren(0), gpuChildren(1), gpuChildren(2), expr.timeZoneId)
+ val Seq(ts1, ts2, roundOff) = gpuChildren
+ GpuMonthsBetween(ts1, ts2, roundOff, expr.timeZoneId)
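For anyone unfamiliar with the unpacking idiom in the suggestion, here is a minimal standalone sketch. `MonthsBetweenStub` and `build` are hypothetical stand-ins invented for illustration, not the plugin's real classes, and the children are plain strings rather than GPU expressions. The point is that the `Seq(...)` pattern both names the three children and checks that there are exactly three of them.

```scala
object UnpackSketch {
  // Hypothetical stand-in for the real expression class (assumption, not plugin code).
  final case class MonthsBetweenStub(
      ts1: String,
      ts2: String,
      roundOff: String,
      timeZoneId: Option[String])

  def build(children: Seq[String], timeZoneId: Option[String]): MonthsBetweenStub = {
    // The pattern only matches a sequence of exactly three elements;
    // any other length fails with a scala.MatchError at runtime.
    val Seq(ts1, ts2, roundOff) = children
    MonthsBetweenStub(ts1, ts2, roundOff, timeZoneId)
  }
}
```

With this form the pattern match already enforces the length, so a separate length assertion is mostly redundant, although the failure becomes a `MatchError` rather than an `AssertionError`.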
This fixes #11709
The code is a little complicated, mostly because the Spark implementation does some fairly complex things.
I think there are more optimizations we could do to reduce memory use and improve performance, but I wanted to get something working out the door sooner; we can look at improving it later.
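To give a feel for the complexity mentioned above, here is a rough CPU-side sketch of the `months_between` semantics as I understand them from Spark's behavior. It is not the GPU code in this PR. It assumes timestamps have already been converted to local date-times (so it ignores the time zone handling entirely), and `MonthsBetweenSketch` is a made-up name.

```scala
import java.time.LocalDateTime

object MonthsBetweenSketch {
  private val SecondsPerDay = 86400L

  // Sketch only: takes local date-times, so time zone conversion is out of scope here.
  def monthsBetween(t1: LocalDateTime, t2: LocalDateTime, roundOff: Boolean): Double = {
    val d1 = t1.toLocalDate
    val d2 = t2.toLocalDate
    val monthDiff =
      ((d1.getYear * 12 + d1.getMonthValue) - (d2.getYear * 12 + d2.getMonthValue)).toDouble
    val bothLastDay =
      d1.getDayOfMonth == d1.lengthOfMonth && d2.getDayOfMonth == d2.lengthOfMonth
    if (d1.getDayOfMonth == d2.getDayOfMonth || bothLastDay) {
      // Same day of month, or both on the last day of their month:
      // the time of day is ignored and the result is a whole number of months.
      monthDiff
    } else {
      // Otherwise the fractional part assumes 31-day months and uses whole seconds.
      val secs1 = t1.toLocalTime.toSecondOfDay.toLong
      val secs2 = t2.toLocalTime.toSecondOfDay.toLong
      val secondsDiff =
        (d1.getDayOfMonth - d2.getDayOfMonth) * SecondsPerDay + secs1 - secs2
      val diff = monthDiff + secondsDiff / (31 * SecondsPerDay).toDouble
      // roundOff rounds the result to 8 decimal places.
      if (roundOff) math.round(diff * 1e8) / 1e8 else diff
    }
  }
}
```

As a sanity check, `monthsBetween(LocalDateTime.parse("1997-02-28T10:30:00"), LocalDateTime.parse("1996-10-30T00:00:00"), roundOff = true)` comes out to 3.94959677, which matches the example in the Spark SQL function documentation. The two special cases and the per-row branching are a large part of what makes a columnar version more involved than the scalar one.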