Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in support for months_between #11737

Merged
merged 4 commits into from
Nov 20, 2024

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Nov 19, 2024

This fixes #11709

The code is a little complicated, mostly because the Spark code is doing some kind of complex things.

I think that there are some more optimizations that we could do to reduce memory and improve performance, but I wanted to get something working out the door sooner, and then we can look at improving it later.

Signed-off-by: Robert (Bobby) Evans <[email protected]>
@revans2
Copy link
Collaborator Author

revans2 commented Nov 19, 2024

build

@sameerz sameerz added the feature request New feature or request label Nov 19, 2024
@revans2
Copy link
Collaborator Author

revans2 commented Nov 20, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Nov 20, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Nov 20, 2024

I ran some local benchmarks to see the performance improvement

spark.time(spark.range(100000000000L, 120000000000L, 1, 64).selectExpr("AVG(months_between(timestamp_micros(id), timestamp_micros(10))) as mbt").show())

An a6000 GPU can complete this with 16 CPU cores in about 16 seconds (after it warms up)

Threadripper PRO 5975WX 32-Cores finishes in about 325 seconds when run with all 32 cores (no hyperthreading). That is about a 20x speedup.

Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Seq(ParamCheck("timestamp1", TypeSig.TIMESTAMP, TypeSig.TIMESTAMP),
ParamCheck("timestamp2", TypeSig.TIMESTAMP, TypeSig.TIMESTAMP),
ParamCheck("round", TypeSig.lit(TypeEnum.BOOLEAN), TypeSig.BOOLEAN))),
(a, conf, p, r) => new MonthsBetweenExprMeta(a, conf, p, r)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If there are more comments to address it would be nice to follow the pattern like GpuMapFromArraysMeta

https://github.com/NVIDIA/spark-rapids/pull/11163/files#diff-8926565c5f6bc26203bed459ac37ea0fba15f9f5488aa2412aeac26b31f80ea6R2881

to avoid manual parameter passing

override def convertToGpu(): GpuExpression = {
val gpuChildren = childExprs.map(_.convertToGpu())
assert(gpuChildren.length == 3)
GpuMonthsBetween(gpuChildren(0), gpuChildren(1), gpuChildren(2), expr.timeZoneId)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: either unpack or use named params for readability

Suggested change
GpuMonthsBetween(gpuChildren(0), gpuChildren(1), gpuChildren(2), expr.timeZoneId)
val Seq(ts1, ts2, roundOff) = gpuChildren
GpuMonthsBetween(ts1, ts2, roundOff, expr.timeZoneId)

@revans2 revans2 merged commit 20c5281 into NVIDIA:branch-24.12 Nov 20, 2024
48 of 49 checks passed
@revans2 revans2 deleted the gpu_months_between branch November 20, 2024 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add support for MonthsBetween
4 participants