Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accelerate RunningWindow queries on GPU #2600

Merged
merged 11 commits into from
Jun 9, 2021

Conversation

mythrocks
Copy link
Collaborator

Fixes #1791.

A window-spec defined as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is termed a "running" window, because for each successive row in a row-group, the window increases in size by 1.

"Running" windows have special handling on Databricks, via a RunningWindowFunctionExec. spark-rapids does not currently recognize or replace this exec with a GPU equivalent, thereby causing "running" window queries to fail thus:

java.lang.IllegalStateException: Found an aggregation function in an unexpected context Some(!NOT_FOUND <RunningWindowFunctionExec> NOT EVALUATED FOR GPU YET
...
  at com.nvidia.spark.rapids.ExpressionContext$.getAggregateFunctionContext(RapidsMeta.scala:723)
  at com.nvidia.spark.rapids.BaseExprMeta.context$lzycompute(RapidsMeta.scala:764)
  at com.nvidia.spark.rapids.BaseExprMeta.context(RapidsMeta.scala:760)
  at com.nvidia.spark.rapids.ExprChecksImpl.tag(TypeChecks.scala:732)
  at com.nvidia.spark.rapids.BaseExprMeta.$anonfun$tagSelfForGpu$3(RapidsMeta.scala:774)
  at com.nvidia.spark.rapids.BaseExprMeta.$anonfun$tagSelfForGpu$3$adapted(RapidsMeta.scala:774)
  at scala.Option.foreach(Option.scala:407)
  at com.nvidia.spark.rapids.BaseExprMeta.tagSelfForGpu(RapidsMeta.scala:774)
  at com.nvidia.spark.rapids.RapidsMeta.tagForGpu(RapidsMeta.scala:279)
...

This commit aims to recognize and replace RunningWindowFunctionExec with a GPU implementation. In this first pass, it will delegate to GpuWindowExec. This has the potential to be optimized further, (i.e. evaluated in a single pass), once rapids/cudf has the support for it.

@mythrocks mythrocks self-assigned this Jun 5, 2021
@mythrocks mythrocks requested review from tgravescs and revans2 June 5, 2021 01:57
@sameerz sameerz added the bug Something isn't working label Jun 6, 2021
@sameerz sameerz added this to the June 7 - June 18 milestone Jun 6, 2021
@tgravescs
Copy link
Collaborator

build

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit. It looks good.

tgravescs
tgravescs previously approved these changes Jun 8, 2021
@tgravescs
Copy link
Collaborator

build

@mythrocks
Copy link
Collaborator Author

Note: The build against Databricks 8.2 is likely to fail because of #2628. We can re-kick the tests, once #2628 is resolved.

@mythrocks mythrocks requested review from revans2 and tgravescs June 8, 2021 16:20
@mythrocks
Copy link
Collaborator Author

@revans2, thanks for solving #2637. I'll rebase when that's been checked in.

@mythrocks
Copy link
Collaborator Author

Rebased now, to include #2637.

@mythrocks
Copy link
Collaborator Author

build

1 similar comment
@sameerz
Copy link
Collaborator

sameerz commented Jun 9, 2021

build

tgravescs
tgravescs previously approved these changes Jun 9, 2021
Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor things.

revans2
revans2 previously approved these changes Jun 9, 2021
@mythrocks mythrocks dismissed stale reviews from revans2 and tgravescs via 87551ae June 9, 2021 14:59
revans2
revans2 previously approved these changes Jun 9, 2021
@revans2
Copy link
Collaborator

revans2 commented Jun 9, 2021

build

@tgravescs tgravescs merged commit 2b75173 into NVIDIA:branch-21.06 Jun 9, 2021
rongou pushed a commit to rongou/spark-rapids that referenced this pull request Jun 9, 2021
* Tentative fix

Signed-off-by: Mithun RK <[email protected]>

* Delegate recognition of window-function exec to Shim.

* Tentative support for Databricks 7.x

* Fix formatting.

* Moved commonality to base class for Gpu window execs.

* Delegate to GpuBaseWindowExecMeta, from Databricks Shims.

* Renamed test for RunningWindowFunctionExec.

* Updated copyright date.

* Fix copyright to include full year.

* Switch from explicit cast to match() condition.
@mythrocks mythrocks deleted the running-window-function branch June 9, 2021 21:21
@mythrocks
Copy link
Collaborator Author

Thanks for the reviews, all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] GPU support for the plan RunningWindowFunctionExec on db
5 participants