Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[window] Add GpuWindowExec requiredChildOrdering #645

Merged
merged 1 commit into from
Sep 8, 2020

Conversation

mythrocks
Copy link
Collaborator

Closes #486.

WindowExec has checks for child ordering and distribution.
This commit adds requiredChildOrdering and requiredChildDistribution
to GpuWindowExec.

Signed-off-by: Mithun RK [email protected]

@mythrocks mythrocks added bug Something isn't working P1 Nice to have for release labels Sep 2, 2020
@mythrocks mythrocks self-assigned this Sep 2, 2020
WindowExec has checks for child ordering and distribution.
This commit adds requiredChildOrdering and requiredChildDistribution
to GpuWindowExec.

Signed-off-by: Mithun RK <[email protected]>
andygrove
andygrove previously approved these changes Sep 3, 2020
Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM based on comparing to the Spark source code. This seems consistent.

@kuhushukla
Copy link
Collaborator

The changes LGTM! Curious if there is a way to test this change?

@mythrocks
Copy link
Collaborator Author

Curious if there is a way to test this change?

Haven't a Scooby Doo, truth be told. :] This change is to assuage concerns that our Exec doesn't share the signature/preconditions of WindowExec. Doesn't seem to affect query behaviour, in my limited testing.

As an aside, there's one more change coming to this PR.

@revans2
Copy link
Collaborator

revans2 commented Sep 3, 2020

The main reason to do this is for the optimizations that we have where we could remove a sort, like with operations where we can do a hash aggregate instead of a sort aggregate or where we might replace a sort-merge join with a hash based join.

In those cases the output of the join/aggregate operation will not match with how the plan was originally setup, and if spark optimized out a sort, because it was relying on the output of one of the operators already being sorted, then without this patch we will not add it back in properly.

I don't know how often in the real world this would happen, but it is a possibility.

@mythrocks
Copy link
Collaborator Author

P.S I can look into concocting a test. This change declares what the expected ordering of its inputs are. So this is an assertion check against losing that ordering.

@mythrocks
Copy link
Collaborator Author

Hey, thanks Bobby. (Missed your reply.)

Thanks for confirming. Based on how we're currently replacing ops/execs, I think we might not run into that scenario (of losing the ordering). Unless AQE changes things in some way.

@kuhushukla
Copy link
Collaborator

Makes sense, we have such code in Partitioning and other places too but I was mainly curious what this would really change as far as functionality goes, so thanks for the explanation Bobby, Mithun. :)

@mythrocks mythrocks changed the title [WIP] [window] Add GpuWindowExec requiredChildOrdering [WIP] [window] Fixes for GpuWindowExec Sep 3, 2020
@mythrocks mythrocks changed the title [WIP] [window] Fixes for GpuWindowExec [window] Add GpuWindowExec requiredChildOrdering Sep 3, 2020
@mythrocks
Copy link
Collaborator Author

mythrocks commented Sep 3, 2020

As an aside, there's one more change coming to this PR.

Chaps, apologies for the churn on this PR, during which I seemed to have dismissed approvals from @kuhushukla and @andygrove. I intended to push a workaround for #218 to this PR. It slipped my mind that we will be squashing commits.

I will raise a separate PR to address #218 for the 0.2 release.

@tgravescs
Copy link
Collaborator

seems fine, I do worry a little about the Databricks code since we had different output columns there. I would think this is ok since its basically just a pass through.

@mythrocks
Copy link
Collaborator Author

I do worry a little about the Databricks code since we had different output columns there.

Yeah, I did consider that. I think we should be in the clear, since this is the input ordering.

@mythrocks
Copy link
Collaborator Author

build

@mythrocks mythrocks merged commit 216da1d into NVIDIA:branch-0.2 Sep 8, 2020
@mythrocks
Copy link
Collaborator Author

Thanks for the reviews, @andygrove, @kuhushukla, @revans2, @tgravescs!

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
WindowExec has checks for child ordering and distribution.
This commit adds requiredChildOrdering and requiredChildDistribution
to GpuWindowExec.

Signed-off-by: Mithun RK <[email protected]>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
WindowExec has checks for child ordering and distribution.
This commit adds requiredChildOrdering and requiredChildDistribution
to GpuWindowExec.

Signed-off-by: Mithun RK <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P1 Nice to have for release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] GpuWindowExec does not implement requiredChildOrdering
5 participants