Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45133][CONNECT] Make Spark Connect queries be FINISHED when last result task is finished #42889

Closed
wants to merge 2 commits into from

Conversation

juliuszsompolski
Copy link
Contributor

@juliuszsompolski juliuszsompolski commented Sep 12, 2023

What changes were proposed in this pull request?

In the situation before, query will only be FINISHED when all results have been pushed into the output buffers (not necessarily received by client, but pushed out of the server).

For LocalTableScanExec, post FINISHED before sending result batches, because nothing is executed, only cached local results are returned. For regular execution, post FINISHED after all task results have been returned from Spark, not after they have been processed and sent out.

Why are the changes needed?

Currently, even if a query finished running in Spark, it keeps being RUNNING until all results are sent. Then there is a very small difference between FINISHED and CLOSED. This change makes it behave more similar to e.g. Thriftserver.

Does this PR introduce any user-facing change?

Yes. Queries will be posted as FINISHED when they finish executing, not when they finish sending results.

How was this patch tested?

Will add test in #42560

Was this patch authored or co-authored using generative AI tooling?

No.

@juliuszsompolski
Copy link
Contributor Author

cc @jdesjean @gjxdxh @gengliangwang @hvanhovell @grundprinzip
I think it kind of regressed in #42454
I think it's more logical this way.

@jdesjean
Copy link
Contributor

The change seems logical to me as well.

@HyukjinKwon HyukjinKwon changed the title [SPARK-45133] Make Spark Connect queries be FINISHED when last result task is finished [SPARK-45133][CONNECT] Make Spark Connect queries be FINISHED when last result task is finished Sep 13, 2023
@HyukjinKwon
Copy link
Member

Merged to master.

@dongjoon-hyun
Copy link
Member

Hi, @juliuszsompolski and @HyukjinKwon .

I hoped this recovers the master branch, but it seems that CI is still failing since the commit which ReattachableExecuteSuite is added. Could you check this comment?

@juliuszsompolski
Copy link
Contributor Author

In description I wrote "Will add test in #42560", but since that one was merged yesterday, I planned to rebase and add test here. I will add test in a followup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants