Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop checking store size when spilling #7923

Conversation

abellina
Copy link
Collaborator

@abellina abellina commented Mar 22, 2023

Linked issue: #7706

Given this discussion during a prior PR on spill framework refactoring (#7572 (comment)), we filed an issue to look at removing checks for current spillable store size, as these checks could cause us to stop spilling when anything spilled could be beneficial (e.g. freeing just one buffer could open up enough space for us to satisfy an OOMing allocation).

Opening as draft for now. I ran some tests on this, but I want to run more. I wanted to get some eyes on it.

Signed-off-by: Alessandro Bellina <[email protected]>
logWarning(s"Targeting a ${store.name} size of $targetTotalSize. " +
s"Current total ${store.currentSize}. " +
s"Current spillable ${store.currentSpillableSize}")
targetTotalSize.map { tgt =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this should probably be using orElse..foreach rather than map..getOrElse since the result is not used.

@sameerz sameerz added the bug Something isn't working label Mar 23, 2023
@abellina
Copy link
Collaborator Author

build

@abellina
Copy link
Collaborator Author

So under spill pressure, I am seeing an old query that would spill 1TB or so take 100 seconds longer with this patch (from ~300 to 400 seconds) which seems really bad. So I think we need to keep this as draft until I can get some traces to understand what the differences are.

@abellina
Copy link
Collaborator Author

In addition to the slowness, the new way spills about 60% less, which I would have thought was a pure win, just adding the data point. I'll collect some traces today.

@abellina
Copy link
Collaborator Author

Moving the milestone to 23.06, as I haven't had time to debug the regression further.

@abellina abellina changed the base branch from branch-23.04 to branch-23.06 March 31, 2023 16:00
@abellina
Copy link
Collaborator Author

abellina commented Jun 5, 2023

I have rebased against 23.06 and retested this against a heavy spilling query I normally use. I am seeing it add 1.5 minute of runtime vs the baseline and we spill ~30% more per stage, neither of which are expected or desirable.

I am going to close this PR as is, to stop moving it from branch to branch and will continue looking at the underlying issue #7706 in 23.08.

@abellina abellina closed this Jun 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants