Abort jobs as early as possible #9927
We don't seem to run `abortJobs` under a lock, and especially not under the write compilation lock, in other scenarios. Holding the lock here causes a major slowdown whenever there is a long-running execution or compilation, as currently experienced in the cloud. Aborting jobs earlier should reduce the chances of a timeout.
@@ -130,6 +130,7 @@ class CollaborativeBuffer(
      stop(Map.empty)

    case IOTimeout =>
      logger.warn("Timeout reached when awaiting file's content")
Wouldn't this be the right moment to dump all stack traces or turn on profiling?
Not there yet, but yes, probably.
@@ -42,12 +42,12 @@ public Future<BoxedUnit> executeAsynchronously(RuntimeContext ctx, ExecutionCont
  private void setExecutionEnvironment(
      Runtime$Api$ExecutionEnvironment executionEnvironment, UUID contextId, RuntimeContext ctx) {
    var logger = ctx.executionService().getLogger();
    ctx.jobControlPlane().abortJobs(contextId);
This PR is moving the `abortJobs` call outside of the critical section, right? But the PR description says:
> We don't seem to run `abortJobs` under a lock
How's that accurate? We do seem to run `abortJobs` under a lock, and this PR is changing that, am I not right?
I meant in other locations that call `abortJobs`.
`System.getProperty` does not return `null`, it returns `"null"` :facepalm:. I broke the internet, sorry.
Pull Request Description
We don't seem to run `abortJobs` under a lock, and especially not under the write compilation lock, in other scenarios. This is causing some major slowdown when there is a long-running execution or compilation, as currently experienced in the cloud. This should reduce the chances of a timeout.
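To illustrate the ordering described above, here is a minimal sketch, not the actual change. It assumes a hypothetical `AbortBeforeLock` class in which a `ReentrantReadWriteLock` stands in for the write compilation lock and a simple event list stands in for the job control plane. The only point is that `abortJobs` runs before the write lock is acquired, so it never waits behind a long-running compilation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AbortBeforeLock {
  // Hypothetical stand-ins: an event log instead of a real job control plane,
  // and a plain read/write lock instead of the real compilation lock.
  static final List<String> events = new ArrayList<>();
  static final ReentrantReadWriteLock compilationLock = new ReentrantReadWriteLock();

  // Records whether the calling thread holds the write lock while aborting.
  static void abortJobs(String contextId) {
    events.add(
        "abort:" + contextId + ":lockHeld=" + compilationLock.isWriteLockedByCurrentThread());
  }

  static void setExecutionEnvironment(String contextId) {
    // Abort first, outside the critical section.
    abortJobs(contextId);
    compilationLock.writeLock().lock();
    try {
      // Only the state update itself happens under the write lock.
      events.add(
          "update:" + contextId + ":lockHeld=" + compilationLock.isWriteLockedByCurrentThread());
    } finally {
      compilationLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    setExecutionEnvironment("ctx-1");
    events.forEach(System.out::println);
  }
}
```

In this sketch the abort event is recorded with `lockHeld=false` and the update with `lockHeld=true`.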
Also added an option to override the size of the global executor. Currently it always defaults to the number of available processors reported by the runtime, which may be suboptimal.
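A rough sketch of how such an override could be resolved, under stated assumptions: the property name `example.executor.parallelism` and the `ExecutorParallelism` class are hypothetical and are not the option added in this PR. Missing, blank, non-numeric, or non-positive values fall back to the number of available processors.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorParallelism {
  // Hypothetical property name, used only for this illustration.
  static final String PROPERTY = "example.executor.parallelism";

  // Returns the parsed override when it is a positive integer, otherwise the fallback.
  static int resolveParallelism(String raw, int fallback) {
    if (raw == null || raw.isBlank()) {
      return fallback;
    }
    try {
      int parsed = Integer.parseInt(raw.trim());
      return parsed > 0 ? parsed : fallback;
    } catch (NumberFormatException e) {
      // Covers non-numeric strings, including the literal text "null".
      return fallback;
    }
  }

  public static void main(String[] args) {
    int fallback = Runtime.getRuntime().availableProcessors();
    int parallelism = resolveParallelism(System.getProperty(PROPERTY), fallback);
    ExecutorService executor = Executors.newFixedThreadPool(parallelism);
    executor.submit(() -> System.out.println("parallelism=" + parallelism));
    executor.shutdown();
  }
}
```

Running with `-Dexample.executor.parallelism=4` would size the pool at 4 threads; without the flag the pool uses the processor count.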
Important Notes
Pending testing on the impact it will have.
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
- All code follows the Scala, Java, TypeScript, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.