
Pull-based execution loop improvements #380

Merged
merged 8 commits into apache:master on Oct 20, 2022

Conversation

Contributor

@Dandandan Dandandan commented Oct 17, 2022

Which issue does this PR close?

Closes #383, #388

Rationale for this change

This pattern is already applied in other places for CPU-bound/blocking tasks, so it seems good to apply it here too.

We can also switch to using semaphores, per @tfeda's recommendation, to avoid the sleeping. This improves performance somewhat (5-15%) when there is only one slot available, as there is less waiting between tasks.
It will reduce some waiting in other cases too.
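
For context, a minimal sketch of the semaphore-based approach (illustrative only, not the exact PR code; the function name is made up, and `available_task_slots` mirrors the field in the diff below):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Illustrative sketch: track free task slots with a semaphore instead of
// an AtomicUsize that has to be re-checked with a sleep in between.
async fn take_slot_and_run(available_task_slots: Arc<Semaphore>) {
    // Suspends until a slot frees up; no sleep-and-recheck loop required.
    let permit = available_task_slots.clone().acquire_owned().await.unwrap();
    tokio::spawn(async move {
        // ... execute the task ...
        drop(permit); // dropping the permit returns the slot to the pool
    });
}
```

Dropping the `OwnedSemaphorePermit` releases the slot automatically, which is what removes the need for the sleep.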

What changes are included in this PR?

Are there any user-facing changes?

@Dandandan Dandandan requested a review from andygrove October 19, 2022 06:27
@Dandandan Dandandan changed the title Use dedicated executor in execution loop Execution loop improvements Oct 19, 2022
@Dandandan Dandandan changed the title Execution loop improvements Pull-based execution loop improvements Oct 19, 2022
```diff
@@ -162,7 +165,7 @@ async fn run_received_tasks<T: 'static + AsLogicalPlan, U: 'static + AsExecution
         task_id, job_id, stage_id, stage_attempt_num, partition_id, task_attempt_num
     );
     info!("Received task {}", task_identity);
-    available_tasks_slots.fetch_sub(1, Ordering::SeqCst);
+    let permit = available_task_slots.clone().acquire_owned().await.unwrap();
```
Contributor
I'd be careful acquiring a permit after retrieving a task from the scheduler. I'm imagining a scenario where the executor retrieves a task to execute, but then that task sits waiting for a permit to open up. Could we move this call up to poll_loop, before scheduler.poll_work() is called?
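
A sketch of that suggestion (the loop structure and the `scheduler`/`run_received_task` names are assumptions, not the actual code):

```rust
// Sketch only: acquire the slot *before* asking the scheduler for work,
// so a retrieved task never sits idle waiting for a permit.
loop {
    let permit = available_task_slots.clone().acquire_owned().await.unwrap();
    match scheduler.poll_work().await {
        Ok(Some(task)) => run_received_task(task, permit).await,
        _ => drop(permit), // no work (or error): return the slot and poll again
    }
}
```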

Contributor Author

I don't see how that happens, as we also wait until there is a slot available before polling (so at this point there should be at least one available).

But I can see if I can avoid needing the first check at all.

Contributor

@tfeda tfeda Oct 19, 2022

Sorry about that, I didn't see the first check above. The problem I described wouldn't happen.

Another option is to pass the permit from the first check into run_received_tasks(), and then you wouldn't need this check. I think you would replace the available_task_slots argument with an OwnedSemaphorePermit.
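
Something like this, perhaps (the signature and `TaskDefinition` type are illustrative, not the actual Ballista code):

```rust
use tokio::sync::OwnedSemaphorePermit;

// Sketch: the caller (poll_loop) acquires the permit and hands it over,
// so run_received_tasks no longer needs its own acquire.
async fn run_received_tasks(task: TaskDefinition, permit: OwnedSemaphorePermit) {
    tokio::spawn(async move {
        // ... execute the task ...
        drop(permit); // the slot is released when the task completes
    });
}
```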

Contributor Author

@Dandandan Dandandan Oct 19, 2022

Yeah, I did something like that, but acquiring after polling.

I would prefer to see if we can keep it like this (acquire + release before the poll instead of acquiring directly), as I want to add the possibility of retrieving multiple tasks from the scheduler based on semaphore.available_permits() (and then acquire those permits later, based on the number of tasks that are returned from the scheduler).
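
A sketch of that multi-task idea (the batched `poll_work` parameter and the surrounding names are assumptions):

```rust
// Sketch only: ask the scheduler for as many tasks as there are free slots.
let free = available_task_slots.available_permits();
if free > 0 {
    // Hypothetical batched poll_work that returns up to `free` tasks.
    let tasks = scheduler.poll_work(free).await?;
    for task in tasks {
        // Acquire the permits afterwards, based on how many tasks came back;
        // they should be immediately available since we just checked.
        let permit = available_task_slots.clone().acquire_owned().await.unwrap();
        run_received_tasks(task, permit).await;
    }
}
```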


@Dandandan Dandandan merged commit f58d719 into apache:master Oct 20, 2022
@Dandandan
Contributor Author

Merged it in after some good testing, to reduce the number of open PRs.

Successfully merging this pull request may close these issues.

Use dedicated executor in pull based loop