Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LaunchMultiTask rpc interface for executor #209

Closed
yahoNanJing opened this issue Sep 14, 2022 · 5 comments · Fixed by #255
Closed

Add LaunchMultiTask rpc interface for executor #209

yahoNanJing opened this issue Sep 14, 2022 · 5 comments · Fixed by #255
Labels
enhancement New feature or request

Comments

@yahoNanJing
Copy link
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Sometimes, the scheduler will schedule multiple tasks of the same stage to the same executors. Currently we have to create a task definition for each task, which may not be efficient.

Describe the solution you'd like

For the above case, it's better to provide a LaunchMultiTask rpc interface for the executor so that multiple tasks of the same stage can be just one MultiTaskDefinition to avoid the same plan be copied multiple times. The network io cost can be reduced.

Describe alternatives you've considered

Additional context

@yahoNanJing yahoNanJing added the enhancement New feature or request label Sep 14, 2022
@askoa
Copy link
Contributor

askoa commented Sep 14, 2022

@yahoNanJing I looked into this and just wanted to clarify something. I might be wrong, but looks like the current proto definition allows to send multiple tasks in a single launch task. See line #835

https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/core/proto/ballista.proto#L833-L837

However in the code, I see the tasks are launched one at a time. See line #427

https://github.com/apache/arrow-ballista/blob/2e1f5d619760d3b7acce225a166a9507f9efe9a1/ballista/rust/scheduler/src/state/task_manager.rs#L425-L429

Can this task be simplified as modifying the for loop in offer_reservation function to group tasks by executor_id and send them in a single launch_task?

@yahoNanJing
Copy link
Contributor Author

Hi @askoa, yes currently we can launch a bunch of tasks in a single grpc request. However, for the same stage, they don't share the information, like execution plan, etc.

I'd like to add something like this.

// A set of tasks in the same stage
message MultiTaskDefinition {
  PartitionIds task_ids = 1;
  bytes plan = 2;
  string session_id = 3;
  repeated KeyValuePair props = 4;
}

Then we can not only reduce the rpc request body size, but also reduce the execution plan deserialization cost.

@askoa
Copy link
Contributor

askoa commented Sep 15, 2022

@yahoNanJing Could you please check the protobuf changes in the draft PR and let me know if it looks okay?

@yahoNanJing
Copy link
Contributor Author

yahoNanJing commented Sep 16, 2022

Thanks @askoa. Left some comment on your draft. Are you planning to implement this or already have an implementation for this? Actually, we have already implemented a version for this and plan to contribute a patch for it.

@askoa
Copy link
Contributor

askoa commented Sep 16, 2022

@yahoNanJing I just started with protobuf changes and not implemented yet. I'll drop the draft PR. Thanks for letting me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants