Remove redundant fields in ExecutorManager #728
Conversation
Hi @thinkharderdev, could you help review this PR?
I think most of the cleanup makes sense, but I don't think we should remove the caching in ExecutorManager.
So this removes the local in-memory cache of heartbeats, which is there to avoid having to fetch them from shared state. That is indeed redundant when you are using fully in-memory state, but not when you are using shared state across multiple schedulers.
For example, we use redis for managing the cluster state and use pub/sub channels to propagate heartbeats to all schedulers, so we don't have to constantly fetch them from redis.
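For readers following along, here is a minimal sketch (not the actual Ballista code) of the pattern described above: heartbeats arrive on a pub/sub-style channel and each scheduler instance keeps its own in-memory copy, so lookups never hit the shared store. The `ExecutorHeartbeat` stub, the `HeartbeatCache` type, and the use of a tokio broadcast channel in place of a redis subscription are all assumptions made for illustration.

```rust
use std::sync::Arc;

use dashmap::DashMap;
use tokio::sync::broadcast;

/// Stand-in for Ballista's protobuf `ExecutorHeartbeat`; only the fields used here.
#[derive(Clone, Debug)]
struct ExecutorHeartbeat {
    executor_id: String,
    timestamp: u64,
}

/// Local cache of heartbeats. Something (in the deployment described above,
/// a redis pub/sub subscription) publishes heartbeats on the channel and every
/// scheduler instance keeps its own copy, so reads never hit the shared store.
struct HeartbeatCache {
    heartbeats: Arc<DashMap<String, ExecutorHeartbeat>>,
}

impl HeartbeatCache {
    fn start(mut rx: broadcast::Receiver<ExecutorHeartbeat>) -> Self {
        let heartbeats: Arc<DashMap<String, ExecutorHeartbeat>> = Arc::new(DashMap::new());
        let cache = heartbeats.clone();
        // Keep the local map up to date as heartbeats are propagated.
        tokio::spawn(async move {
            while let Ok(hb) = rx.recv().await {
                cache.insert(hb.executor_id.clone(), hb);
            }
        });
        Self { heartbeats }
    }

    /// Answered entirely from memory, without touching the backing store.
    fn last_seen(&self, executor_id: &str) -> Option<u64> {
        self.heartbeats.get(executor_id).map(|hb| hb.timestamp)
    }
}
```

With something like this in place, dropping the cache only costs nothing when the backing state is itself in memory, which is the point being made here.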
Agree. How about reusing InMemoryClusterState for the KeyValueState? There are two main reasons for removing the cache part in ExecutorManager:
- It seems redundant for InMemoryClusterState.
- It's easy to miss some part when ensuring the correctness of the cached data.
Previously I thought it might be unnecessary to cache the cluster state when using state storage. If it's necessary, I think it's better to introduce the InMemoryClusterState into the KeyValueState directly. Then for the cache part, we can focus on InMemoryClusterState. To achieve this, I will raise a few followup commits to this PR.
Hi @thinkharderdev, I have now copied the cache logic into the KeyValueState.
```diff
@@ -57,6 +56,8 @@ pub struct KeyValueState<
 > {
     /// Underlying `KeyValueStore`
     store: S,
+    /// ExecutorHeartbeat cache, executor_id -> ExecutorHeartbeat
+    executor_heartbeats: Arc<DashMap<String, ExecutorHeartbeat>>,
```
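As a rough illustration of what such a cached write path can look like (a sketch only, using hypothetical stand-in types rather than the real `KeyValueStore` trait, key layout, or protobuf encoding): persist the heartbeat to the shared store first, then update the local cache, so the cache may lag the store but never lead it.

```rust
use std::sync::Arc;

use dashmap::DashMap;

#[derive(Clone)]
struct ExecutorHeartbeat {
    executor_id: String,
    timestamp: u64,
}

/// Minimal stand-in for the real `KeyValueStore` trait.
#[async_trait::async_trait]
trait KeyValueStore: Send + Sync {
    async fn put(&self, key: String, value: Vec<u8>) -> Result<(), String>;
}

/// Sketch of the cached write path: store first, cache second.
struct HeartbeatState<S: KeyValueStore> {
    store: S,
    executor_heartbeats: Arc<DashMap<String, ExecutorHeartbeat>>,
}

impl<S: KeyValueStore> HeartbeatState<S> {
    async fn save_executor_heartbeat(&self, hb: ExecutorHeartbeat) -> Result<(), String> {
        // The real code encodes the protobuf message; a timestamp stands in here.
        let encoded = hb.timestamp.to_be_bytes().to_vec();
        self.store
            .put(format!("heartbeats/{}", hb.executor_id), encoded)
            .await?;
        self.executor_heartbeats.insert(hb.executor_id.clone(), hb);
        Ok(())
    }

    /// Reads are served from the cache without touching the store.
    fn executor_heartbeat(&self, executor_id: &str) -> Option<ExecutorHeartbeat> {
        self.executor_heartbeats
            .get(executor_id)
            .map(|e| e.value().clone())
    }
}
```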
I think we also need to cache ExecutorMetadata and ExecutorData here to preserve the functionality that was previously in ExecutorManager.
For ExecutorMetadata, it can be cached. However, for ExecutorData, I don't think it should be cached, as it changes frequently and it's hard to keep it consistent with the copy stored in the backend storage when there are multiple active schedulers.
Actually, I don't think it's necessary to introduce multiple active schedulers in the Ballista cluster. Maybe only HA is needed. If so, we can cache the ExecutorData with the executor's available slots to avoid frequent visits to the backend storage.
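To make the distinction concrete, here is a sketch of a read-through cache for ExecutorMetadata, which is effectively fixed for the lifetime of an executor and therefore safe to cache, unlike frequently changing slot counts. The `MetadataStore` trait, the field layout, and the `MetadataCache` type are hypothetical stand-ins, not Ballista APIs.

```rust
use std::sync::Arc;

use dashmap::DashMap;

/// Stand-in for Ballista's `ExecutorMetadata` (host/port do not change for a
/// running executor, which is what makes this safe to cache).
#[derive(Clone)]
struct ExecutorMetadata {
    id: String,
    host: String,
    port: u16,
}

/// Minimal stand-in for the metadata lookup on the backing store.
#[async_trait::async_trait]
trait MetadataStore: Send + Sync {
    async fn get_executor_metadata(&self, executor_id: &str) -> Result<ExecutorMetadata, String>;
}

struct MetadataCache<S: MetadataStore> {
    store: S,
    cache: Arc<DashMap<String, ExecutorMetadata>>,
}

impl<S: MetadataStore> MetadataCache<S> {
    /// Read-through: serve from memory if present, otherwise fetch once and cache.
    async fn get(&self, executor_id: &str) -> Result<ExecutorMetadata, String> {
        if let Some(meta) = self.cache.get(executor_id) {
            return Ok(meta.value().clone());
        }
        let meta = self.store.get_executor_metadata(executor_id).await?;
        self.cache.insert(executor_id.to_string(), meta.clone());
        Ok(meta)
    }
}
```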
Ah, yeah, I don't think ExecutorData is used at all anymore actually. We use ExecutorTaskSlots to store the available task slots for reservation purposes, so I think we should be fine just caching ExecutorMetadata.

> Actually, I don't think it's necessary to introduce multiple active schedulers in the Ballista cluster.

We run multiple active schedulers. The scheduler does a non-trivial amount of work in scheduling, and it's important for us to be able to scale that layer horizontally.
Regarding horizontal scaling, I actually haven't hit any scheduler bottlenecks since the earlier optimizations were introduced. Maybe a single scheduler is enough.
We have :). The problem is mostly the cost of plan serialization. We frequently see execution plans with 50k+ files, and such a plan is very large and causes a lot of CPU and memory overhead to serialize.
Aside from that, the other issue is that zero-downtime deployments are much easier with multiple active schedulers. It is a solvable problem using leader election, but for our use case it was preferable to just run multiple active schedulers and solve both the deployment and the scalability issue.
I think the plan is also cached, so we only need to do the serialization once per query. Do you still have that problem?
The reason I don't prefer multiple active schedulers is the contention for the execution slots, which leads to inefficient slot updates and makes future consistent-hashing-based task assignment infeasible.
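As an illustration of the kind of plan caching being referred to (a sketch, not the actual Ballista implementation), a per-stage cache of the serialized bytes means the expensive encoding happens once and every task launch for that stage reuses it. `StagePlanCache` and its key layout are hypothetical.

```rust
use std::sync::Arc;

use dashmap::DashMap;

/// Hypothetical cache of already-serialized stage plans: the protobuf encoding
/// happens once per (job, stage) and every task launch for that stage reuses
/// the same bytes.
struct StagePlanCache {
    encoded: DashMap<(String, usize), Arc<Vec<u8>>>,
}

impl StagePlanCache {
    fn new() -> Self {
        Self {
            encoded: DashMap::new(),
        }
    }

    /// Return the cached bytes, encoding the plan only on the first call.
    fn get_or_encode<F>(&self, job_id: &str, stage_id: usize, encode: F) -> Arc<Vec<u8>>
    where
        F: FnOnce() -> Vec<u8>,
    {
        self.encoded
            .entry((job_id.to_string(), stage_id))
            .or_insert_with(|| Arc::new(encode()))
            .value()
            .clone()
    }
}
```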
If multiple active schedulers are required, I think your current design of ExecutionReservation is reasonable. Since consistent-hashing-based task assignment may only be possible with a single active scheduler and no slot contention, we have to use different policies for different scheduler deployments.
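For context, consistent-hashing-based assignment would look roughly like the sketch below: a hash ring maps a task key (for example, the location of its input partition) to an executor, so the same key keeps landing on the same executor as long as cluster membership is stable. This is purely illustrative and not part of this PR; `HashRing` and its API are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Illustrative consistent-hash ring mapping hash points to executor ids.
struct HashRing {
    ring: BTreeMap<u64, String>, // hash point -> executor_id
}

impl HashRing {
    fn new(executors: &[String], replicas: usize) -> Self {
        let mut ring = BTreeMap::new();
        for exec in executors {
            // Virtual nodes smooth out the distribution across executors.
            for i in 0..replicas {
                ring.insert(hash_of(&(exec, i)), exec.clone());
            }
        }
        Self { ring }
    }

    /// First executor clockwise from the key's hash, wrapping around the ring.
    fn assign(&self, task_key: &str) -> Option<&String> {
        let h = hash_of(&task_key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, exec)| exec)
    }
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With multiple schedulers contending for slots, the executor chosen by the ring may not have a free slot, which is the contention problem mentioned above.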
Nice work. Thanks @yahoNanJing!
Which issue does this PR close?
Closes #723.
Rationale for this change
After the cluster state refactoring in #658, the state cache in ExecutorManager becomes redundant. It's better to remove it.
What changes are included in this PR?
The following fields in ExecutorManager are removed:
- slots_policy
- executor_metadata
- executors_heartbeat
- executor_data
The RoundRobinLocal slot policy is removed, and SlotsPolicy is renamed to TaskDistribution. The buggy executor check for pull-staged task scheduling is also removed.
Are there any user-facing changes?