Is your feature request related to a problem or challenge? Please describe what you are trying to do.
After #59 was introduced, the previous cache layer became almost ineffective, which significantly degrades task scheduling performance, especially when scheduling thousands of tasks.
Describe the solution you'd like
It would be better to introduce a CuratorTaskManager so that each active job is curated by exactly one scheduler. We can then cache the active jobs on their curator scheduler to avoid serialization and deserialization costs.
To achieve this, we need to:
Introduce a scheduler id in the execution graph to identify its curator
Extract task status from ExecutionStage
Extract job status from ExecutionGraph
Introduce a cache for the active execution graphs in TaskManager
Make the executor gRPC server aware of which scheduler each request comes from
Make the executor send task status updates to the job's curator scheduler
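A minimal sketch of the scheduler-side idea follows, written in Rust since that is Ballista's language. `CuratorTaskManager` is the name proposed in this issue, but the fields and methods shown here, as well as the simplified `ExecutionGraph` and `TaskStatus` stand-ins, are hypothetical illustrations rather than Ballista's actual API. It assumes that the scheduler which accepts a job becomes its curator and keeps the job's graph in an in-memory `HashMap`, so executor status updates mutate the cached graph directly instead of deserializing and re-serializing it from the shared state backend.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified task status (not Ballista's real type).
#[derive(Clone, Debug, PartialEq)]
pub enum TaskStatus {
    Pending,
    Running,
    Completed,
    Failed(String),
}

/// Hypothetical stand-in for an execution graph that records its curator.
#[derive(Debug)]
pub struct ExecutionGraph {
    pub job_id: String,
    /// Id of the scheduler that curates (owns) this job.
    pub curator_scheduler_id: String,
    pub task_statuses: Vec<TaskStatus>,
}

/// Hypothetical sketch: a task manager that caches the graphs of the
/// active jobs curated by this scheduler.
pub struct CuratorTaskManager {
    scheduler_id: String,
    active_jobs: HashMap<String, ExecutionGraph>,
}

impl CuratorTaskManager {
    pub fn new(scheduler_id: &str) -> Self {
        Self {
            scheduler_id: scheduler_id.to_string(),
            active_jobs: HashMap::new(),
        }
    }

    /// Accept a new job: this scheduler becomes its curator and caches the graph.
    pub fn submit_job(&mut self, job_id: &str, num_tasks: usize) {
        let graph = ExecutionGraph {
            job_id: job_id.to_string(),
            curator_scheduler_id: self.scheduler_id.clone(),
            task_statuses: vec![TaskStatus::Pending; num_tasks],
        };
        self.active_jobs.insert(job_id.to_string(), graph);
    }

    /// Which scheduler curates the given cached job, if any.
    pub fn curator_of(&self, job_id: &str) -> Option<&str> {
        self.active_jobs
            .get(job_id)
            .map(|g| g.curator_scheduler_id.as_str())
    }

    /// Apply an executor's task status update to the cached graph in memory.
    /// Returns false if the job is not cached here or the task id is out of range.
    pub fn update_task_status(&mut self, job_id: &str, task_id: usize, status: TaskStatus) -> bool {
        match self.active_jobs.get_mut(job_id) {
            Some(graph) if task_id < graph.task_statuses.len() => {
                graph.task_statuses[task_id] = status;
                true
            }
            _ => false,
        }
    }

    /// Evict the job from the cache once every task has completed.
    /// A real implementation would persist the final job state at this point.
    pub fn complete_if_finished(&mut self, job_id: &str) -> bool {
        let done = self.active_jobs.get(job_id).map_or(false, |g| {
            g.task_statuses.iter().all(|s| *s == TaskStatus::Completed)
        });
        if done {
            self.active_jobs.remove(job_id);
        }
        done
    }
}
```

Keying the cache by job id keeps each status update a single in-memory lookup; writes to the shared backend would only be needed at job boundaries or for recovery, which this sketch leaves out.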
Error handling:
When a scheduler dies and executors fail to send task updates to it, the first-stage approach is simple: when another scheduler receives such a task status update request, it marks the related job as failed. Later, we can improve this with stage-based recovery.
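The first-stage error handling described above could look roughly like the following sketch. The `Scheduler`, `JobStatus`, and `UpdateOutcome` types and the `handle_task_update` method are hypothetical, and the shared state backend is modeled as a plain `HashMap` from job id to status. A scheduler that receives an update for a job it does not curate assumes the curator is gone and marks the job as failed, leaving stage-based recovery for later.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical job status as stored in the shared state backend.
#[derive(Clone, Debug, PartialEq)]
pub enum JobStatus {
    Running { curator: String },
    Failed(String),
}

/// Hypothetical outcome of handling one task status update.
#[derive(Debug, PartialEq)]
pub enum UpdateOutcome {
    Applied,
    JobFailed,
    UnknownJob,
}

/// Hypothetical scheduler view: its id plus the jobs it curates.
pub struct Scheduler {
    pub id: String,
    pub curated: HashSet<String>,
}

impl Scheduler {
    pub fn handle_task_update(
        &self,
        job_id: &str,
        shared: &mut HashMap<String, JobStatus>,
    ) -> UpdateOutcome {
        // Normal path: this scheduler curates the job and updates its cache.
        if self.curated.contains(job_id) {
            return UpdateOutcome::Applied;
        }
        // The update reached a non-curator, so assume the curator is dead
        // and fail the job (no stage-based recovery in this first stage).
        match shared.get_mut(job_id) {
            Some(status) => {
                let reason = match status {
                    JobStatus::Running { curator } => Some(format!(
                        "curator {} unreachable; job failed by {}",
                        curator, self.id
                    )),
                    JobStatus::Failed(_) => None, // already failed
                };
                if let Some(reason) = reason {
                    *status = JobStatus::Failed(reason);
                }
                UpdateOutcome::JobFailed
            }
            None => UpdateOutcome::UnknownJob,
        }
    }
}
```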
Describe alternatives you've considered
Additional context
yahoNanJing changed the title from "refine the scheduler state cache layer" to "Introduce CuratorTaskManager for make an active job be curated by only one scheduler" on Aug 17, 2022.