When a task is passed to the global scheduler, if it is not received,… #1106
There's still a bug in this PR. On one of the machines, I see the message
and the local scheduler on that machine appears defunct.
Merged build finished. Test PASSed.
Test PASSed.
```cpp
TaskSpec *spec = Task_task_spec(task);
CHECK(ActorID_equal(TaskSpec_actor_id(spec), NIL_ACTOR_ID));
handle_task_submitted(state, state->algorithm_state, spec,
```
Interesting choice. So you want to recirculate the task through the local scheduler's new-task handling mechanism, even though the decision has already been made to forward it to the global scheduler? So the assumption is that Redis is busy, and we should try again to keep the task local.
The assumption is not that Redis is busy, but rather that the global scheduler has not started up yet. Alternatively, we could simply call `give_task_to_global_scheduler` again.
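The two fallbacks discussed here (recirculating through the local submission path versus forwarding to the global scheduler again) can be sketched as below. This is a hypothetical, simplified model: `LocalSchedulerSketch`, `OnForwardFailed`, `FallbackPolicy` and the `has_local_resources` flag are illustrative names, not Ray's code, and the real `handle_task_submitted` and `give_task_to_global_scheduler` take scheduler state and a task spec rather than a small struct.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model; none of these names are Ray's API.
struct Task {
  int id;
  bool is_actor_task;
};

// The two fallbacks discussed in the review thread.
enum class FallbackPolicy {
  kRecirculate,   // Re-run the local scheduling decision.
  kForwardAgain,  // Forward to the global scheduler again right away.
};

class LocalSchedulerSketch {
 public:
  LocalSchedulerSketch(FallbackPolicy policy, bool has_local_resources)
      : policy_(policy), has_local_resources_(has_local_resources) {}

  // Invoked when forwarding a task to the global scheduler failed, for
  // example because the global scheduler has not started up yet.
  void OnForwardFailed(const Task &task) {
    // Only non-actor tasks take this path (the diff CHECKs for NIL_ACTOR_ID).
    assert(!task.is_actor_task);
    if (policy_ == FallbackPolicy::kRecirculate) {
      HandleTaskSubmitted(task);
    } else {
      GiveTaskToGlobalScheduler(task);
    }
  }

  // Stand-in for the local placement decision: keep the task if it can run
  // here, otherwise forward it to the global scheduler.
  void HandleTaskSubmitted(const Task &task) {
    if (has_local_resources_) {
      events_.push_back("queued locally: " + std::to_string(task.id));
    } else {
      GiveTaskToGlobalScheduler(task);
    }
  }

  // Stand-in for handing the task to the global scheduler; it only records
  // the decision in this sketch.
  void GiveTaskToGlobalScheduler(const Task &task) {
    events_.push_back("forwarded: " + std::to_string(task.id));
  }

  const std::vector<std::string> &events() const { return events_; }

 private:
  FallbackPolicy policy_;
  bool has_local_resources_;
  std::vector<std::string> events_;
};
```

Recirculating lets the local scheduler reconsider placement every time forwarding fails, which suits the case where the global scheduler simply has not started yet; forwarding again keeps the original decision but can keep failing for as long as the global scheduler is unavailable.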
`@@ -948,7 +961,12 @@ void give_task_to_global_scheduler(LocalSchedulerState *state,`

```cpp
DCHECK(state->config.global_scheduler_exists);
Task *task = Task_alloc(spec, task_spec_size, TASK_STATUS_WAITING, NIL_ID);
DCHECK(state->db != NULL);
task_table_add_task(state->db, task, NULL, NULL, NULL);
auto retryInfo = RetryInfo{
    .num_retries = 0,  // This value is unused.
```
If `num_retries` is unused, what's the behavior on failure? How many retries are there, and what controls that?
If `task_table_add_task` fails (i.e., the global scheduler has not started yet), then `give_task_to_global_scheduler_retry` will be called, which leads to another call to `task_table_add_task`. This repeats indefinitely until the call to `task_table_add_task` succeeds (or the local scheduler decides to keep the task locally).
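A minimal, synchronous sketch of the retry behavior described in this reply. `TaskTableSketch`, `ForwardWithRetry` and the `keep_locally` predicate are hypothetical names; the real local scheduler is asynchronous, and `task_table_add_task` reports failure through a callback (`give_task_to_global_scheduler_retry`) rather than a return value. The loop has no attempt cap, which mirrors `num_retries` being unused.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, synchronous sketch of the retry loop; these names are
// illustrative, not Ray's API.

// Simulates a task table whose writes fail until the global scheduler is up.
class TaskTableSketch {
 public:
  explicit TaskTableSketch(int failures_before_ready)
      : failures_left_(failures_before_ready) {
    assert(failures_before_ready >= 0);
  }

  // Returns true if the add succeeded, false if it failed.
  bool AddTask() {
    ++attempts_;
    if (failures_left_ > 0) {
      --failures_left_;
      return false;
    }
    return true;
  }

  int attempts() const { return attempts_; }

 private:
  int failures_left_;
  int attempts_ = 0;
};

enum class ForwardResult { kForwarded, kKeptLocally };

// After each failed add, ask whether the task should stay local; if not,
// issue the add again. There is deliberately no cap on the number of
// attempts, so the loop ends only on success or on a keep-local decision.
ForwardResult ForwardWithRetry(TaskTableSketch &table,
                               const std::function<bool()> &keep_locally) {
  while (!table.AddTask()) {
    if (keep_locally()) {
      return ForwardResult::kKeptLocally;
    }
  }
  return ForwardResult::kForwarded;
}
```

With a table that fails three times and a predicate that always returns `false`, the task is forwarded on the fourth attempt; a predicate that eventually returns `true` ends the loop early with the task kept locally.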
```diff
@@ -6,7 +6,7 @@ include(${CMAKE_CURRENT_LIST_DIR}/cmake/Common.cmake)

 add_subdirectory(redis_module)

-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -g")
```
Any reason not to compile with debug info all the time?
The size of the executables is the only one that comes to mind, and we don't care about that much.
I would like to enhance the user experience for installing Helm chart RBAC in KubeRay. Since there isn't an existing document for this, I integrated these PRs, including KubeRay #1190, #1162, and #1106, and created a comprehensive document to help users set up RBAC resources quickly and easily.

---------

Signed-off-by: evalaiyc98
Signed-off-by: Kai-Hsun Chen
Signed-off-by: Yu-Chen Lai
Co-authored-by: Kai-Hsun Chen
Co-authored-by: Archit Kulkarni
Co-authored-by: angelinalg
… then try again.
This may solve #1105.
TODO before merging:
`ray/src/global_scheduler/global_scheduler.cc`, lines 419 to 423 in `1e0ab3d`