When a task is passed to the global scheduler, if it is not received,… #1106

Merged: 5 commits merged into ray-project:master from retrypassingtoglobal on Oct 12, 2017
@robertnishihara commented Oct 11, 2017
Collaborator

… then try again.

This may solve #1105.

TODO before merging:

  • Update the comment in
    /* Subscribe to notifications about waiting tasks. TODO(rkn): this may need to
    * get tasks that were submitted to the database before the subscribe. */
    task_table_subscribe(g_state->db, NIL_ID, TASK_STATUS_WAITING,
    process_task_waiting, (void *) g_state, NULL, NULL,
    NULL);

@robertnishihara
Collaborator Author

There's still a bug in this PR. On one of the machines, I see the message

[WARN] (/ray/src/common/state/redis.cc:914) No subscribers received the task_table_add message.

and the local scheduler on that machine appears defunct.
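
The wording of the warning suggests that the publish step reported zero receivers. Redis `PUBLISH` replies with the number of clients that received the message, so a sender can detect this case. The following is a minimal sketch of that idea using an in-memory stand-in for the channel. `ToyChannel` and `publish_task` are hypothetical names for illustration and are not taken from Ray's code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a pub/sub channel; not Ray's code.
class ToyChannel {
 public:
  using Handler = std::function<void(const std::string &)>;

  // Register a callback that receives every message published from now on.
  void subscribe(Handler handler) { handlers_.push_back(std::move(handler)); }

  // Deliver the message to every current subscriber and report how many
  // received it, mirroring the integer reply of Redis PUBLISH.
  std::size_t publish(const std::string &message) const {
    for (const auto &handler : handlers_) {
      handler(message);
    }
    return handlers_.size();
  }

 private:
  std::vector<Handler> handlers_;
};

// Returns true if at least one subscriber saw the task. A message published
// to a channel with no subscribers is dropped, so callers must handle false.
bool publish_task(const ToyChannel &channel, const std::string &task_id) {
  return channel.publish(task_id) > 0;
}
```

In this sketch, a publish that happens before any subscribe returns false and the message is lost, which is the situation the TODO about tasks submitted before the subscribe describes.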

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2086/
Test PASSed.


TaskSpec *spec = Task_task_spec(task);
CHECK(ActorID_equal(TaskSpec_actor_id(spec), NIL_ACTOR_ID));
handle_task_submitted(state, state->algorithm_state, spec,
Contributor

Interesting choice. So you want to recirculate the task through the local scheduler's new-task handling mechanism, even though the decision has already been made to forward it to the global scheduler? So the assumption is that Redis is busy, and we should try to keep the task local again?

Collaborator Author

@robertnishihara robertnishihara Oct 11, 2017

The assumption is not that Redis is busy, but rather that the global scheduler has not started up yet.

We could alternatively simply call give_task_to_global_scheduler again.
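
The difference between the two options can be sketched as follows. This is a toy model under stated assumptions, not Ray's scheduler: `ToyScheduler`, `resubmit_task`, and `retry_forward` are hypothetical names, and the local placement rule (a free slot counter) is invented for illustration.

```cpp
#include <cassert>

// Where a task ends up after one scheduling attempt.
enum class Placement { kLocal, kGlobal, kPending };

// Hypothetical, heavily simplified local scheduler state.
struct ToyScheduler {
  bool global_scheduler_up = false;  // has the global scheduler subscribed yet?
  int free_local_slots = 0;          // invented stand-in for local resources
};

// Option 1: run the task through the normal submission path again. The
// local placement decision is re-evaluated, so the task may stay local.
Placement resubmit_task(ToyScheduler &scheduler) {
  if (scheduler.free_local_slots > 0) {
    --scheduler.free_local_slots;
    return Placement::kLocal;
  }
  return scheduler.global_scheduler_up ? Placement::kGlobal
                                       : Placement::kPending;
}

// Option 2: only retry the hand-off to the global scheduler. Local
// resources are not reconsidered.
Placement retry_forward(const ToyScheduler &scheduler) {
  return scheduler.global_scheduler_up ? Placement::kGlobal
                                       : Placement::kPending;
}
```

With a free local slot and no global scheduler, `resubmit_task` places the task locally while `retry_forward` leaves it pending until the global scheduler appears.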

@@ -948,7 +961,12 @@ void give_task_to_global_scheduler(LocalSchedulerState *state,
DCHECK(state->config.global_scheduler_exists);
Task *task = Task_alloc(spec, task_spec_size, TASK_STATUS_WAITING, NIL_ID);
DCHECK(state->db != NULL);
task_table_add_task(state->db, task, NULL, NULL, NULL);
auto retryInfo = RetryInfo{
.num_retries = 0, // This value is unused.
Contributor

If num_retries is unused, what's the behavior on failure? How many retries are there, and what controls that?

Collaborator Author

If task_table_add_task fails (i.e., the global scheduler has not started yet), then give_task_to_global_scheduler_retry will be called, which leads to another call to task_table_add_task. This repeats indefinitely until the call to task_table_add_task succeeds (or the local scheduler decides to keep the task locally).
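
The retry-until-success behavior described above can be sketched like this. It is a minimal illustration, assuming a single-threaded event loop. `ToyLoop` and `add_with_retry` are hypothetical names that stand in for the real event loop and for the give_task_to_global_scheduler / give_task_to_global_scheduler_retry pair; there is no retry limit and no delay between attempts here, which a real implementation would probably want.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical single-threaded event loop: callbacks run one at a time.
class ToyLoop {
 public:
  // Queue a callback to run on a later call to run_one().
  void post(std::function<void()> fn) { pending_.push(std::move(fn)); }

  // Run the oldest queued callback. Returns false if the queue was empty.
  bool run_one() {
    if (pending_.empty()) {
      return false;
    }
    auto fn = std::move(pending_.front());
    pending_.pop();
    fn();
    return true;
  }

 private:
  std::queue<std::function<void()>> pending_;
};

// Try `add_task` once. If it reports failure, queue another attempt on the
// loop, so attempts repeat until one succeeds. `attempts` counts the tries.
// `loop` and `attempts` must outlive every queued retry.
void add_with_retry(ToyLoop &loop, std::function<bool()> add_task,
                    int &attempts) {
  ++attempts;
  if (add_task()) {
    return;  // Success: nothing further is scheduled.
  }
  loop.post([&loop, add_task, &attempts] {
    add_with_retry(loop, add_task, attempts);
  });
}
```

Because each failure schedules exactly one new attempt, a retry counter is not needed to drive the loop, which is consistent with num_retries being unused in the diff.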

@@ -6,7 +6,7 @@ include(${CMAKE_CURRENT_LIST_DIR}/cmake/Common.cmake)

add_subdirectory(redis_module)

-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -g")
Collaborator Author

Any reason not to compile with debug info all the time?

Contributor

The size of the executables is the only reason that comes to mind, and we don't care about that much.

@robertnishihara
Collaborator Author

cc @stephanie-wang

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2090/
Test PASSed.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/2098/
Test PASSed.

@atumanov atumanov merged commit b585001 into ray-project:master Oct 12, 2017
@atumanov atumanov deleted the retrypassingtoglobal branch October 12, 2017 07:05
architkulkarni added a commit that referenced this pull request Oct 31, 2023
I would like to enhance the user experience for installing Helm chart RBAC in KubeRay. Since there isn't an existing document for this, I integrated these PRs, including KubeRay #1190, #1162, and #1106, and created a comprehensive document to help users set up RBAC resources quickly and easily.

---------

Signed-off-by: evalaiyc98 <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Yu-Chen Lai <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Archit Kulkarni <[email protected]>
Co-authored-by: angelinalg <[email protected]>
kevin85421 added a commit to kevin85421/ray that referenced this pull request Nov 1, 2023
I would like to enhance the user experience for installing Helm chart RBAC in KubeRay. Since there isn't an existing document for this, I integrated these PRs, including KubeRay ray-project#1190, ray-project#1162, and ray-project#1106, and created a comprehensive document to help users set up RBAC resources quickly and easily.

---------

Signed-off-by: evalaiyc98 <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Yu-Chen Lai <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Archit Kulkarni <[email protected]>
Co-authored-by: angelinalg <[email protected]>
vitsai pushed a commit that referenced this pull request Nov 2, 2023
I would like to enhance the user experience for installing Helm chart RBAC in KubeRay. Since there isn't an existing document for this, I integrated these PRs, including KubeRay #1190, #1162, and #1106, and created a comprehensive document to help users set up RBAC resources quickly and easily.

---------

Signed-off-by: evalaiyc98 <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Yu-Chen Lai <[email protected]>
Co-authored-by: Yu-Chen Lai <[email protected]>
Co-authored-by: Archit Kulkarni <[email protected]>
Co-authored-by: angelinalg <[email protected]>
4 participants