Make RAY_CHECK for actor re-creation non-fatal #5553

Merged

pcmoritz commented on Aug 27, 2019

Why are these changes needed?

Related issue number

This is a workaround for #5524.

Linter

  • I've run scripts/format.sh to lint the changes in this PR.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16575/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16577/

stephanie-wang left a comment

I just realized there is probably a simpler way to do this. Right now, the worker gets assigned an actor ID right away, before the async lookup to the GCS has returned. That makes the worker available to execute actor tasks before we know the actor's state. Instead, we could assign the actor ID in the lookup callback.

```cpp
RAY_LOG(WARNING) << "Actor not in reconstructing state, most likely it "
                 << "died before creation handler could run. Actor state is "
                 << actor_entry->second.GetState();
return actor_info_ptr;
```

We also should not add the actor table entry to the GCS, right?

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16580/

@pcmoritz

Ok, good idea, let me try the other approach :)

pcmoritz force-pushed the workaround-dead-actor-reconstruction branch from 6496add to 180220a on August 28, 2019 20:02
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16608/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16609/

pcmoritz merged commit e9d2d04 into ray-project:master on Aug 29, 2019
pcmoritz deleted the workaround-dead-actor-reconstruction branch on August 29, 2019 04:07
3 participants