-
Notifications
You must be signed in to change notification settings - Fork 6.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[gcs] Fix actor killing hang due to race condition #17634
Conversation
…7456)" (ray-project#17599)" This reverts commit 381ffdb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iycheng can you re-run the originally failing tests 3X times and lmk if that all passes?
<< " has been removed from creating map. Actor status " | ||
<< actor->GetState(); | ||
auto actor_id = status.ok() ? actor->GetActorID() : ActorID::Nil(); | ||
KillActorOnWorker(worker->GetAddress(), actor_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this another race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some small comments.
@rkooo567 I verified it. |
Retry one rllib test. Btw, is it possible to add an unit test on |
Let me investigate it next week. Feel a little bit tired on this PR. :( I've put it in my schedule. |
@iycheng sgtm! |
Can you add it to the sprint task? |
I did add that. |
…oject#18585)" (ray-project#18830)" This reverts commit 8dd3057.
Why are these changes needed?
Creating and killing an actor immediately will have a race condition.
It will leave the worker leak. This PR fixed it.
And there is another race condition:
This will leave the worker leaked too.
Related issue number
Closes #17369
Checks
scripts/format.sh
to lint the changes in this PR.