Trainer send term signal #11220
… trainer_send_term_signal
LGTM
@@ -52,6 +52,9 @@ bool RequestSendHandler::Handle(const std::string& varname,
     if (varname == BATCH_BARRIER_MESSAGE) {
       VLOG(3) << "sync: recv batch barrier message";
       rpc_server_->IncreaseBatchBarrier(kRequestSend);
     } else if (varname == COMPLETE_MESSAGE) {
       VLOG(3) << "sync: recv complete message";
       rpc_server_->DecreaseClientNum();
Smart!
Only
@gongweibao You were right, we may need to add trainer id to the request when we want to make all RPC calls retriable.
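To illustrate the concern about retries, here is a minimal Python sketch, not Paddle code. It contrasts a bare counter (as in `DecreaseClientNum()`) with a tracker keyed by trainer id. The class names `CounterOnly` and `TrainerIdTracker` and their methods are hypothetical and exist only for this illustration.

```python
class CounterOnly:
    """Hypothetical model: the server only knows how many clients remain."""

    def __init__(self, client_num):
        self.client_num = client_num

    def complete(self):
        # A retried "complete" request is indistinguishable from a new one,
        # so it decrements the count a second time.
        self.client_num -= 1
        return self.client_num


class TrainerIdTracker:
    """Hypothetical model: the request carries the trainer id."""

    def __init__(self, trainer_ids):
        self._active = set(trainer_ids)

    def complete(self, trainer_id):
        # discard() is a no-op if the id was already removed,
        # which makes a retried request harmless (idempotent).
        self._active.discard(trainer_id)
        return len(self._active)
```

With two trainers, a duplicated complete request from trainer 0 leaves `CounterOnly` believing that no trainers remain, while `TrainerIdTracker` still counts trainer 1 as active.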
… trainer_send_term_signal
LGTM
Fix #11077
Call exe.executor.complete() to tell the pserver to mark the current trainer as finished.
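Why the complete message matters can be sketched with a toy model of a synchronous batch barrier. This is a simplified Python illustration under stated assumptions, not the actual pserver implementation: the class `ToyBarrierServer` and its method names are hypothetical, loosely mirroring `IncreaseBatchBarrier` and `DecreaseClientNum` from the diff in this PR.

```python
import threading


class ToyBarrierServer:
    """Toy stand-in for a pserver batch barrier (hypothetical, not Paddle code)."""

    def __init__(self, client_num):
        self._cond = threading.Condition()
        self._client_num = client_num   # trainers still expected to report
        self._barrier_count = 0         # barrier messages received this batch

    def increase_batch_barrier(self):
        # Called when a trainer sends its batch barrier message.
        with self._cond:
            self._barrier_count += 1
            self._cond.notify_all()

    def decrease_client_num(self):
        # Called when a trainer sends its complete message.
        with self._cond:
            self._client_num -= 1
            self._cond.notify_all()

    def wait_barrier(self, timeout=None):
        # Block until every remaining trainer has reported; True on success,
        # False if the timeout expires first.
        with self._cond:
            done = self._cond.wait_for(
                lambda: self._barrier_count >= self._client_num, timeout)
            if done:
                self._barrier_count = 0  # reset for the next batch
            return done
```

In this model, if one of two trainers exits without signalling, `wait_barrier` never returns because only one barrier message will ever arrive. Once the finished trainer calls `decrease_client_num`, the barrier is satisfied by the remaining trainer alone.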