-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design for remote parameter updater using cclient #2330
Conversation
|
||
Remote Parameter Updater manage parameters through remote parameter server with the client that communicate with pserver([The Client Library of Parameter Server Design Doc](pserver_client.md)) | ||
|
||
In PaddlePaddle Python V2 API, trainer is implemented in python, and the trainer will hold a member of parameter updater and call it's functions directly. In this design, we should also expose the api of RemoteParameterUpdater to python with swig. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hold a member -> hold the instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
Remote Parameter Updater manage parameters through remote parameter server with the client that communicate with pserver([The Client Library of Parameter Server Design Doc](pserver_client.md)) | ||
|
||
In PaddlePaddle Python V2 API, trainer is implemented in python, and the trainer will hold a member of parameter updater and call it's functions directly. In this design, we should also expose the api of RemoteParameterUpdater to python with swig. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should -> we will
感觉“we will"的语气更加确定,更合适design doc。"we should"在英语里我理解是我们应当做(但不一定立刻会做)。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
CI failed, probably need to use pre commit hook. |
… parameterupdater
… parameterupdater
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 such a fast implement!
go/pserver/cclient/CMakeLists.txt
Outdated
@@ -9,5 +9,14 @@ project(cxx_go C Go) | |||
include(golang) | |||
include(flags) | |||
|
|||
go_library(client STATIC) | |||
go_library(paddle_cclient STATIC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe named with pserver_cclient
will be better? since it is only used by the new trainer module and communicate with parameter server. paddle_cclient
would lead to an inference client at first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spend a lot of time working on CMake to make it work properly.@@
This is a good suggestion, I will rename it~
|
||
// create parameter server client. | ||
parameterClient_ = | ||
paddle_new_pserver_client((char *)pserverSpec_.c_str(), FLAGS_trainer_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a little different with current RemoteUpdater
here. we don't have a trainer id in paddle cloud.
LOG(INFO) << "finishBatch in, cost: " << cost; | ||
|
||
// send gradient to parameter server. | ||
paddle_send_grads(parameterClient_, *newGradients_, parameterSize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variable cost is useless here. cannot find anywhere update the newGradients_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall great!
@@ -32,17 +32,21 @@ function(GO_LIBRARY NAME BUILD_TYPE) | |||
# will use the local changes in Paddle rather than checkout Paddle | |||
# in github. | |||
add_custom_target(copyPaddle | |||
COMMAND ln -sf ${PADDLE_DIR} ${PADDLE_IN_GOPATH}) | |||
COMMAND rm -rf ${PADDLE_IN_GOPATH}/Paddle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange, -f
in ln
command force the link, so we probably don't need to remove the old link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个地方我换了一种link方式,在最后面指定了 目录名为Paddle,因为目前的做法,如果paddle所在目录改名字了,link的名字也会变,这样 go get就找不到目录了。
然后修改之后,在mac上两次编译会报Operation not permitted
,因为他试图在Paddle这个链接下再建一个链接,循环了。
${CMAKE_GO_FLAGS} ${GO_SOURCE} | ||
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}) | ||
|
||
add_custom_target(${NAME} ALL DEPENDS ${OUTPUT_DIR}/.timestamp ${ARGN}) | ||
add_dependencies(${NAME} goGet) | ||
add_dependencies(${NAME} copyPaddle) | ||
# add_dependencies(${NAME} goGet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It need to depend on go get to get all dependencies, are there anything wrong with this line?
|
||
// create parameter server client. | ||
parameterClient_ = | ||
paddle_new_pserver_client((char *)pserverSpec_.c_str(), FLAGS_trainer_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paddle_new_pserver_client
in cclient does not take any argument. I am curious how does this compile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/PaddlePaddle/Paddle/blob/develop/go/pserver/cclient/cclient.go#L103
func paddle_new_pserver_client(addrs *C.char, selected int) C.client {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's new, sorry I got confused.
paddle_get_params( | ||
parameterClient_, names_, newParameters_, (int)parameters_.size()); | ||
} | ||
paddle_finish_init_params(parameterClient_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only the trainer that inits params should call paddle_finish_init_params
. Please put this line to line 56.
Please see the graph in the linked section for reference: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/pserver_client.md#trainer-selection-process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok~
num_passes, use_sparse_updater) | ||
else: | ||
parameter_updater = self.__create_new_remote_updater__( | ||
pserver_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cclient don't need pserver spec (the only required spec is parameter config), please see: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/cluster_train/pserver_client.md#trainer-selection-process
Maybe you have an outdated version of cclient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, the current code may be old here?
https://github.com/PaddlePaddle/Paddle/blob/develop/go/pserver/cclient/cclient.go#L103
func paddle_new_pserver_client(addrs *C.char, selected int) C.client {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's new, sorry I got confused.
closed and open a new pr #2413 |
refs: #2274
测试方法:
1,编译。
2,编译完成之后运行 go run ./go/cmd/pserver/pserver.go -port 3000 会在本地启动一个parameter server,监听3000端口。
3,python paddle/go/pserver/cclient/test/test_train.py