Multi-node caffe #3252
Conversation
@cque7 Thanks for the effort! One of our major goals is to coordinate and reconcile different types of parallelism (data / model / hybrid, CPU / GPU) such as #2903 and #2219. One issue is to ensure data coordination in parallel. Currently in Caffe there are multiple types of data layers: some load data via random access (the best case to parallelize), others can only do sequential access, and some generate data on the fly. Also, there can be more than one data layer in a net, so the multiple data layers need to coordinate with each other. This is one major difficulty we have for parallelism (achieving the maximum possible scale in an elegant way without losing generality). I shall try to review this PR after the CVPR deadline (11/06/15).
Thanks @ronghanghu for the reply. We look forward to getting comments from the community to make it better. Another trick we use is called "dynamic pipelines". We maintain a pool of free solvers in each node (the solvers share parameters), and each message (pipeline) in the system is assigned a unique msg_id. When a "forward" packet comes to a node, the node selects a free solver and associates it with the msg_id in the message. Doing this in the "forward" flow, we can create a virtual pipeline by connecting the sub-solvers in different nodes together. In backward, we just need to find the corresponding solver according to the msg_id; msg_id can also be read as pipeline_id here. [1] Alex Krizhevsky, One weird trick for parallelizing convolutional neural networks
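A minimal C++ sketch of the solver-pool / msg_id bookkeeping described above (`Solver`, `Message`, and the field names are illustrative placeholders, not types from this PR):

```cpp
// Sketch of the "dynamic pipeline" idea: forward packets grab a free solver
// and bind it to a msg_id; backward packets look the solver up and release it.
#include <cstdint>
#include <deque>
#include <map>
#include <memory>

struct Message { int64_t msg_id; bool is_forward; /* blobs, routing info, ... */ };
class Solver {
 public:
  void Forward(const Message&) {}   // placeholder for the real sub-solver work
  void Backward(const Message&) {}
};

class SolverPool {
 public:
  explicit SolverPool(int n) {
    for (int i = 0; i < n; ++i) free_.push_back(std::make_shared<Solver>());
  }
  void Dispatch(const Message& msg) {
    if (msg.is_forward) {
      // Forward: take a free solver and associate it with this pipeline id.
      // (A real node would queue the message if no solver is free.)
      std::shared_ptr<Solver> s = free_.front();
      free_.pop_front();
      busy_[msg.msg_id] = s;
      s->Forward(msg);
    } else {
      // Backward: find the solver that handled the matching forward pass.
      std::shared_ptr<Solver> s = busy_[msg.msg_id];
      s->Backward(msg);
      busy_.erase(msg.msg_id);
      free_.push_back(s);  // solver becomes free again for the next pipeline
    }
  }
 private:
  std::deque<std::shared_ptr<Solver>> free_;           // idle solvers (shared params)
  std::map<int64_t, std::shared_ptr<Solver>> busy_;    // msg_id -> assigned solver
};
```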
@cque7 Are you doing this for Intel?
@bhack yes, it's part of our open source effort to make deep learning frameworks such as Caffe work well on Intel platforms.
@cque7 NVIDIA, AMD and Intel are all around here. What do you think of a DSL like Halide?
/cc @longjon Any plan to use Halide for single-node vectorization and multi-backend optimization and scheduling?
@bhack scheduling is an interesting problem; in the patch I try to deal with it at the system level. Each node in the system consists of a polling thread and several worker threads. Worker threads can run in a CPU or GPU context. The polling thread is responsible for scheduling: it dispatches incoming messages to different worker threads (each thread runs a different solver, and parameters are shared among these threads). In nodes which run conv. layers or partial FC layers, the worker threads are only responsible for doing forward and backward, and there is a parameter thread which communicates with the parameter server and updates the parameters.
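A rough sketch of that node layout, with one polling thread dispatching messages to worker threads over simple queues (`MessageQueue`, the round-robin policy, and the message contents are assumptions for illustration; the actual patch's scheduling is more involved):

```cpp
// One polling thread (main) feeds per-worker queues; each worker thread would
// own a solver and run forward/backward on the messages it receives.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Message { int worker_id; /* payload */ };

class MessageQueue {
 public:
  void Push(const Message& m) {
    { std::lock_guard<std::mutex> lk(mu_); q_.push(m); }
    cv_.notify_one();
  }
  Message Pop() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return !q_.empty(); });
    Message m = q_.front(); q_.pop();
    return m;
  }
 private:
  std::queue<Message> q_;
  std::mutex mu_;
  std::condition_variable cv_;
};

int main() {
  const int kWorkers = 4;
  std::vector<MessageQueue> worker_queues(kWorkers);

  // Worker threads: in the real system each runs its own solver with shared
  // parameters; here each just drains two messages and exits.
  std::vector<std::thread> workers;
  for (int i = 0; i < kWorkers; ++i) {
    workers.emplace_back([&worker_queues, i] {
      for (int k = 0; k < 2; ++k) {
        Message m = worker_queues[i].Pop();
        // solver->Forward(m) / solver->Backward(m) would go here.
        (void)m;
      }
    });
  }

  // Polling thread: receive a message and dispatch it to a worker chosen by a
  // simple round-robin policy (a stand-in for the real scheduling decision).
  int next = 0;
  for (int step = 0; step < 2 * kWorkers; ++step) {
    Message m;  // in reality: m = RecvFromNetwork();
    m.worker_id = next;
    worker_queues[next].Push(m);
    next = (next + 1) % kWorkers;
  }
  for (auto& t : workers) t.join();
  return 0;
}
```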
@cque7 Interesting. Have you seen the Caffe experiments in http://arxiv.org/abs/1506.08272? At the device level, what are your plans? Rely on OpenCL like in #2610? /cc @cypof @naibaf7
@bhack we are planning to test the proposal on devices like Intel's Xeon E3 servers, which have integrated GPUs. For OpenCL implementations, we are looking at pull requests like #2610; inside Intel we also have some OpenCL kernels which are optimized for Intel's platforms. I think we will test the multi-node implementation on both of these kernels.
@cque7 That's why I was suggesting it could be possible to define kernels in a DSL and let vendors optimize for their targets. E.g. Halide can already produce code for CUDA, OpenCL, SSE, NEON, and Android RenderScript, and is starting on Apple Metal targets. OpenCV is also going in a similar direction. See opencv/opencv#5510
Closing this PR since we are moving it to #3441
We are working to scale Caffe to hundreds of nodes in a cluster. This version of the change contains:
Data parallelism in convolution layers. Convolution weights are exchanged via a parameter server.
Model parallelism for fully connected layers. A model server is created as a centralized scheduler to split big models (e.g. AlexNet / VGG net) into smaller ones and generate routing tables that connect the fully connected nodes together (a rough splitting sketch is shown below).
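For illustration only, a hedged sketch of how a model server might split one large FC layer evenly over a set of nodes and record the resulting shard ranges (`FcShard`, `SplitFcLayer`, and the even-split policy are assumptions, not the actual logic of this PR or #3441):

```cpp
// Split the output neurons of one fully connected layer across nodes and
// print the resulting routing information.
#include <iostream>
#include <vector>

struct FcShard {
  int node_id;
  int output_begin;  // first output neuron handled by this node
  int output_end;    // one past the last output neuron
};

// Divide `num_outputs` neurons of one FC layer evenly over `num_nodes` nodes.
std::vector<FcShard> SplitFcLayer(int num_outputs, int num_nodes) {
  std::vector<FcShard> shards;
  int base = num_outputs / num_nodes;
  int rem = num_outputs % num_nodes;
  int begin = 0;
  for (int n = 0; n < num_nodes; ++n) {
    int size = base + (n < rem ? 1 : 0);
    shards.push_back({n, begin, begin + size});
    begin += size;
  }
  return shards;
}

int main() {
  // Example: a 4096-output FC layer (as in AlexNet's fc6) split over 4 nodes.
  for (const FcShard& s : SplitFcLayer(4096, 4)) {
    std::cout << "node " << s.node_id << " handles outputs ["
              << s.output_begin << ", " << s.output_end << ")\n";
  }
  return 0;
}
```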