distributed demo stuck at 1st pass #7422

Closed
putcn opened this issue Jan 10, 2018 · 4 comments

putcn commented Jan 10, 2018

I was testing notest_recognize_digits_conv_dist.py, and it looks like training gets stuck at the 1st pass in the trainer. When I restart the trainer, it works.
I started the trainer with the following command, which produced this output:

chenxi@idgsim-gpu-001:~$ export TRAINING_ROLE=TRAINER SERVER_ENDPOINT=127.0.0.1:6188 PSERVERS=127.0.0.1:6188 LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/lib/ CUDA_VISIBLE_DEVICES=1 GLOG_v=3;python notest_recognize_digits_conv_dist.py 
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0110 14:58:10.510463 48981 init.cc:39] Init commandline: notest_recognize_digits_conv_dist.py --tryfromenv=use_pinned_memory,fraction_of_gpu_memory_to_use 
I0110 14:58:11.246450 48981 dynamic_loader.cc:67] Try to find library: libcublas.so from default system path.
I0110 14:58:11.477094 48981 dynamic_loader.cc:67] Try to find library: libcudnn.so from default system path.
I0110 14:58:11.834908 48981 op_desc.cc:348] CompileTime infer shape on gaussian_random
I0110 14:58:11.835577 48981 op_desc.cc:348] CompileTime infer shape on conv2d_cudnn
I0110 14:58:11.835769 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.835934 48981 op_desc.cc:348] CompileTime infer shape on elementwise_add
I0110 14:58:11.836058 48981 op_desc.cc:348] CompileTime infer shape on relu
I0110 14:58:11.836232 48981 op_desc.cc:348] CompileTime infer shape on pool2d
I0110 14:58:11.836442 48981 op_desc.cc:348] CompileTime infer shape on gaussian_random
I0110 14:58:11.836592 48981 op_desc.cc:348] CompileTime infer shape on conv2d_cudnn
I0110 14:58:11.836745 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.836877 48981 op_desc.cc:348] CompileTime infer shape on elementwise_add
I0110 14:58:11.836971 48981 op_desc.cc:348] CompileTime infer shape on relu
I0110 14:58:11.837116 48981 op_desc.cc:348] CompileTime infer shape on pool2d
I0110 14:58:11.837405 48981 op_desc.cc:348] CompileTime infer shape on uniform_random
I0110 14:58:11.837568 48981 op_desc.cc:348] CompileTime infer shape on mul
I0110 14:58:11.837586 48981 mul_op.cc:36] mul operator x.shape=-1, 50, 4, 4 y.shape=800, 10 x_num_col_dims=1 y_num_col_dims=1
I0110 14:58:11.837731 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.837872 48981 op_desc.cc:348] CompileTime infer shape on elementwise_add
I0110 14:58:11.837973 48981 op_desc.cc:348] CompileTime infer shape on softmax
I0110 14:58:11.838091 48981 op_desc.cc:348] CompileTime infer shape on cross_entropy
I0110 14:58:11.838289 48981 op_desc.cc:348] CompileTime infer shape on mean
I0110 14:58:11.839025 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.839105 48981 op_desc.cc:348] CompileTime infer shape on mean_grad
I0110 14:58:11.839166 48981 op_desc.cc:348] CompileTime infer shape on cross_entropy_grad
I0110 14:58:11.839227 48981 op_desc.cc:348] CompileTime infer shape on softmax_grad
I0110 14:58:11.839289 48981 op_desc.cc:348] CompileTime infer shape on elementwise_add_grad
I0110 14:58:11.839365 48981 op_desc.cc:348] CompileTime infer shape on mul_grad
I0110 14:58:11.839435 48981 op_desc.cc:348] CompileTime infer shape on pool2d_grad
I0110 14:58:11.839488 48981 op_desc.cc:348] CompileTime infer shape on relu_grad
I0110 14:58:11.839545 48981 op_desc.cc:348] CompileTime infer shape on elementwise_add_grad
I0110 14:58:11.839617 48981 op_desc.cc:348] CompileTime infer shape on conv2d_cudnn_grad
I0110 14:58:11.839681 48981 op_desc.cc:348] CompileTime infer shape on pool2d_grad
I0110 14:58:11.839732 48981 op_desc.cc:348] CompileTime infer shape on relu_grad
I0110 14:58:11.839788 48981 op_desc.cc:348] CompileTime infer shape on elementwise_add_grad
I0110 14:58:11.839851 48981 op_desc.cc:348] CompileTime infer shape on conv2d_cudnn_grad
I0110 14:58:11.840518 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.840672 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.840821 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.840970 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.841112 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.841251 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.841409 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.841543 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.841681 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.841814 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.841949 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.842084 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.842217 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.842348 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.842654 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.842831 48981 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:11.843003 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.843156 48981 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:11.843323 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.843477 48981 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:11.843636 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.843786 48981 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:11.843940 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.844089 48981 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:11.844249 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.844398 48981 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:11.844481 48981 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:11.844549 48981 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:11.844817 48981 op_desc.cc:348] CompileTime infer shape on top_k
I0110 14:58:11.844950 48981 op_desc.cc:348] CompileTime infer shape on accuracy
I0110 14:58:11.845085 48981 op_desc.cc:348] CompileTime infer shape on cast
I0110 14:58:11.845201 48981 op_desc.cc:348] CompileTime infer shape on cast
I0110 14:58:11.845293 48981 op_desc.cc:348] CompileTime infer shape on sum
I0110 14:58:11.845367 48981 op_desc.cc:348] CompileTime infer shape on sum
127.0.0.1:6188 TRAINER 127.0.0.1:6188

 starting trainer
I0110 14:58:11.876178 48981 block_desc.cc:102] deleting var beta1_pow_acc_0
I0110 14:58:11.876188 48981 block_desc.cc:102] deleting var beta2_pow_acc_0
I0110 14:58:11.876191 48981 block_desc.cc:102] deleting var conv2d_0.w_0@GRAD
I0110 14:58:11.876196 48981 block_desc.cc:102] deleting var learning_rate_0
I0110 14:58:11.876201 48981 block_desc.cc:102] deleting var moment1_0
I0110 14:58:11.876204 48981 block_desc.cc:102] deleting var moment2_0
I0110 14:58:11.876209 48981 block_desc.cc:102] deleting var conv2d_0.w_0
I0110 14:58:11.876215 48981 block_desc.cc:102] deleting var beta1_pow_acc_0
I0110 14:58:11.876219 48981 block_desc.cc:102] deleting var beta2_pow_acc_0
I0110 14:58:11.876224 48981 block_desc.cc:102] deleting var fc_0.b_0@GRAD
I0110 14:58:11.876229 48981 block_desc.cc:102] deleting var learning_rate_1
I0110 14:58:11.876232 48981 block_desc.cc:102] deleting var moment1_1
I0110 14:58:11.876236 48981 block_desc.cc:102] deleting var moment2_1
I0110 14:58:11.876241 48981 block_desc.cc:102] deleting var fc_0.b_0
I0110 14:58:11.876247 48981 block_desc.cc:102] deleting var beta1_pow_acc_0
I0110 14:58:11.876252 48981 block_desc.cc:102] deleting var beta2_pow_acc_0
I0110 14:58:11.876256 48981 block_desc.cc:102] deleting var conv2d_0.b_0@GRAD
I0110 14:58:11.876261 48981 block_desc.cc:102] deleting var learning_rate_2
I0110 14:58:11.876266 48981 block_desc.cc:102] deleting var moment1_2
I0110 14:58:11.876269 48981 block_desc.cc:102] deleting var moment2_2
I0110 14:58:11.876274 48981 block_desc.cc:102] deleting var conv2d_0.b_0
I0110 14:58:11.876279 48981 block_desc.cc:102] deleting var beta1_pow_acc_0
I0110 14:58:11.876283 48981 block_desc.cc:102] deleting var beta2_pow_acc_0
I0110 14:58:11.876286 48981 block_desc.cc:102] deleting var fc_0.w_0@GRAD
I0110 14:58:11.876291 48981 block_desc.cc:102] deleting var learning_rate_3
I0110 14:58:11.876296 48981 block_desc.cc:102] deleting var moment1_3
I0110 14:58:11.876299 48981 block_desc.cc:102] deleting var moment2_3
I0110 14:58:11.876304 48981 block_desc.cc:102] deleting var fc_0.w_0
I0110 14:58:11.876309 48981 block_desc.cc:102] deleting var beta1_pow_acc_0
I0110 14:58:11.876314 48981 block_desc.cc:102] deleting var beta2_pow_acc_0
I0110 14:58:11.876318 48981 block_desc.cc:102] deleting var conv2d_1.b_0@GRAD
I0110 14:58:11.876323 48981 block_desc.cc:102] deleting var learning_rate_4
I0110 14:58:11.876327 48981 block_desc.cc:102] deleting var moment1_4
I0110 14:58:11.876332 48981 block_desc.cc:102] deleting var moment2_4
I0110 14:58:11.876336 48981 block_desc.cc:102] deleting var conv2d_1.b_0
I0110 14:58:11.876341 48981 block_desc.cc:102] deleting var beta1_pow_acc_0
I0110 14:58:11.876345 48981 block_desc.cc:102] deleting var beta2_pow_acc_0
I0110 14:58:11.876350 48981 block_desc.cc:102] deleting var conv2d_1.w_0@GRAD
I0110 14:58:11.876355 48981 block_desc.cc:102] deleting var learning_rate_5
I0110 14:58:11.876359 48981 block_desc.cc:102] deleting var moment1_5
I0110 14:58:11.876364 48981 block_desc.cc:102] deleting var moment2_5
I0110 14:58:11.876368 48981 block_desc.cc:102] deleting var conv2d_1.w_0
I0110 14:58:11.876374 48981 block_desc.cc:102] deleting var beta1_pow_acc_0
I0110 14:58:11.876379 48981 block_desc.cc:102] deleting var beta2_pow_acc_0
I0110 14:58:11.877154 48981 scope.cc:43] Create variable learning_rate_1
I0110 14:58:11.877167 48981 executor.cc:81] Create Variable learning_rate_1 global, which pointer is 0x7f7ac8623e60
I0110 14:58:11.877174 48981 scope.cc:43] Create variable moment2_5
I0110 14:58:11.877179 48981 executor.cc:81] Create Variable moment2_5 global, which pointer is 0x7f7ac861bed0
I0110 14:58:11.877184 48981 scope.cc:43] Create variable moment2_4
I0110 14:58:11.877188 48981 executor.cc:81] Create Variable moment2_4 global, which pointer is 0x7f7ac862bc90
I0110 14:58:11.877193 48981 scope.cc:43] Create variable beta2_pow_acc_0
I0110 14:58:11.877198 48981 executor.cc:81] Create Variable beta2_pow_acc_0 global, which pointer is 0x7f7ac862bdc0
I0110 14:58:11.877203 48981 scope.cc:43] Create variable moment1_2
I0110 14:58:11.877208 48981 executor.cc:81] Create Variable moment1_2 global, which pointer is 0x7f7ac862c040
I0110 14:58:11.877213 48981 scope.cc:43] Create variable fc_0.w_0
I0110 14:58:11.877218 48981 executor.cc:81] Create Variable fc_0.w_0 global, which pointer is 0x7f7ac862c1b0
I0110 14:58:11.877223 48981 scope.cc:43] Create variable learning_rate_5
I0110 14:58:11.877228 48981 executor.cc:81] Create Variable learning_rate_5 global, which pointer is 0x7f7ac862c2e0
I0110 14:58:11.877233 48981 scope.cc:43] Create variable fc_0.b_0
I0110 14:58:11.877238 48981 executor.cc:81] Create Variable fc_0.b_0 global, which pointer is 0x7f7ac862c410
I0110 14:58:11.877243 48981 scope.cc:43] Create variable moment1_0
I0110 14:58:11.877248 48981 executor.cc:81] Create Variable moment1_0 global, which pointer is 0x7f7ac862c540
I0110 14:58:11.877252 48981 scope.cc:43] Create variable learning_rate_3
I0110 14:58:11.877257 48981 executor.cc:81] Create Variable learning_rate_3 global, which pointer is 0x7f7ac862c5a0
I0110 14:58:11.877264 48981 scope.cc:43] Create variable moment1_1
I0110 14:58:11.877267 48981 executor.cc:81] Create Variable moment1_1 global, which pointer is 0x7f7ac862c6f0
I0110 14:58:11.877272 48981 scope.cc:43] Create variable learning_rate_0
I0110 14:58:11.877276 48981 executor.cc:81] Create Variable learning_rate_0 global, which pointer is 0x7f7ac862c820
I0110 14:58:11.877291 48981 scope.cc:43] Create variable moment1_4
I0110 14:58:11.877296 48981 executor.cc:81] Create Variable moment1_4 global, which pointer is 0x7f7ac862c0e0
I0110 14:58:11.877301 48981 scope.cc:43] Create variable beta1_pow_acc_0
I0110 14:58:11.877305 48981 executor.cc:81] Create Variable beta1_pow_acc_0 global, which pointer is 0x7f7ac862c9a0
I0110 14:58:11.877310 48981 scope.cc:43] Create variable fetch
I0110 14:58:11.877315 48981 executor.cc:81] Create Variable fetch global, which pointer is 0x7f7ac862cc10
I0110 14:58:11.877321 48981 scope.cc:43] Create variable learning_rate_2
I0110 14:58:11.877326 48981 executor.cc:81] Create Variable learning_rate_2 global, which pointer is 0x7f7ac862cd80
I0110 14:58:11.877331 48981 scope.cc:43] Create variable moment2_0
I0110 14:58:11.877334 48981 executor.cc:81] Create Variable moment2_0 global, which pointer is 0x7f7ac862ceb0
I0110 14:58:11.877341 48981 scope.cc:43] Create variable conv2d_0.w_0
I0110 14:58:11.877344 48981 executor.cc:81] Create Variable conv2d_0.w_0 global, which pointer is 0x7f7ac862cfe0
I0110 14:58:11.877349 48981 scope.cc:43] Create variable moment2_1
I0110 14:58:11.877354 48981 executor.cc:81] Create Variable moment2_1 global, which pointer is 0x7f7ac862bac0
I0110 14:58:11.877359 48981 scope.cc:43] Create variable learning_rate_4
I0110 14:58:11.877364 48981 executor.cc:81] Create Variable learning_rate_4 global, which pointer is 0x7f7ac862bb20
I0110 14:58:11.877370 48981 scope.cc:43] Create variable feed
I0110 14:58:11.877374 48981 executor.cc:81] Create Variable feed global, which pointer is 0x7f7ac862d200
I0110 14:58:11.877379 48981 scope.cc:43] Create variable conv2d_0.b_0
I0110 14:58:11.877384 48981 executor.cc:81] Create Variable conv2d_0.b_0 global, which pointer is 0x7f7ac862d2a0
I0110 14:58:11.877390 48981 scope.cc:43] Create variable moment2_2
I0110 14:58:11.877395 48981 executor.cc:81] Create Variable moment2_2 global, which pointer is 0x7f7ac862d3b0
I0110 14:58:11.877400 48981 scope.cc:43] Create variable conv2d_1.w_0
I0110 14:58:11.877405 48981 executor.cc:81] Create Variable conv2d_1.w_0 global, which pointer is 0x7f7ac862d410
I0110 14:58:11.877410 48981 scope.cc:43] Create variable conv2d_1.b_0
I0110 14:58:11.877415 48981 executor.cc:81] Create Variable conv2d_1.b_0 global, which pointer is 0x7f7ac862b6f0
I0110 14:58:11.877419 48981 scope.cc:43] Create variable moment1_3
I0110 14:58:11.877424 48981 executor.cc:81] Create Variable moment1_3 global, which pointer is 0x7f7ac862b800
I0110 14:58:11.877429 48981 scope.cc:43] Create variable moment1_5
I0110 14:58:11.877434 48981 executor.cc:81] Create Variable moment1_5 global, which pointer is 0x7f7ac862f590
I0110 14:58:11.877439 48981 scope.cc:43] Create variable moment2_3
I0110 14:58:11.877444 48981 executor.cc:81] Create Variable moment2_3 global, which pointer is 0x7f7ac862f6a0
I0110 14:58:11.877465 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_5]}.
I0110 14:58:11.877684 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_4]}.
I0110 14:58:11.877754 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_3]}.
I0110 14:58:11.877784 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_2]}.
I0110 14:58:11.877806 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_1]}.
I0110 14:58:11.877830 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_0]}.
I0110 14:58:11.877854 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_5]}.
I0110 14:58:11.877952 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_5]}.
I0110 14:58:11.878051 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_4]}.
I0110 14:58:11.878075 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_4]}.
I0110 14:58:11.878098 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_3]}.
I0110 14:58:11.878140 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_3]}.
I0110 14:58:11.878188 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_2]}.
I0110 14:58:11.878212 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_2]}.
I0110 14:58:11.878235 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_1]}.
I0110 14:58:11.878258 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_1]}.
I0110 14:58:11.878279 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_0]}.
I0110 14:58:11.878309 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_0]}.
I0110 14:58:11.878332 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[beta2_pow_acc_0]}.
I0110 14:58:11.878355 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[beta1_pow_acc_0]}.
I0110 14:58:11.878377 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[fc_0.b_0]}.
I0110 14:58:11.878407 48981 executor.cc:102] Op(uniform_random), inputs:{}, outputs:{Out[fc_0.w_0]}.
I0110 14:58:11.878589 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[conv2d_1.b_0]}.
I0110 14:58:11.878620 48981 executor.cc:102] Op(gaussian_random), inputs:{}, outputs:{Out[conv2d_1.w_0]}.
I0110 14:58:11.879874 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[conv2d_0.b_0]}.
I0110 14:58:11.879906 48981 executor.cc:102] Op(gaussian_random), inputs:{}, outputs:{Out[conv2d_0.w_0]}.

 started trainer default program, starting passes
I0110 14:58:11.880482 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.880715 48981 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:11.880993 48981 executor.cc:81] Create Variable feed global, which pointer is 0x7f7ac862d200
I0110 14:58:11.881012 48981 scope.cc:43] Create variable accuracy_0_0_total
I0110 14:58:11.881019 48981 executor.cc:81] Create Variable accuracy_0_0_total global, which pointer is 0x7f7ac8632340
I0110 14:58:11.881026 48981 executor.cc:81] Create Variable fetch global, which pointer is 0x7f7ac862cc10
I0110 14:58:11.881032 48981 scope.cc:43] Create variable accuracy_0_1_correct
I0110 14:58:11.881038 48981 executor.cc:81] Create Variable accuracy_0_1_correct global, which pointer is 0x7f7ac86332d0
I0110 14:58:11.881063 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[accuracy_0_0_total]}.
I0110 14:58:11.881105 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[accuracy_0_1_correct]}.

 exe.run
I0110 14:58:12.008682 48981 feed_fetch_method.h:28] SetFeedVariable name=feed index=0
I0110 14:58:12.008818 48981 feed_fetch_method.h:28] SetFeedVariable name=feed index=1
I0110 14:58:12.009052 48981 executor.cc:81] Create Variable moment2_1 global, which pointer is 0x7f7ac862bac0
I0110 14:58:12.009070 48981 executor.cc:81] Create Variable fetch global, which pointer is 0x7f7ac862cc10
I0110 14:58:12.009076 48981 executor.cc:81] Create Variable moment2_0 global, which pointer is 0x7f7ac862ceb0
I0110 14:58:12.009081 48981 executor.cc:81] Create Variable beta2_pow_acc_0 global, which pointer is 0x7f7ac862bdc0
I0110 14:58:12.009088 48981 executor.cc:81] Create Variable beta1_pow_acc_0 global, which pointer is 0x7f7ac862c9a0
I0110 14:58:12.009095 48981 scope.cc:43] Create variable conv2d_0.w_0@GRAD
I0110 14:58:12.009101 48981 executor.cc:86] Create Variable conv2d_0.w_0@GRAD locally, which pointer is 0x7f7ac8665120
I0110 14:58:12.009111 48981 scope.cc:43] Create variable pool2d_0.tmp_0
I0110 14:58:12.009121 48981 executor.cc:86] Create Variable pool2d_0.tmp_0 locally, which pointer is 0x7f7ac8664df0
I0110 14:58:12.009130 48981 scope.cc:43] Create variable conv2d_0.b_0@GRAD
I0110 14:58:12.009137 48981 executor.cc:86] Create Variable conv2d_0.b_0@GRAD locally, which pointer is 0x7f7ac8665200
I0110 14:58:12.009146 48981 scope.cc:43] Create variable pixel
I0110 14:58:12.009153 48981 executor.cc:86] Create Variable pixel locally, which pointer is 0x7f7ac8665010
I0110 14:58:12.009162 48981 scope.cc:43] Create variable conv2d_0.tmp_0@GRAD
I0110 14:58:12.009171 48981 executor.cc:86] Create Variable conv2d_0.tmp_0@GRAD locally, which pointer is 0x7f7ac8665050
I0110 14:58:12.009178 48981 scope.cc:43] Create variable conv2d_1.b_0@GRAD
I0110 14:58:12.009186 48981 executor.cc:86] Create Variable conv2d_1.b_0@GRAD locally, which pointer is 0x7f7ac86651b0
I0110 14:58:12.009194 48981 executor.cc:81] Create Variable learning_rate_5 global, which pointer is 0x7f7ac862c2e0
I0110 14:58:12.009202 48981 scope.cc:43] Create variable cast_1.tmp_0
I0110 14:58:12.009210 48981 executor.cc:86] Create Variable cast_1.tmp_0 locally, which pointer is 0x7f7ac8665480
I0110 14:58:12.009219 48981 scope.cc:43] Create variable conv2d_1.tmp_0@GRAD
I0110 14:58:12.009227 48981 executor.cc:86] Create Variable conv2d_1.tmp_0@GRAD locally, which pointer is 0x7f7ac86655d0
I0110 14:58:12.009235 48981 scope.cc:43] Create variable accuracy_1.tmp_2
I0110 14:58:12.009243 48981 executor.cc:86] Create Variable accuracy_1.tmp_2 locally, which pointer is 0x7f7ac8665700
I0110 14:58:12.009251 48981 scope.cc:43] Create variable conv2d_1.tmp_1@GRAD
I0110 14:58:12.009259 48981 executor.cc:86] Create Variable conv2d_1.tmp_1@GRAD locally, which pointer is 0x7f7ac8664f00
I0110 14:58:12.009268 48981 executor.cc:81] Create Variable moment1_5 global, which pointer is 0x7f7ac862f590
I0110 14:58:12.009276 48981 scope.cc:43] Create variable accuracy_0.tmp_1
I0110 14:58:12.009310 48981 executor.cc:86] Create Variable accuracy_0.tmp_1 locally, which pointer is 0x7f7ac8663840
I0110 14:58:12.009320 48981 scope.cc:43] Create variable conv2d_1.tmp_2@GRAD
I0110 14:58:12.009327 48981 executor.cc:86] Create Variable conv2d_1.tmp_2@GRAD locally, which pointer is 0x7f7ac8665310
I0110 14:58:12.009336 48981 scope.cc:43] Create variable fc_0.w_0@GRAD
I0110 14:58:12.009344 48981 executor.cc:86] Create Variable fc_0.w_0@GRAD locally, which pointer is 0x7f7ac8663ae0
I0110 14:58:12.009352 48981 scope.cc:43] Create variable fc_0.tmp_2
I0110 14:58:12.009359 48981 executor.cc:86] Create Variable fc_0.tmp_2 locally, which pointer is 0x7f7ac8663b00
I0110 14:58:12.009368 48981 scope.cc:43] Create variable pool2d_0.tmp_0@GRAD
I0110 14:58:12.009377 48981 executor.cc:86] Create Variable pool2d_0.tmp_0@GRAD locally, which pointer is 0x7f7ac8667070
I0110 14:58:12.009384 48981 scope.cc:43] Create variable fc_0.b_0@GRAD
I0110 14:58:12.009392 48981 executor.cc:86] Create Variable fc_0.b_0@GRAD locally, which pointer is 0x7f7ac8667180
I0110 14:58:12.009400 48981 scope.cc:43] Create variable conv2d_0.tmp_2
I0110 14:58:12.009407 48981 executor.cc:86] Create Variable conv2d_0.tmp_2 locally, which pointer is 0x7f7ac8667290
I0110 14:58:12.009415 48981 scope.cc:43] Create variable conv2d_1.tmp_2
I0110 14:58:12.009423 48981 executor.cc:86] Create Variable conv2d_1.tmp_2 locally, which pointer is 0x7f7ac86673a0
I0110 14:58:12.009430 48981 scope.cc:43] Create variable fc_0.tmp_1
I0110 14:58:12.009438 48981 executor.cc:86] Create Variable fc_0.tmp_1 locally, which pointer is 0x7f7ac86674b0
I0110 14:58:12.009446 48981 executor.cc:81] Create Variable moment1_0 global, which pointer is 0x7f7ac862c540
I0110 14:58:12.009454 48981 scope.cc:43] Create variable conv2d_0.tmp_1@GRAD
I0110 14:58:12.009461 48981 executor.cc:86] Create Variable conv2d_0.tmp_1@GRAD locally, which pointer is 0x7f7ac86675e0
I0110 14:58:12.009470 48981 executor.cc:81] Create Variable learning_rate_3 global, which pointer is 0x7f7ac862c5a0
I0110 14:58:12.009479 48981 scope.cc:43] Create variable cross_entropy_0.tmp_0
I0110 14:58:12.009487 48981 executor.cc:86] Create Variable cross_entropy_0.tmp_0 locally, which pointer is 0x7f7ac8667710
I0110 14:58:12.009495 48981 executor.cc:81] Create Variable fc_0.w_0 global, which pointer is 0x7f7ac862c1b0
I0110 14:58:12.009505 48981 scope.cc:43] Create variable accuracy_1.tmp_1
I0110 14:58:12.009511 48981 executor.cc:86] Create Variable accuracy_1.tmp_1 locally, which pointer is 0x7f7ac8667840
I0110 14:58:12.009521 48981 scope.cc:43] Create variable mean_0.tmp_0
I0110 14:58:12.009528 48981 executor.cc:86] Create Variable mean_0.tmp_0 locally, which pointer is 0x7f7ac8667950
I0110 14:58:12.009537 48981 scope.cc:43] Create variable fc_0.tmp_0@GRAD
I0110 14:58:12.009546 48981 executor.cc:86] Create Variable fc_0.tmp_0@GRAD locally, which pointer is 0x7f7ac8667b30
I0110 14:58:12.009552 48981 executor.cc:81] Create Variable moment1_2 global, which pointer is 0x7f7ac862c040
I0110 14:58:12.009560 48981 executor.cc:81] Create Variable moment2_2 global, which pointer is 0x7f7ac862d3b0
I0110 14:58:12.009569 48981 executor.cc:81] Create Variable fc_0.b_0 global, which pointer is 0x7f7ac862c410
I0110 14:58:12.009577 48981 executor.cc:81] Create Variable conv2d_1.w_0 global, which pointer is 0x7f7ac862d410
I0110 14:58:12.009587 48981 executor.cc:81] Create Variable feed global, which pointer is 0x7f7ac862d200
I0110 14:58:12.009594 48981 executor.cc:81] Create Variable moment1_3 global, which pointer is 0x7f7ac862b800
I0110 14:58:12.009601 48981 executor.cc:81] Create Variable moment2_4 global, which pointer is 0x7f7ac862bc90
I0110 14:58:12.009611 48981 executor.cc:81] Create Variable learning_rate_4 global, which pointer is 0x7f7ac862bb20
I0110 14:58:12.009619 48981 scope.cc:43] Create variable accuracy_0.tmp_0
I0110 14:58:12.009627 48981 executor.cc:86] Create Variable accuracy_0.tmp_0 locally, which pointer is 0x7f7ac8667c60
I0110 14:58:12.009635 48981 scope.cc:43] Create variable fc_0.tmp_2@GRAD
I0110 14:58:12.009644 48981 executor.cc:86] Create Variable fc_0.tmp_2@GRAD locally, which pointer is 0x7f7ac8667d70
I0110 14:58:12.009652 48981 scope.cc:43] Create variable cast_0.tmp_0
I0110 14:58:12.009660 48981 executor.cc:86] Create Variable cast_0.tmp_0 locally, which pointer is 0x7f7ac8667e80
I0110 14:58:12.009668 48981 executor.cc:81] Create Variable learning_rate_2 global, which pointer is 0x7f7ac862cd80
I0110 14:58:12.009675 48981 executor.cc:81] Create Variable learning_rate_0 global, which pointer is 0x7f7ac862c820
I0110 14:58:12.009683 48981 executor.cc:81] Create Variable moment2_5 global, which pointer is 0x7f7ac861bed0
I0110 14:58:12.009691 48981 executor.cc:81] Create Variable accuracy_0_1_correct global, which pointer is 0x7f7ac86332d0
I0110 14:58:12.009699 48981 executor.cc:81] Create Variable learning_rate_1 global, which pointer is 0x7f7ac8623e60
I0110 14:58:12.009707 48981 scope.cc:43] Create variable conv2d_1.tmp_1
I0110 14:58:12.009713 48981 executor.cc:86] Create Variable conv2d_1.tmp_1 locally, which pointer is 0x7f7ac8667f90
I0110 14:58:12.009721 48981 executor.cc:81] Create Variable conv2d_1.b_0 global, which pointer is 0x7f7ac862b6f0
I0110 14:58:12.009742 48981 scope.cc:43] Create variable fc_0.tmp_0
I0110 14:58:12.009752 48981 executor.cc:86] Create Variable fc_0.tmp_0 locally, which pointer is 0x7f7ac86680a0
I0110 14:58:12.009758 48981 executor.cc:81] Create Variable accuracy_0_0_total global, which pointer is 0x7f7ac8632340
I0110 14:58:12.009766 48981 executor.cc:81] Create Variable conv2d_0.b_0 global, which pointer is 0x7f7ac862d2a0
I0110 14:58:12.009774 48981 scope.cc:43] Create variable conv2d_0.tmp_0
I0110 14:58:12.009780 48981 executor.cc:86] Create Variable conv2d_0.tmp_0 locally, which pointer is 0x7f7ac86681b0
I0110 14:58:12.009788 48981 scope.cc:43] Create variable conv2d_0.tmp_1
I0110 14:58:12.009794 48981 executor.cc:86] Create Variable conv2d_0.tmp_1 locally, which pointer is 0x7f7ac86682c0
I0110 14:58:12.009802 48981 scope.cc:43] Create variable conv2d_1.tmp_0
I0110 14:58:12.009809 48981 executor.cc:86] Create Variable conv2d_1.tmp_0 locally, which pointer is 0x7f7ac86683d0
I0110 14:58:12.009815 48981 executor.cc:81] Create Variable moment2_3 global, which pointer is 0x7f7ac862f6a0
I0110 14:58:12.009824 48981 scope.cc:43] Create variable label
I0110 14:58:12.009830 48981 executor.cc:86] Create Variable label locally, which pointer is 0x7f7ac86684e0
I0110 14:58:12.009837 48981 executor.cc:81] Create Variable moment1_1 global, which pointer is 0x7f7ac862c6f0
I0110 14:58:12.009845 48981 scope.cc:43] Create variable conv2d_0.tmp_2@GRAD
I0110 14:58:12.009852 48981 executor.cc:86] Create Variable conv2d_0.tmp_2@GRAD locally, which pointer is 0x7f7ac8668610
I0110 14:58:12.009860 48981 scope.cc:43] Create variable mean_0.tmp_0@GRAD
I0110 14:58:12.009867 48981 executor.cc:86] Create Variable mean_0.tmp_0@GRAD locally, which pointer is 0x7f7ac8668740
I0110 14:58:12.009874 48981 executor.cc:81] Create Variable conv2d_0.w_0 global, which pointer is 0x7f7ac862cfe0
I0110 14:58:12.009882 48981 scope.cc:43] Create variable pool2d_1.tmp_0
I0110 14:58:12.009888 48981 executor.cc:86] Create Variable pool2d_1.tmp_0 locally, which pointer is 0x7f7ac8668850
I0110 14:58:12.009896 48981 scope.cc:43] Create variable conv2d_1.w_0@GRAD
I0110 14:58:12.009903 48981 executor.cc:86] Create Variable conv2d_1.w_0@GRAD locally, which pointer is 0x7f7ac8668980
I0110 14:58:12.009910 48981 executor.cc:81] Create Variable moment1_4 global, which pointer is 0x7f7ac862c0e0
I0110 14:58:12.009918 48981 scope.cc:43] Create variable accuracy_1.tmp_0
I0110 14:58:12.009925 48981 executor.cc:86] Create Variable accuracy_1.tmp_0 locally, which pointer is 0x7f7ac8668ab0
I0110 14:58:12.009933 48981 scope.cc:43] Create variable cross_entropy_0.tmp_0@GRAD
I0110 14:58:12.009940 48981 executor.cc:86] Create Variable cross_entropy_0.tmp_0@GRAD locally, which pointer is 0x7f7ac8668bc0
I0110 14:58:12.009949 48981 scope.cc:43] Create variable pool2d_1.tmp_0@GRAD
I0110 14:58:12.009956 48981 executor.cc:86] Create Variable pool2d_1.tmp_0@GRAD locally, which pointer is 0x7f7ac8668d50
I0110 14:58:12.009964 48981 scope.cc:43] Create variable fc_0.tmp_1@GRAD
I0110 14:58:12.009971 48981 executor.cc:86] Create Variable fc_0.tmp_1@GRAD locally, which pointer is 0x7f7ac8668e60
I0110 14:58:12.009996 48981 executor.cc:102] Op(feed), inputs:{X[feed]}, outputs:{Out[label]}.
I0110 14:58:12.010010 48981 feed_op.cc:44] Feed Var feed's 1 column to var label
I0110 14:58:12.010038 48981 executor.cc:102] Op(feed), inputs:{X[feed]}, outputs:{Out[pixel]}.
I0110 14:58:12.010047 48981 feed_op.cc:44] Feed Var feed's 0 column to var pixel
I0110 14:58:12.010098 48981 executor.cc:102] Op(conv2d_cudnn), inputs:{Filter[conv2d_0.w_0], Input[pixel]}, outputs:{Output[conv2d_0.tmp_0]}.
I0110 14:58:12.032361 48981 executor.cc:102] Op(elementwise_add), inputs:{X[conv2d_0.tmp_0], Y[conv2d_0.b_0]}, outputs:{Out[conv2d_0.tmp_1]}.
I0110 14:58:12.034596 48981 executor.cc:102] Op(relu), inputs:{X[conv2d_0.tmp_1]}, outputs:{Out[conv2d_0.tmp_2]}.
I0110 14:58:12.035933 48981 executor.cc:102] Op(pool2d), inputs:{X[conv2d_0.tmp_2]}, outputs:{Out[pool2d_0.tmp_0]}.
I0110 14:58:12.038203 48981 executor.cc:102] Op(conv2d_cudnn), inputs:{Filter[conv2d_1.w_0], Input[pool2d_0.tmp_0]}, outputs:{Output[conv2d_1.tmp_0]}.
I0110 14:58:12.047538 48981 executor.cc:102] Op(elementwise_add), inputs:{X[conv2d_1.tmp_0], Y[conv2d_1.b_0]}, outputs:{Out[conv2d_1.tmp_1]}.
I0110 14:58:12.047978 48981 executor.cc:102] Op(relu), inputs:{X[conv2d_1.tmp_1]}, outputs:{Out[conv2d_1.tmp_2]}.
I0110 14:58:12.048127 48981 executor.cc:102] Op(pool2d), inputs:{X[conv2d_1.tmp_2]}, outputs:{Out[pool2d_1.tmp_0]}.
I0110 14:58:12.048753 48981 executor.cc:102] Op(mul), inputs:{X[pool2d_1.tmp_0], Y[fc_0.w_0]}, outputs:{Out[fc_0.tmp_0]}.
I0110 14:58:12.048770 48981 mul_op.cc:36] mul operator x.shape=50, 50, 4, 4 y.shape=800, 10 x_num_col_dims=1 y_num_col_dims=1
I0110 14:58:12.048893 48981 executor.cc:102] Op(elementwise_add), inputs:{X[fc_0.tmp_0], Y[fc_0.b_0]}, outputs:{Out[fc_0.tmp_1]}.
I0110 14:58:12.048930 48981 executor.cc:102] Op(softmax), inputs:{X[fc_0.tmp_1]}, outputs:{Out[fc_0.tmp_2]}.
I0110 14:58:12.049037 48981 executor.cc:102] Op(cross_entropy), inputs:{Label[label], X[fc_0.tmp_2]}, outputs:{Y[cross_entropy_0.tmp_0]}.
I0110 14:58:12.049074 48981 executor.cc:102] Op(mean), inputs:{X[cross_entropy_0.tmp_0]}, outputs:{Out[mean_0.tmp_0]}.
I0110 14:58:12.049108 48981 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[mean_0.tmp_0@GRAD]}.
I0110 14:58:12.049131 48981 executor.cc:102] Op(mean_grad), inputs:{Out@GRAD[mean_0.tmp_0@GRAD], X[cross_entropy_0.tmp_0]}, outputs:{X@GRAD[cross_entropy_0.tmp_0@GRAD]}.
I0110 14:58:12.049162 48981 executor.cc:102] Op(cross_entropy_grad), inputs:{Label[label], X[fc_0.tmp_2], Y[cross_entropy_0.tmp_0], Y@GRAD[cross_entropy_0.tmp_0@GRAD]}, outputs:{Label@GRAD[], X@GRAD[fc_0.tmp_2@GRAD]}.
I0110 14:58:12.049197 48981 executor.cc:102] Op(softmax_grad), inputs:{Out[fc_0.tmp_2], Out@GRAD[fc_0.tmp_2@GRAD], X[fc_0.tmp_1]}, outputs:{X@GRAD[fc_0.tmp_1@GRAD]}.
I0110 14:58:12.049257 48981 executor.cc:102] Op(elementwise_add_grad), inputs:{Out[fc_0.tmp_1], Out@GRAD[fc_0.tmp_1@GRAD], X[fc_0.tmp_0], Y[fc_0.b_0]}, outputs:{X@GRAD[fc_0.tmp_0@GRAD], Y@GRAD[fc_0.b_0@GRAD]}.
I0110 14:58:12.049314 48981 executor.cc:102] Op(mul_grad), inputs:{Out[fc_0.tmp_0], Out@GRAD[fc_0.tmp_0@GRAD], X[pool2d_1.tmp_0], Y[fc_0.w_0]}, outputs:{X@GRAD[pool2d_1.tmp_0@GRAD], Y@GRAD[fc_0.w_0@GRAD]}.
I0110 14:58:12.052873 48981 executor.cc:102] Op(pool2d_grad), inputs:{Out[pool2d_1.tmp_0], Out@GRAD[pool2d_1.tmp_0@GRAD], X[conv2d_1.tmp_2]}, outputs:{X@GRAD[conv2d_1.tmp_2@GRAD]}.
I0110 14:58:12.055135 48981 executor.cc:102] Op(relu_grad), inputs:{Out[conv2d_1.tmp_2], Out@GRAD[conv2d_1.tmp_2@GRAD], X[conv2d_1.tmp_1]}, outputs:{X@GRAD[conv2d_1.tmp_1@GRAD]}.
I0110 14:58:12.055800 48981 executor.cc:102] Op(elementwise_add_grad), inputs:{Out[conv2d_1.tmp_1], Out@GRAD[conv2d_1.tmp_1@GRAD], X[conv2d_1.tmp_0], Y[conv2d_1.b_0]}, outputs:{X@GRAD[conv2d_1.tmp_0@GRAD], Y@GRAD[conv2d_1.b_0@GRAD]}.
I0110 14:58:12.056299 48981 executor.cc:102] Op(conv2d_cudnn_grad), inputs:{Filter[conv2d_1.w_0], Input[pool2d_0.tmp_0], Output[conv2d_1.tmp_0], Output@GRAD[conv2d_1.tmp_0@GRAD]}, outputs:{Filter@GRAD[conv2d_1.w_0@GRAD], Input@GRAD[pool2d_0.tmp_0@GRAD]}.
I0110 14:58:12.074065 48981 executor.cc:102] Op(pool2d_grad), inputs:{Out[pool2d_0.tmp_0], Out@GRAD[pool2d_0.tmp_0@GRAD], X[conv2d_0.tmp_2]}, outputs:{X@GRAD[conv2d_0.tmp_2@GRAD]}.
I0110 14:58:12.077603 48981 executor.cc:102] Op(relu_grad), inputs:{Out[conv2d_0.tmp_2], Out@GRAD[conv2d_0.tmp_2@GRAD], X[conv2d_0.tmp_1]}, outputs:{X@GRAD[conv2d_0.tmp_1@GRAD]}.
I0110 14:58:12.080160 48981 executor.cc:102] Op(elementwise_add_grad), inputs:{Out[conv2d_0.tmp_1], Out@GRAD[conv2d_0.tmp_1@GRAD], X[conv2d_0.tmp_0], Y[conv2d_0.b_0]}, outputs:{X@GRAD[conv2d_0.tmp_0@GRAD], Y@GRAD[conv2d_0.b_0@GRAD]}.
I0110 14:58:12.082137 48981 executor.cc:102] Op(conv2d_cudnn_grad), inputs:{Filter[conv2d_0.w_0], Input[pixel], Output[conv2d_0.tmp_0], Output@GRAD[conv2d_0.tmp_0@GRAD]}, outputs:{Filter@GRAD[conv2d_0.w_0@GRAD], Input@GRAD[]}.
I0110 14:58:12.085556 48981 executor.cc:102] Op(top_k), inputs:{X[fc_0.tmp_2]}, outputs:{Indices[accuracy_1.tmp_1], Out[accuracy_1.tmp_0]}.
I0110 14:58:12.085644 48981 executor.cc:102] Op(accuracy), inputs:{Indices[accuracy_1.tmp_1], Label[label], Out[accuracy_1.tmp_0]}, outputs:{Accuracy[accuracy_1.tmp_2], Correct[accuracy_0.tmp_1], Total[accuracy_0.tmp_0]}.
I0110 14:58:12.085683 48981 executor.cc:102] Op(cast), inputs:{X[accuracy_0.tmp_0]}, outputs:{Out[cast_0.tmp_0]}.
I0110 14:58:12.085707 48981 executor.cc:102] Op(cast), inputs:{X[accuracy_0.tmp_1]}, outputs:{Out[cast_1.tmp_0]}.
I0110 14:58:12.085742 48981 executor.cc:102] Op(sum), inputs:{X[accuracy_0_0_total, cast_0.tmp_0]}, outputs:{Out[accuracy_0_0_total]}.
I0110 14:58:12.085777 48981 executor.cc:102] Op(sum), inputs:{X[accuracy_0_1_correct, cast_1.tmp_0]}, outputs:{Out[accuracy_0_1_correct]}.
I0110 14:58:12.087095 48981 executor.cc:102] Op(send), inputs:{X[conv2d_0.w_0@GRAD, fc_0.b_0@GRAD, conv2d_0.b_0@GRAD, fc_0.w_0@GRAD, conv2d_1.b_0@GRAD, conv2d_1.w_0@GRAD]}, outputs:{Out[conv2d_0.w_0, fc_0.b_0, conv2d_0.b_0, fc_0.w_0, conv2d_1.b_0, conv2d_1.w_0]}.

Started the pserver with the following command and output:

chenxi@idgsim-gpu-001:~$ export TRAINING_ROLE=PSERVER SERVER_ENDPOINT=127.0.0.1:6188 PSERVERS=127.0.0.1:6188 LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/lib/ CUDA_VISIBLE_DEVICES=1 GLOG_v=3;python notest_recognize_digits_conv_dist.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0110 14:58:02.877094 48863 init.cc:39] Init commandline: notest_recognize_digits_conv_dist.py --tryfromenv=use_pinned_memory,fraction_of_gpu_memory_to_use 
I0110 14:58:03.641646 48863 dynamic_loader.cc:67] Try to find library: libcublas.so from default system path.
I0110 14:58:03.908080 48863 dynamic_loader.cc:67] Try to find library: libcudnn.so from default system path.
I0110 14:58:04.295178 48863 op_desc.cc:348] CompileTime infer shape on gaussian_random
I0110 14:58:04.296227 48863 op_desc.cc:348] CompileTime infer shape on conv2d_cudnn
I0110 14:58:04.296417 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.296577 48863 op_desc.cc:348] CompileTime infer shape on elementwise_add
I0110 14:58:04.296701 48863 op_desc.cc:348] CompileTime infer shape on relu
I0110 14:58:04.296874 48863 op_desc.cc:348] CompileTime infer shape on pool2d
I0110 14:58:04.297075 48863 op_desc.cc:348] CompileTime infer shape on gaussian_random
I0110 14:58:04.297224 48863 op_desc.cc:348] CompileTime infer shape on conv2d_cudnn
I0110 14:58:04.297385 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.297518 48863 op_desc.cc:348] CompileTime infer shape on elementwise_add
I0110 14:58:04.297611 48863 op_desc.cc:348] CompileTime infer shape on relu
I0110 14:58:04.297760 48863 op_desc.cc:348] CompileTime infer shape on pool2d
I0110 14:58:04.298069 48863 op_desc.cc:348] CompileTime infer shape on uniform_random
I0110 14:58:04.298231 48863 op_desc.cc:348] CompileTime infer shape on mul
I0110 14:58:04.298249 48863 mul_op.cc:36] mul operator x.shape=-1, 50, 4, 4 y.shape=800, 10 x_num_col_dims=1 y_num_col_dims=1
I0110 14:58:04.298395 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.298535 48863 op_desc.cc:348] CompileTime infer shape on elementwise_add
I0110 14:58:04.298638 48863 op_desc.cc:348] CompileTime infer shape on softmax
I0110 14:58:04.298756 48863 op_desc.cc:348] CompileTime infer shape on cross_entropy
I0110 14:58:04.298976 48863 op_desc.cc:348] CompileTime infer shape on mean
I0110 14:58:04.299718 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.299795 48863 op_desc.cc:348] CompileTime infer shape on mean_grad
I0110 14:58:04.299855 48863 op_desc.cc:348] CompileTime infer shape on cross_entropy_grad
I0110 14:58:04.299913 48863 op_desc.cc:348] CompileTime infer shape on softmax_grad
I0110 14:58:04.299973 48863 op_desc.cc:348] CompileTime infer shape on elementwise_add_grad
I0110 14:58:04.300046 48863 op_desc.cc:348] CompileTime infer shape on mul_grad
I0110 14:58:04.300114 48863 op_desc.cc:348] CompileTime infer shape on pool2d_grad
I0110 14:58:04.300163 48863 op_desc.cc:348] CompileTime infer shape on relu_grad
I0110 14:58:04.300216 48863 op_desc.cc:348] CompileTime infer shape on elementwise_add_grad
I0110 14:58:04.300285 48863 op_desc.cc:348] CompileTime infer shape on conv2d_cudnn_grad
I0110 14:58:04.300346 48863 op_desc.cc:348] CompileTime infer shape on pool2d_grad
I0110 14:58:04.300395 48863 op_desc.cc:348] CompileTime infer shape on relu_grad
I0110 14:58:04.300447 48863 op_desc.cc:348] CompileTime infer shape on elementwise_add_grad
I0110 14:58:04.300508 48863 op_desc.cc:348] CompileTime infer shape on conv2d_cudnn_grad
I0110 14:58:04.301167 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.301332 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.301482 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.301633 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.301775 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.301915 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.302057 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.302191 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.302325 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.302459 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.302594 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.302731 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.302865 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.302996 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.303320 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.303520 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.303694 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.303853 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.304015 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.304172 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.304330 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.304486 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.304638 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.304792 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.304949 48863 op_desc.cc:348] CompileTime infer shape on fill_constant
I0110 14:58:04.305101 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.305186 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.305258 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.305541 48863 op_desc.cc:348] CompileTime infer shape on top_k
I0110 14:58:04.305678 48863 op_desc.cc:348] CompileTime infer shape on accuracy
I0110 14:58:04.305817 48863 op_desc.cc:348] CompileTime infer shape on cast
I0110 14:58:04.305933 48863 op_desc.cc:348] CompileTime infer shape on cast
I0110 14:58:04.306018 48863 op_desc.cc:348] CompileTime infer shape on sum
I0110 14:58:04.306092 48863 op_desc.cc:348] CompileTime infer shape on sum
127.0.0.1:6188 PSERVER 127.0.0.1:6188

 getting and starting pserver
I0110 14:58:04.337280 48863 op_desc.cc:348] CompileTime infer shape on sum
I0110 14:58:04.337381 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.337545 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.337797 48863 op_desc.cc:348] CompileTime infer shape on sum
I0110 14:58:04.337878 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.338029 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.338279 48863 op_desc.cc:348] CompileTime infer shape on sum
I0110 14:58:04.338359 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.338512 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.338757 48863 op_desc.cc:348] CompileTime infer shape on sum
I0110 14:58:04.338830 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.338977 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.339221 48863 op_desc.cc:348] CompileTime infer shape on sum
I0110 14:58:04.339294 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.339442 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.339684 48863 op_desc.cc:348] CompileTime infer shape on sum
I0110 14:58:04.339759 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.339910 48863 op_desc.cc:348] CompileTime infer shape on adam
I0110 14:58:04.340015 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.340109 48863 op_desc.cc:348] CompileTime infer shape on scale
I0110 14:58:04.341063 48863 scope.cc:43] Create variable learning_rate_1
I0110 14:58:04.341076 48863 executor.cc:81] Create Variable learning_rate_1 global, which pointer is 0x7f8fcd64e7e0
I0110 14:58:04.341083 48863 scope.cc:43] Create variable moment2_5
I0110 14:58:04.341085 48863 executor.cc:81] Create Variable moment2_5 global, which pointer is 0x7f8fcd6576e0
I0110 14:58:04.341089 48863 scope.cc:43] Create variable moment2_4
I0110 14:58:04.341092 48863 executor.cc:81] Create Variable moment2_4 global, which pointer is 0x7f8fcd65a010
I0110 14:58:04.341096 48863 scope.cc:43] Create variable beta2_pow_acc_0
I0110 14:58:04.341099 48863 executor.cc:81] Create Variable beta2_pow_acc_0 global, which pointer is 0x7f8fcd65a350
I0110 14:58:04.341104 48863 scope.cc:43] Create variable moment1_2
I0110 14:58:04.341109 48863 executor.cc:81] Create Variable moment1_2 global, which pointer is 0x7f8fcd65a540
I0110 14:58:04.341112 48863 scope.cc:43] Create variable fc_0.w_0
I0110 14:58:04.341118 48863 executor.cc:81] Create Variable fc_0.w_0 global, which pointer is 0x7f8fcd659fe0
I0110 14:58:04.341122 48863 scope.cc:43] Create variable learning_rate_5
I0110 14:58:04.341127 48863 executor.cc:81] Create Variable learning_rate_5 global, which pointer is 0x7f8fcd65a720
I0110 14:58:04.341131 48863 scope.cc:43] Create variable fc_0.b_0
I0110 14:58:04.341136 48863 executor.cc:81] Create Variable fc_0.b_0 global, which pointer is 0x7f8fcd65a870
I0110 14:58:04.341141 48863 scope.cc:43] Create variable moment1_0
I0110 14:58:04.341145 48863 executor.cc:81] Create Variable moment1_0 global, which pointer is 0x7f8fcd65aa60
I0110 14:58:04.341150 48863 scope.cc:43] Create variable learning_rate_3
I0110 14:58:04.341156 48863 executor.cc:81] Create Variable learning_rate_3 global, which pointer is 0x7f8fcd65ab70
I0110 14:58:04.341161 48863 scope.cc:43] Create variable moment1_1
I0110 14:58:04.341166 48863 executor.cc:81] Create Variable moment1_1 global, which pointer is 0x7f8fcd65aca0
I0110 14:58:04.341171 48863 scope.cc:43] Create variable learning_rate_0
I0110 14:58:04.341176 48863 executor.cc:81] Create Variable learning_rate_0 global, which pointer is 0x7f8fcd65ae00
I0110 14:58:04.341181 48863 scope.cc:43] Create variable moment1_4
I0110 14:58:04.341186 48863 executor.cc:81] Create Variable moment1_4 global, which pointer is 0x7f8fcd65afa0
I0110 14:58:04.341190 48863 scope.cc:43] Create variable beta1_pow_acc_0
I0110 14:58:04.341194 48863 executor.cc:81] Create Variable beta1_pow_acc_0 global, which pointer is 0x7f8fcd65b0b0
I0110 14:58:04.341199 48863 scope.cc:43] Create variable fetch
I0110 14:58:04.341205 48863 executor.cc:81] Create Variable fetch global, which pointer is 0x7f8fcd65b1e0
I0110 14:58:04.341210 48863 scope.cc:43] Create variable learning_rate_2
I0110 14:58:04.341214 48863 executor.cc:81] Create Variable learning_rate_2 global, which pointer is 0x7f8fcd65b2a0
I0110 14:58:04.341220 48863 scope.cc:43] Create variable moment2_0
I0110 14:58:04.341224 48863 executor.cc:81] Create Variable moment2_0 global, which pointer is 0x7f8fcd65b300
I0110 14:58:04.341230 48863 scope.cc:43] Create variable conv2d_0.w_0
I0110 14:58:04.341234 48863 executor.cc:81] Create Variable conv2d_0.w_0 global, which pointer is 0x7f8fcd65b440
I0110 14:58:04.341239 48863 scope.cc:43] Create variable moment2_1
I0110 14:58:04.341244 48863 executor.cc:81] Create Variable moment2_1 global, which pointer is 0x7f8fcd65b590
I0110 14:58:04.341249 48863 scope.cc:43] Create variable learning_rate_4
I0110 14:58:04.341253 48863 executor.cc:81] Create Variable learning_rate_4 global, which pointer is 0x7f8fcd65b780
I0110 14:58:04.341259 48863 scope.cc:43] Create variable feed
I0110 14:58:04.341264 48863 executor.cc:81] Create Variable feed global, which pointer is 0x7f8fcd65b890
I0110 14:58:04.341269 48863 scope.cc:43] Create variable conv2d_0.b_0
I0110 14:58:04.341274 48863 executor.cc:81] Create Variable conv2d_0.b_0 global, which pointer is 0x7f8fcd65b950
I0110 14:58:04.341289 48863 scope.cc:43] Create variable moment2_2
I0110 14:58:04.341294 48863 executor.cc:81] Create Variable moment2_2 global, which pointer is 0x7f8fcd65b9b0
I0110 14:58:04.341297 48863 scope.cc:43] Create variable conv2d_1.w_0
I0110 14:58:04.341300 48863 executor.cc:81] Create Variable conv2d_1.w_0 global, which pointer is 0x7f8fcd65cd30
I0110 14:58:04.341305 48863 scope.cc:43] Create variable conv2d_1.b_0
I0110 14:58:04.341308 48863 executor.cc:81] Create Variable conv2d_1.b_0 global, which pointer is 0x7f8fcd65cd90
I0110 14:58:04.341315 48863 scope.cc:43] Create variable moment1_3
I0110 14:58:04.341318 48863 executor.cc:81] Create Variable moment1_3 global, which pointer is 0x7f8fcd659bb0
I0110 14:58:04.341323 48863 scope.cc:43] Create variable moment1_5
I0110 14:58:04.341328 48863 executor.cc:81] Create Variable moment1_5 global, which pointer is 0x7f8fcd659cc0
I0110 14:58:04.341333 48863 scope.cc:43] Create variable moment2_3
I0110 14:58:04.341337 48863 executor.cc:81] Create Variable moment2_3 global, which pointer is 0x7f8fcd659dd0
I0110 14:58:04.341359 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_5]}.
I0110 14:58:04.341614 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_4]}.
I0110 14:58:04.341680 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_3]}.
I0110 14:58:04.341708 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_2]}.
I0110 14:58:04.341732 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_1]}.
I0110 14:58:04.341756 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[learning_rate_0]}.
I0110 14:58:04.341781 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_5]}.
I0110 14:58:04.341879 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_5]}.
I0110 14:58:04.341976 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_4]}.
I0110 14:58:04.342001 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_4]}.
I0110 14:58:04.342025 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_3]}.
I0110 14:58:04.342068 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_3]}.
I0110 14:58:04.342114 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_2]}.
I0110 14:58:04.342139 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_2]}.
I0110 14:58:04.342162 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_1]}.
I0110 14:58:04.342185 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_1]}.
I0110 14:58:04.342208 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment2_0]}.
I0110 14:58:04.342231 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[moment1_0]}.
I0110 14:58:04.342253 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[beta2_pow_acc_0]}.
I0110 14:58:04.342277 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[beta1_pow_acc_0]}.
I0110 14:58:04.342300 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[fc_0.b_0]}.
I0110 14:58:04.342331 48863 executor.cc:102] Op(uniform_random), inputs:{}, outputs:{Out[fc_0.w_0]}.
I0110 14:58:04.342516 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[conv2d_1.b_0]}.
I0110 14:58:04.342548 48863 executor.cc:102] Op(gaussian_random), inputs:{}, outputs:{Out[conv2d_1.w_0]}.
I0110 14:58:04.343786 48863 executor.cc:102] Op(fill_constant), inputs:{}, outputs:{Out[conv2d_0.b_0]}.
I0110 14:58:04.343816 48863 executor.cc:102] Op(gaussian_random), inputs:{}, outputs:{Out[conv2d_0.w_0]}.

 done running default program, starting pserver program
I0110 14:58:04.344673 48863 executor.cc:81] Create Variable fetch global, which pointer is 0x7f8fcd65b1e0
I0110 14:58:04.344692 48863 executor.cc:81] Create Variable feed global, which pointer is 0x7f8fcd65b890
I0110 14:58:04.344699 48863 executor.cc:81] Create Variable fc_0.w_0 global, which pointer is 0x7f8fcd659fe0
I0110 14:58:04.344705 48863 executor.cc:81] Create Variable fc_0.b_0 global, which pointer is 0x7f8fcd65a870
I0110 14:58:04.344710 48863 executor.cc:81] Create Variable conv2d_1.b_0 global, which pointer is 0x7f8fcd65cd90
I0110 14:58:04.344715 48863 executor.cc:81] Create Variable conv2d_1.w_0 global, which pointer is 0x7f8fcd65cd30
I0110 14:58:04.344722 48863 executor.cc:81] Create Variable conv2d_0.w_0 global, which pointer is 0x7f8fcd65b440
I0110 14:58:04.344728 48863 executor.cc:81] Create Variable conv2d_0.b_0 global, which pointer is 0x7f8fcd65b950
I0110 14:58:04.344841 48863 executor.cc:102] Op(recv), inputs:{RX[conv2d_0.w_0@GRAD, fc_0.b_0@GRAD, conv2d_0.b_0@GRAD, fc_0.w_0@GRAD, conv2d_1.b_0@GRAD, conv2d_1.w_0@GRAD]}, outputs:{}.
I0110 14:58:04.348551 48939 recv_op.cc:44] Server listening on 127.0.0.1:6188
I0110 14:58:12.088160 48863 recv_op.cc:110] recved grad: conv2d_0.w_0@GRAD updating param: conv2d_0.w_0
I0110 14:58:12.088197 48863 scope.cc:43] Create variable conv2d_0.w_0@GRAD
I0110 14:58:12.088244 48863 scope.cc:43] Create variable conv2d_0.w_0@GRAD.trainer_0
I0110 14:58:12.088526 48863 recv_op.cc:110] recved grad: fc_0.b_0@GRAD updating param: fc_0.b_0
I0110 14:58:12.088541 48863 scope.cc:43] Create variable fc_0.b_0@GRAD
I0110 14:58:12.088554 48863 scope.cc:43] Create variable fc_0.b_0@GRAD.trainer_0
I0110 14:58:12.088793 48863 recv_op.cc:110] recved grad: conv2d_0.b_0@GRAD updating param: conv2d_0.b_0
I0110 14:58:12.088809 48863 scope.cc:43] Create variable conv2d_0.b_0@GRAD
I0110 14:58:12.088819 48863 scope.cc:43] Create variable conv2d_0.b_0@GRAD.trainer_0
I0110 14:58:12.089401 48863 recv_op.cc:110] recved grad: fc_0.w_0@GRAD updating param: fc_0.w_0
I0110 14:58:12.089416 48863 scope.cc:43] Create variable fc_0.w_0@GRAD
I0110 14:58:12.089426 48863 scope.cc:43] Create variable fc_0.w_0@GRAD.trainer_0
I0110 14:58:12.089670 48863 recv_op.cc:110] recved grad: conv2d_1.b_0@GRAD updating param: conv2d_1.b_0
I0110 14:58:12.089702 48863 scope.cc:43] Create variable conv2d_1.b_0@GRAD
I0110 14:58:12.089723 48863 scope.cc:43] Create variable conv2d_1.b_0@GRAD.trainer_0
I0110 14:58:12.090672 48863 recv_op.cc:110] recved grad: conv2d_1.w_0@GRAD updating param: conv2d_1.w_0
I0110 14:58:12.090705 48863 scope.cc:43] Create variable conv2d_1.w_0@GRAD
I0110 14:58:12.090728 48863 scope.cc:43] Create variable conv2d_1.w_0@GRAD.trainer_0
@typhoonzero
Contributor

Did you transpile the program to run as 2 trainers? If so, you need to start two trainer processes.
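One way to relate the pserver log above to this question is to count its `recved grad` lines and compare them with what a pserver expecting every trainer's gradients would need. The sketch below is a hypothetical helper (`grads_still_missing` is not part of PaddlePaddle); it assumes the pserver waits for one gradient per parameter from each trainer, and its regex matches the `recv_op.cc` log format shown above.

```python
import re

# Matches recv_op log lines such as:
# "recv_op.cc:110] recved grad: fc_0.b_0@GRAD updating param: fc_0.b_0"
RECV_RE = re.compile(r"recved grad: (\S+) updating param: (\S+)")


def grads_still_missing(log_text, num_params, trainers):
    """Return how many gradient messages the pserver would still wait for.

    Hypothetical helper. Assumes the pserver needs num_params * trainers
    gradient messages before it runs its update sub-program.
    """
    received = RECV_RE.findall(log_text)
    return max(num_params * trainers - len(received), 0)


if __name__ == "__main__":
    # Rebuild the six "recved grad" lines from the pserver log (abridged).
    log = "\n".join(
        f"recv_op.cc:110] recved grad: {p}@GRAD updating param: {p}"
        for p in ["conv2d_0.w_0", "fc_0.b_0", "conv2d_0.b_0",
                  "fc_0.w_0", "conv2d_1.b_0", "conv2d_1.w_0"]
    )
    print(grads_still_missing(log, num_params=6, trainers=2))  # 6
    print(grads_still_missing(log, num_params=6, trainers=1))  # 0
```

Under that assumption, six logged gradients with `trainers=2` leave six messages outstanding, which is consistent with the pserver sitting in `recv` while the single trainer waits in `send`.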

@putcn
Contributor Author

putcn commented Jan 11, 2018

Ah... got it, thanks @typhoonzero.

@putcn putcn closed this as completed Jan 11, 2018
@putcn
Contributor Author

putcn commented Jan 11, 2018

Hmm, but in this case, if I only have one trainer and it is killed and started a second time, shouldn't the training process get stuck at the 2nd pass? Anyway, I will create another issue to track this.

@typhoonzero
Contributor

In the current implementation, the parameter server side (recv_op) waits for all N trainers to send their variables before it can run the sub-program. In your case, with trainers==2, starting one trainer, killing it, and starting it again will probably cause an error. The log above is stuck at the first call to send_op, so not even one complete pass has been reached yet.
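The barrier behaviour described here can be illustrated with a toy model. This sketch is not PaddlePaddle code: `ToyParamServer`, its `recv` method, and the plain SGD update are hypothetical stand-ins, and blocking is modelled by `recv` returning `False` instead of waiting on an RPC. It shows why a single trainer never completes a round when the server expects `fan_in=2`, and why a restarted trainer's second send can complete it, which is consistent with the original report that restarting the trainer lets training proceed.

```python
class ToyParamServer:
    """Toy barrier-style parameter server (hypothetical, for illustration).

    Gradients are buffered until every parameter has received one gradient
    from each of `fan_in` senders; only then is the update applied.
    """

    def __init__(self, params, fan_in, lr=0.1):
        self.params = dict(params)
        self.fan_in = fan_in
        self.lr = lr
        self.pending = {name: [] for name in self.params}
        self.rounds_done = 0

    def recv(self, grads):
        """Buffer one trainer's gradients; return True if an update ran."""
        for name, g in grads.items():
            self.pending[name].append(g)
        if any(len(gs) < self.fan_in for gs in self.pending.values()):
            return False  # barrier not reached: a real sender would block here
        for name, gs in self.pending.items():
            # Plain SGD on the averaged gradient (the demo itself uses Adam).
            self.params[name] -= self.lr * sum(gs) / len(gs)
            gs.clear()
        self.rounds_done += 1
        return True


if __name__ == "__main__":
    ps = ToyParamServer({"w": 1.0, "b": 0.0}, fan_in=2)
    print(ps.recv({"w": 0.5, "b": 0.2}))  # False: the only trainer is stuck
    print(ps.recv({"w": 0.5, "b": 0.2}))  # True: the restarted trainer completes the round
    print(ps.rounds_done)  # 1
```

The toy does not track which trainer sent what, so a restarted trainer is indistinguishable from a second one; that simplification is exactly why a kill-and-restart can appear to unblock the first pass.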
