Theano-MPI started 3 workers for
 1.updating Cifar10_model params through iterations and
 2.exchange the params with BSP(cdd,nccl32)
See output log.
Using cuDNN version 5103 on context None
Mapped name None to device cuda1: Tesla K80 (0000:06:00.0)
INFO (theano.gof.compilelock): Waiting for existing lock by process '164550' (I am process '164552')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/mahe6562/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-centos-6.8-Final-x86_64-2.7.10-64/lock_dir
Using cuDNN version 5103 on context None
Mapped name None to device cuda0: Tesla K80 (0000:05:00.0)
Using cuDNN version 5103 on context None
Mapped name None to device cuda2: Tesla K80 (0000:09:00.0)
rank0: bad list is [], extended to 156
rank0: bad list is [], extended to 39
Cifar10_model
Layer Subtract in (3, 32, 32, 256) --> out (3, 32, 32, 256)
Layer Crop in [ 3 32 32 256] --> out (3, 28, 28, 256)
Layer Dimshuffle in [ 3 28 28 256] --> out (256, 3, 28, 28)
Layer Conv (cudnn) in [256 3 28 28] --> out (256, 64, 24, 24)
Layer Pool in [256 64 24 24] --> out (256, 64, 12, 12)
Layer Conv (cudnn) in [256 64 12 12] --> out (256, 128, 8, 8)
Layer Pool in [256 128 8 8] --> out (256, 128, 4, 4)
Layer Conv (cudnn) in [256 128 4 4] --> out (256, 64, 2, 2)
Layer Flatten in [256 64 2 2] --> out (256, 256)
Layer FC in [256 256] --> out (256, 256)
Layer Dropout0.5 in [256 256] --> out (256, 256)
Layer Softmax in [256 256] --> out (256, 10)
[64 3 5 5]
[64]
[128 64 5 5]
[128]
[ 64 128 3 3]
[64]
[256 256]
[256]
[256 10]
[10]
model size 0.336 M floats
compiling training function...
compiling validation function...
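The layer shapes and the reported model size in the summary above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of Theano-MPI: it assumes stride-1 "valid" convolutions and non-overlapping 2x2 pooling (both inferred from the printed shapes), and it assumes the "M" in "model size 0.336 M floats" means 1024 * 1024 (inferred from the numbers, not stated in the log). The helper names `conv_out` and `pool_out` are hypothetical, and the sketch targets Python 3.8+ even though the logged run used Python 2.7.

```python
from math import prod

# Parameter shapes copied from the log (weights followed by biases).
shapes = [
    (64, 3, 5, 5), (64,),
    (128, 64, 5, 5), (128,),
    (64, 128, 3, 3), (64,),
    (256, 256), (256,),
    (256, 10), (10,),
]

n_params = sum(prod(s) for s in shapes)
size_binary_m = n_params / 2**20   # assumption: "M" = 1024 * 1024
size_decimal_m = n_params / 1e6    # shown only for comparison


def conv_out(size, kernel):
    """Spatial size after a stride-1 'valid' convolution (assumed)."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Spatial size after non-overlapping pooling (assumed 2x2)."""
    return size // window


# Follow the spatial size through the network: 28 -> 24 -> 12 -> 8 -> 4 -> 2.
s = 28                 # after cropping the 32x32 input
s = conv_out(s, 5)     # Conv, 5x5 kernels
s = pool_out(s)        # Pool
s = conv_out(s, 5)     # Conv, 5x5 kernels
s = pool_out(s)        # Pool
s = conv_out(s, 3)     # Conv, 3x3 kernels
flat_features = 64 * s * s   # input width of the FC layer

print(n_params, round(size_binary_m, 3), round(size_decimal_m, 3))
print(s, flat_features)
```

Under these assumptions the parameter count matches the logged 0.336 only with the binary interpretation of "M", and the flattened feature width agrees with the `[256 256]` FC weight shape.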
Compile time: 62.361 s
calculating lr warming up power base: 1.246
learning rate 0.010000 will be used for epoch 0
120 2.009918 0.755859
time per 40 batches: 2.35 (train 1.17 sync 0.02 comm 1.03 wait 0.12)
validation cost:1.6657 validation error:0.6108 validation top_5_error:0.1214
weights saved at epoch 0
global epoch 0 took 0.0011 h

warming up lr from 0.010000 to 0.012457
learning rate 0.012457 will be used for epoch 1
120 1.655648 0.610254
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.93 wait 0.12)
validation cost:1.5450 validation error:0.5634 validation top_5_error:0.0838
global epoch 1 took 0.0006 h

warming up lr from 0.012457 to 0.015518
learning rate 0.015518 will be used for epoch 2
120 1.542073 0.560645
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.4305 validation error:0.5209 validation top_5_error:0.0740
global epoch 2 took 0.0006 h

warming up lr from 0.015518 to 0.019332
learning rate 0.019332 will be used for epoch 3
120 1.540278 0.554492
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.3633 validation error:0.4873 validation top_5_error:0.0680
global epoch 3 took 0.0006 h

warming up lr from 0.019332 to 0.024082
learning rate 0.024082 will be used for epoch 4
120 1.444963 0.513574
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.3301 validation error:0.4713 validation top_5_error:0.0636
global epoch 4 took 0.0006 h

warming up lr from 0.024082 to 0.030000
learning rate 0.030000 will be used for epoch 5
120 1.498224 0.523730
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.93 wait 0.12)
validation cost:1.3128 validation error:0.4594 validation top_5_error:0.0630
weights saved at epoch 5
global epoch 5 took 0.0006 h

learning rate 0.030000 will be used for epoch 6
120 1.400473 0.483008
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.93 wait 0.12)
validation cost:1.2170 validation error:0.4295 validation top_5_error:0.0547
global epoch 6 took 0.0006 h

learning rate 0.030000 will be used for epoch 7
120 1.313909 0.456348
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.2440 validation error:0.4307 validation top_5_error:0.0584
global epoch 7 took 0.0006 h

learning rate 0.030000 will be used for epoch 8
120 1.284307 0.439746
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.1455 validation error:0.3893 validation top_5_error:0.0514
global epoch 8 took 0.0006 h

learning rate 0.030000 will be used for epoch 9
120 1.219355 0.413281
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.1351 validation error:0.3883 validation top_5_error:0.0477
global epoch 9 took 0.0006 h

learning rate 0.030000 will be used for epoch 10
120 1.169670 0.399414
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.1562 validation error:0.4052 validation top_5_error:0.0453
weights saved at epoch 10
global epoch 10 took 0.0006 h

learning rate 0.030000 will be used for epoch 11
120 1.157041 0.392773
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.93 wait 0.12)
validation cost:1.0414 validation error:0.3583 validation top_5_error:0.0411
global epoch 11 took 0.0006 h

learning rate 0.030000 will be used for epoch 12
120 1.135222 0.385156
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.0783 validation error:0.3669 validation top_5_error:0.0419
global epoch 12 took 0.0006 h

learning rate 0.030000 will be used for epoch 13
120 1.090922 0.372363
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.93 wait 0.12)
validation cost:1.0607 validation error:0.3585 validation top_5_error:0.0421
global epoch 13 took 0.0006 h

learning rate 0.030000 will be used for epoch 14
120 1.094984 0.370020
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:1.0466 validation error:0.3618 validation top_5_error:0.0415
global epoch 14 took 0.0006 h

learning rate 0.030000 will be used for epoch 15
120 1.052760 0.360742
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.9925 validation error:0.3403 validation top_5_error:0.0364
weights saved at epoch 15
global epoch 15 took 0.0006 h

learning rate 0.030000 will be used for epoch 16
120 1.039231 0.349219
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.93 wait 0.12)
validation cost:0.9789 validation error:0.3329 validation top_5_error:0.0360
global epoch 16 took 0.0006 h

learning rate 0.030000 will be used for epoch 17
120 1.013065 0.335938
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.9555 validation error:0.3273 validation top_5_error:0.0371
global epoch 17 took 0.0006 h

learning rate 0.030000 will be used for epoch 18
120 1.009837 0.345117
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.9416 validation error:0.3167 validation top_5_error:0.0368
global epoch 18 took 0.0006 h

learning rate 0.030000 will be used for epoch 19
120 1.003021 0.331348
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.9421 validation error:0.3165 validation top_5_error:0.0335
global epoch 19 took 0.0006 h

learning rate 0.030000 will be used for epoch 20
120 0.949186 0.320312
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8952 validation error:0.3026 validation top_5_error:0.0336
weights saved at epoch 20
global epoch 20 took 0.0006 h

learning rate 0.030000 will be used for epoch 21
120 0.958846 0.321680
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.93 wait 0.12)
validation cost:0.9592 validation error:0.3203 validation top_5_error:0.0364
global epoch 21 took 0.0006 h

learning rate 0.030000 will be used for epoch 22
120 0.975641 0.327637
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.9420 validation error:0.3176 validation top_5_error:0.0366
global epoch 22 took 0.0006 h

learning rate 0.030000 will be used for epoch 23
120 0.925220 0.308984
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.9404 validation error:0.3137 validation top_5_error:0.0351
global epoch 23 took 0.0006 h

learning rate 0.030000 will be used for epoch 24
120 0.935118 0.313672
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8609 validation error:0.2832 validation top_5_error:0.0320
global epoch 24 took 0.0006 h

learning rate 0.030000 will be used for epoch 25
120 0.874846 0.293359
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8738 validation error:0.2908 validation top_5_error:0.0335
weights saved at epoch 25
global epoch 25 took 0.0006 h

learning rate 0.030000 will be used for epoch 26
120 0.909652 0.301074
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8264 validation error:0.2759 validation top_5_error:0.0253
global epoch 26 took 0.0006 h

learning rate 0.030000 will be used for epoch 27
120 0.890976 0.298438
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8541 validation error:0.2847 validation top_5_error:0.0276
global epoch 27 took 0.0006 h

learning rate 0.030000 will be used for epoch 28
120 0.902822 0.301855
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8562 validation error:0.2796 validation top_5_error:0.0321
global epoch 28 took 0.0006 h

learning rate 0.030000 will be used for epoch 29
120 0.868439 0.288281
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8038 validation error:0.2721 validation top_5_error:0.0236
global epoch 29 took 0.0006 h

learning rate 0.030000 will be used for epoch 30
120 0.828405 0.275781
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.7973 validation error:0.2660 validation top_5_error:0.0260
weights saved at epoch 30
global epoch 30 took 0.0006 h

learning rate 0.030000 will be used for epoch 31
120 0.836192 0.279102
time per 40 batches: 1.41 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8056 validation error:0.2677 validation top_5_error:0.0265
global epoch 31 took 0.0006 h

learning rate 0.030000 will be used for epoch 32
120 0.814471 0.275879
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.8093 validation error:0.2726 validation top_5_error:0.0256
global epoch 32 took 0.0006 h

learning rate 0.030000 will be used for epoch 33
120 0.814167 0.270703
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7659 validation error:0.2537 validation top_5_error:0.0234
global epoch 33 took 0.0006 h

learning rate 0.030000 will be used for epoch 34
120 0.816199 0.268457
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7863 validation error:0.2616 validation top_5_error:0.0258
global epoch 34 took 0.0006 h

learning rate 0.030000 will be used for epoch 35
120 0.836342 0.278809
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.8057 validation error:0.2638 validation top_5_error:0.0252
weights saved at epoch 35
global epoch 35 took 0.0006 h

learning rate 0.030000 will be used for epoch 36
120 0.814999 0.274316
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.7592 validation error:0.2496 validation top_5_error:0.0244
global epoch 36 took 0.0006 h

learning rate 0.030000 will be used for epoch 37
120 0.800424 0.262012
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7627 validation error:0.2532 validation top_5_error:0.0253
global epoch 37 took 0.0006 h

learning rate 0.030000 will be used for epoch 38
120 0.808048 0.263086
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.8022 validation error:0.2727 validation top_5_error:0.0270
global epoch 38 took 0.0006 h

learning rate 0.030000 will be used for epoch 39
120 0.800019 0.262695
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7372 validation error:0.2451 validation top_5_error:0.0230
global epoch 39 took 0.0006 h

learning rate 0.030000 will be used for epoch 40
120 0.788953 0.259863
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7314 validation error:0.2372 validation top_5_error:0.0237
weights saved at epoch 40
global epoch 40 took 0.0007 h

learning rate 0.030000 will be used for epoch 41
120 0.790210 0.261621
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.7755 validation error:0.2491 validation top_5_error:0.0271
global epoch 41 took 0.0006 h

learning rate 0.030000 will be used for epoch 42
120 0.775299 0.256152
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7031 validation error:0.2253 validation top_5_error:0.0218
global epoch 42 took 0.0006 h

learning rate 0.030000 will be used for epoch 43
120 0.775341 0.254395
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7114 validation error:0.2379 validation top_5_error:0.0242
global epoch 43 took 0.0006 h

learning rate 0.030000 will be used for epoch 44
120 0.803203 0.258789
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7398 validation error:0.2435 validation top_5_error:0.0233
global epoch 44 took 0.0006 h

learning rate 0.030000 will be used for epoch 45
120 0.766534 0.250000
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7313 validation error:0.2346 validation top_5_error:0.0215
weights saved at epoch 45
global epoch 45 took 0.0006 h

learning rate 0.030000 will be used for epoch 46
120 0.775018 0.251953
time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)
validation cost:0.6963 validation error:0.2260 validation top_5_error:0.0219
global epoch 46 took 0.0006 h

learning rate 0.030000 will be used for epoch 47
120 0.729339 0.245215
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7269 validation error:0.2363 validation top_5_error:0.0258
global epoch 47 took 0.0006 h

learning rate 0.030000 will be used for epoch 48
120 0.720584 0.236328
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.6908 validation error:0.2200 validation top_5_error:0.0209
global epoch 48 took 0.0006 h

learning rate 0.030000 will be used for epoch 49
120 0.729975 0.240039
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7690 validation error:0.2496 validation top_5_error:0.0231
global epoch 49 took 0.0006 h

learning rate 0.030000 will be used for epoch 50
120 0.739729 0.240625
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.7197 validation error:0.2391 validation top_5_error:0.0244
weights saved at epoch 50
global epoch 50 took 0.0006 h

learning rate 0.003000 will be used for epoch 51
120 0.624056 0.206641
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.6035 validation error:0.1981 validation top_5_error:0.0172
global epoch 51 took 0.0006 h

learning rate 0.003000 will be used for epoch 52
120 0.563906 0.186035
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5849 validation error:0.1936 validation top_5_error:0.0164
global epoch 52 took 0.0006 h

learning rate 0.003000 will be used for epoch 53
120 0.570724 0.194336
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5823 validation error:0.1916 validation top_5_error:0.0160
global epoch 53 took 0.0006 h

learning rate 0.003000 will be used for epoch 54
120 0.569369 0.191504
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5668 validation error:0.1892 validation top_5_error:0.0153
global epoch 54 took 0.0006 h

learning rate 0.003000 will be used for epoch 55
120 0.552977 0.187695
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5634 validation error:0.1851 validation top_5_error:0.0149
weights saved at epoch 55
global epoch 55 took 0.0006 h

learning rate 0.003000 will be used for epoch 56
120 0.554474 0.186816
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5620 validation error:0.1872 validation top_5_error:0.0134
global epoch 56 took 0.0006 h

learning rate 0.003000 will be used for epoch 57
120 0.529871 0.182910
time per 40 batches: 1.44 (train 0.34 sync 0.03 comm 0.95 wait 0.12)
validation cost:0.5615 validation error:0.1853 validation top_5_error:0.0153
global epoch 57 took 0.0006 h

learning rate 0.003000 will be used for epoch 58
120 0.516433 0.175488
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5533 validation error:0.1819 validation top_5_error:0.0148
global epoch 58 took 0.0006 h

learning rate 0.003000 will be used for epoch 59
120 0.514316 0.174512
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.96 wait 0.12)
validation cost:0.5509 validation error:0.1827 validation top_5_error:0.0144
global epoch 59 took 0.0006 h

learning rate 0.003000 will be used for epoch 60
120 0.543264 0.185156
time per 40 batches: 1.44 (train 0.34 sync 0.03 comm 0.95 wait 0.12)
validation cost:0.5457 validation error:0.1770 validation top_5_error:0.0138
weights saved at epoch 60
global epoch 60 took 0.0006 h

learning rate 0.000300 will be used for epoch 61
120 0.501043 0.171484
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5402 validation error:0.1786 validation top_5_error:0.0130
global epoch 61 took 0.0006 h

learning rate 0.000300 will be used for epoch 62
120 0.514684 0.174316
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.96 wait 0.12)
validation cost:0.5402 validation error:0.1784 validation top_5_error:0.0134
global epoch 62 took 0.0006 h

learning rate 0.000300 will be used for epoch 63
120 0.513651 0.174219
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5396 validation error:0.1798 validation top_5_error:0.0129
global epoch 63 took 0.0006 h

learning rate 0.000300 will be used for epoch 64
120 0.509927 0.174219
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5393 validation error:0.1782 validation top_5_error:0.0131
global epoch 64 took 0.0006 h

learning rate 0.000300 will be used for epoch 65
120 0.522993 0.177539
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.96 wait 0.12)
validation cost:0.5385 validation error:0.1776 validation top_5_error:0.0132
weights saved at epoch 65
global epoch 65 took 0.0006 h

learning rate 0.000030 will be used for epoch 66
120 0.513386 0.174902
time per 40 batches: 1.44 (train 0.34 sync 0.03 comm 0.95 wait 0.12)
validation cost:0.5383 validation error:0.1780 validation top_5_error:0.0135
global epoch 66 took 0.0006 h

learning rate 0.000030 will be used for epoch 67
120 0.517145 0.176563
time per 40 batches: 1.44 (train 0.34 sync 0.03 comm 0.96 wait 0.12)
validation cost:0.5379 validation error:0.1779 validation top_5_error:0.0134
global epoch 67 took 0.0006 h

learning rate 0.000030 will be used for epoch 68
120 0.514622 0.174023
time per 40 batches: 1.44 (train 0.34 sync 0.02 comm 0.96 wait 0.12)
validation cost:0.5378 validation error:0.1778 validation top_5_error:0.0134
global epoch 68 took 0.0006 h

learning rate 0.000030 will be used for epoch 69
120 0.498994 0.172168
time per 40 batches: 1.43 (train 0.34 sync 0.02 comm 0.95 wait 0.12)
validation cost:0.5377 validation error:0.1782 validation top_5_error:0.0133
global epoch 69 took 0.0006 h

Rule session 164548 terminated with return code: 0.
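
Two patterns in the log above can be reproduced with a few lines of arithmetic. This is a minimal sketch under stated assumptions, not the Theano-MPI implementation. First, the warm-up values (0.010000 to 0.030000 over epochs 0 to 5, "power base: 1.246") are consistent with a geometric schedule `lr_k = lr0 * base**k` with `base = 3 ** (1/5)`; that form is inferred from the logged numbers. Second, the "time per 40 batches" line appears to be the sum of its four components, which makes it easy to estimate the share spent in communication. The variable names and the regular expression are illustrative only.

```python
import re

# Warm-up: assumed geometric schedule, inferred from the logged values.
lr0, lr_target, warmup_epochs = 0.01, 0.03, 5
base = (lr_target / lr0) ** (1.0 / warmup_epochs)
schedule = [round(lr0 * base**k, 6) for k in range(warmup_epochs + 1)]

# Timing: one line copied verbatim from the log.
sample = "time per 40 batches: 1.42 (train 0.34 sync 0.02 comm 0.94 wait 0.12)"
pattern = re.compile(
    r"time per 40 batches: ([\d.]+) "
    r"\(train ([\d.]+) sync ([\d.]+) comm ([\d.]+) wait ([\d.]+)\)"
)
total, train, sync, comm, wait = map(float, pattern.search(sample).groups())
comm_share = comm / total   # fraction of wall time spent exchanging params

print(round(base, 3), schedule)
print(round(train + sync + comm + wait, 2), round(comm_share, 2))
```

On this run the communication component accounts for roughly two thirds of each 40-batch interval, while the training component is about a quarter.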