This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Error occurs when using multiple gpus #481

Closed
njoe9 opened this issue Dec 21, 2017 · 2 comments


njoe9 commented Dec 21, 2017

Hi, all:

I cannot train a translation model with tensor2tensor on multiple GPUs (worker_gpu=4) on a single server.
The TensorFlow and tensor2tensor versions are 1.4 and 1.3.2, respectively.

The error is as follows:

InvalidArgumentError (see above for traceback): Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 1) and num_split 4
[[Node: transformer/split = Split[T=DT_INT32, num_split=4, _device="/job:localhost/replica:0/task:0/device:CPU:0"](transformer/split/split_dim, input_fn/ExpandDims_1)]]
[[Node: transformer/body/model/parallel_1/body/decoder/layer_4/self_attention/multihead_attention/output_transform/Tensordot/Gather/_5817 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_18318_transformer/body/model/parallel_1/body/decoder/layer_4/self_attention/multihead_attention/output_transform/Tensordot/Gather", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]
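For context, this is TensorFlow's generic tf.split check failing: the size of the split dimension (1 here) must be evenly divisible by num_split (4 here, one shard per GPU). A minimal TensorFlow 1.x sketch, not taken from this setup, that triggers the same error:

import tensorflow as tf

# The leading (batch) dimension is only known at run time, so the check
# happens inside the Split op rather than at graph-construction time.
x = tf.placeholder(tf.int32, shape=[None, 3])
shards = tf.split(x, num_or_size_splits=4, axis=0)  # one shard per GPU

with tf.Session() as sess:
    try:
        # A batch of size 1 cannot be split 4 ways along axis 0.
        sess.run(shards, feed_dict={x: [[1, 2, 3]]})
    except tf.errors.InvalidArgumentError as e:
        print(e)  # "Number of ways to split should evenly divide the split dimension ..."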

The training script is the following:
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='hidden_size=1024,batch_size=4096,num_heads=16,attention_key_channels=64,attention_value_channels=64' \
  --train_steps=500000 \
  --worker_gpu_memory_fraction=0.98 \
  --worker_gpu=4 \
  --output_dir=$TRAIN_DIR

What could be the problem here?

Thanks.

@mehmedes

This seems to be the same issue as #266.
Try setting --schedule=train to disable evaluation, or apply the workaround mentioned in #266.
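For example, with the same variables as in your original command, the invocation with evaluation disabled would look like this (a sketch, assuming the rest of the setup is unchanged):

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='hidden_size=1024,batch_size=4096,num_heads=16,attention_key_channels=64,attention_value_channels=64' \
  --train_steps=500000 \
  --worker_gpu_memory_fraction=0.98 \
  --worker_gpu=4 \
  --schedule=train \
  --output_dir=$TRAIN_DIR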

@rsepassi
Contributor

Thank you @mehmedes. Closing in favor of #266.
