I'm trying to reproduce the results of "Attention is All You Need" on the WMT'14 EN-DE dataset.
I have followed the discussion in #637, but my OpenNMT version is 2.0.1.
This link tells me I need to set the sequence length to 100. However, training throws an exception: "Sequence is 12131 but PositionalEncoding is limited to 5000. See max_len argument."
I have checked the code, and it seems the sequence length is still too long.
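As far as I understand, the sinusoidal position table is precomputed once up to a fixed max_len (5000 by default), and the exception is raised as soon as a batch is longer than that table. A rough sketch of that behaviour (illustrative only, not the exact OpenNMT-py code; SinusoidalPositionalEncoding is just a name I made up here):

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Precomputes a sin/cos position table up to max_len positions."""

    def __init__(self, dim, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, dim)
        position = torch.arange(0, max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, dim, 2).float()
                             * -(math.log(10000.0) / dim))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(1))  # (max_len, 1, dim)

    def forward(self, emb):
        # emb: (seq_len, batch, dim) -- token embeddings for one batch
        seq_len = emb.size(0)
        if seq_len > self.pe.size(0):
            # The situation I hit: a 12131-token example vs. max_len=5000.
            raise ValueError(
                f"Sequence is {seq_len} but PositionalEncoding is limited to "
                f"{self.pe.size(0)}. See max_len argument."
            )
        return emb + self.pe[:seq_len]
```

So if a 12131-token example reaches the model, the length limit I set apparently never removed it.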
My configuration is as follows:
# Where the samples will be written
save_data: wmt_ende_sp/transformer
# Where the vocab(s) will be written
src_vocab: wmt_ende_sp/transformer.vocab.src
tgt_vocab: wmt_ende_sp/transformer.vocab.tgt
# Prevent overwriting existing files in the folder
overwrite: True

src_seq_length: 100
tgt_seq_length: 100
share_vocab: True

# Corpus opts:
data:
    corpus:
        path_src: wmt_ende_sp/train.en
        path_tgt: wmt_ende_sp/train.de
    valid:
        path_src: wmt_ende_sp/valid.en
        path_tgt: wmt_ende_sp/valid.de

# Vocabulary files that were just created
src_vocab: wmt_ende_sp/transformer.vocab
tgt_vocab: wmt_ende_sp/transformer.vocab
src_vocab_size: 32000
tgt_vocab_size: 32000

# Training
save_model: wmt_ende_sp/tf.model
save_checkpoint_steps: 10000
valid_steps: 10000
train_steps: 200000

# Batching
batch_type: "tokens"
batch_size: 4096
max_generator_batches: 2
accum_count: [4]
accum_steps: [0]

# Optimization
optim: "adam"
learning_rate: 2
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: True
normalization: "tokens"

# Model
encoder_type: transformer
decoder_type: transformer
position_encoding: True
layers: 6
heads: 8
rnn_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout: [0.1]
share_embeddings: True

# Train on two GPUs
world_size: 2
gpu_ranks: [0, 1]
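As a side note, to see how long the longest training examples actually are, a quick count of whitespace-separated tokens per line (paths taken from the config above) is enough:

```python
# Rough sanity check: length (in whitespace-separated pieces) of the longest
# line in each training file referenced by the config above.
for path in ["wmt_ende_sp/train.en", "wmt_ende_sp/train.de"]:
    longest = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            longest = max(longest, len(line.split()))
    print(path, "longest line:", longest, "tokens")
```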
I also tried setting the parameter -src_seq_length_trunc to 100, and training now completes successfully.
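For what it's worth, my mental model of the two options is roughly the following (a plain-Python sketch of what I expect them to do, not the actual OpenNMT-py code): filtering should drop a too-long example entirely, while truncation keeps it but cuts it down to the limit.

```python
def filter_too_long(src_tokens, tgt_tokens, src_seq_length=100, tgt_seq_length=100):
    """What I expect src_seq_length / tgt_seq_length to do: skip the example."""
    if len(src_tokens) > src_seq_length or len(tgt_tokens) > tgt_seq_length:
        return None  # example is dropped from training
    return src_tokens, tgt_tokens


def truncate_src(src_tokens, src_seq_length_trunc=100):
    """What src_seq_length_trunc appears to do: keep the example, shortened."""
    return src_tokens[:src_seq_length_trunc]
```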
Does that mean the -src_seq_length parameter is not working?