The low score problem in transformer wmt32k #455
Comments
I am testing the Transformer with other datasets too. On the LDC Chinese-to-English dataset, an RNN with attention got 34.5 BLEU (on top of open-source code), while the Transformer (tensor2tensor) got 41.57, which improves the LDC CH-EN results.
It seems the answer is yes, see #444.
worker_gpu_memory_fraction is 0.95 by default.
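For reference, a minimal sketch of how that fraction can be lowered on the command line, assuming the t2t-trainer flag names from the v1.2.x era; the directory variables and the exact problem name are placeholders that may differ in your setup:

```bash
# Sketch only: lowering the per-GPU memory fraction from its 0.95 default.
# $DATA_DIR / $TRAIN_DIR and the problem name are placeholders (assumptions).
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --problems=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --worker_gpu_memory_fraction=0.90
```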
@martinpopel Thanks for your quick reply. More devices are possible, so I will try it and report the new result.
@martinpopel I tried it; the BPE result on newstest2014 got a 26.07 score (3 GPUs and batch_size=3072). The training has not ended yet. So, have you extracted the Transformer as a separate module? If so, is it possible to send the separate module to me?
@liesun1994: I don't understand what you mean by "extract transformer as a separate module". For training, I am using T2T without any changes.
As I think about it now, I must correct my previous claim.
Sorry for my poor English 😆. T2T contains a lot of tasks, and the Transformer may be the module we want.
It is just that the code is difficult to understand and modify.
Hi, @martinpopel
@njoe9: I save checkpoints each hour and average 8 or 16 (or even 32) checkpoints. In the early training phases it is better to average fewer checkpoints (probably because the older checkpoints produce notably worse BLEU than the most recent checkpoint). BTW: I think the discussion is diverging. The original question has been answered (the number of GPUs and batch size are important when comparing results after a given number of steps), so @liesun1994 can close this issue, to keep the list of open issues tidy.
@njoe9 I think this script can solve your problem: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/avg_checkpoints.py (see also #317).
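A minimal usage sketch of that averaging script, following the workflow described above (hourly checkpoints, averaging the last N); the flag names are assumed from avg_checkpoints.py at the time and the paths are placeholders:

```bash
# Sketch only: average the last 16 checkpoints of a run into one checkpoint.
# --prefix, --num_last_checkpoints and --output_path are assumed flag names;
# $TRAIN_DIR is a placeholder for the training output directory.
python tensor2tensor/utils/avg_checkpoints.py \
  --prefix=$TRAIN_DIR/model.ckpt \
  --num_last_checkpoints=16 \
  --output_path=$TRAIN_DIR/averaged.ckpt
```

The averaged checkpoint can then be passed to the decoder in place of the latest one.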
Thanks. @martinpopel @liesun1994
@liesun1994, as you mentioned, on the LDC Chinese-to-English dataset the Transformer model got 41.57 (tensor2tensor); wow, I only got 21.43... Could you please share the configuration you used when training on the LDC dataset? Thanks.
@liesun1994 same here, I'm quite interested in learning what you did with the LDC Chinese-to-English dataset. Where can I download a copy as well?
Hello,
I am using the early version 1.2.9 of tensor2tensor and trying to reproduce the WMT results. When using two GPUs and the following configuration, newstest2014 only got 23.50 BLEU. The configuration is as follows:
I am using transformer_base_single_gpu with train_steps=400000, and the model is trained on the tokenized data set. All other parameters are set to their defaults. Does the number of GPUs have a great influence on the WMT results, or is anything wrong with my configuration? Thanks.
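For concreteness, a sketch of what that configuration might look like as a t2t-trainer invocation, assuming the flag names available in v1.2.x; the directory variables and the exact WMT32k problem name are assumptions and may differ in your installation:

```bash
# Sketch only: 2-GPU run with the single-GPU hparams set and 400k training steps.
# $DATA_DIR / $TRAIN_DIR and the problem name are placeholders (assumptions).
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --problems=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --train_steps=400000 \
  --worker_gpu=2
```

Note that transformer_base_single_gpu uses a smaller per-GPU batch size than transformer_base, which is one reason results after a fixed number of steps lag behind the paper's multi-GPU numbers.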