Cannot train properly #148
How is your decoding set up? What's the
I decode using the parameters in the guide (BEAM_SIZE=4). What worries me is that the trained model only outputs random sequences instead of translations. Below are examples produced when training finished. INFO:tensorflow:Inference results INPUT: Protesters on Black Friday demanded a salary increase and complained that the cost of medical insurance provided by the corporation went from 30 to 100 dollars a month.
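For context, the decoding step from the walkthrough looks roughly like the sketch below. All variables are placeholders for your own values, and the exact flag names vary by t2t version (older releases exposed beam size via `--decode_beam_size`/`--decode_alpha` instead of `--decode_hparams`), so treat this as an illustration rather than the exact command used in this thread:

```shell
# Sketch of a t2t-decoder call with beam search, following the walkthrough.
# DATA_DIR, TRAIN_DIR, PROBLEM, and the input file are placeholders.
DATA_DIR=$HOME/t2t_data
TRAIN_DIR=$HOME/t2t_train
BEAM_SIZE=4
ALPHA=0.6

t2t-decoder \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=transformer \
  --hparams_set=transformer_base \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=decode_input.txt
```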
What were your eval scores (
A little higher than totally random, I think.
Then it's no wonder the inference doesn't work. You first need to train your model and check the evals.
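Checking the evals before decoding can be mechanized. A minimal sketch that pulls the last reported value of an eval metric out of trainer log lines — the log format and metric name below are illustrative (modeled on TF Estimator eval output), not copied from this thread:

```python
import re

def extract_metric(log_lines, metric="approx_bleu_score"):
    """Return the last reported value of an eval metric from trainer log lines."""
    pattern = re.compile(r"%s = ([0-9.eE+-]+)" % re.escape(metric))
    value = None
    for line in log_lines:
        m = pattern.search(line)
        if m:
            value = float(m.group(1))
    return value

# Illustrative log line; the metric prefix is an assumption for this sketch.
logs = [
    "INFO:tensorflow:Saving dict for global step 250000: "
    "loss = 1.95, metrics/approx_bleu_score = 0.031",
]
score = extract_metric(logs)
print(score)  # -> 0.031
# A score this close to zero means the model has not learned to translate yet,
# so decoded output will look random.
```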
Yes, I just wonder why exactly following the guide leads to such a result. It seems that the model trained for 250k iterations.
@lukaszkaiser I also found this problem. I had already trained a good model with a BLEU score of 26.x. But when I upgraded the t2t version and trained a transformer_big model, I ran into two problems: (1) the eval score is 0, the same as issue #121; (2) the same problem as described by @aviczhl2 in this issue: test decoding means nothing, and is sometimes just empty for some sentences. I don't think these problems are occasional, because different people have run into both of them. So, any ideas?
It might be related to the unicode issues we were trying to correct with python3. I think you need to remove your vocab files and data and re-generate them. It's worth trying with 1.0.14; I'm re-running now and the output looks non-random, but it's too early to be sure about the result.
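Before re-generating, it's easy to check whether an existing vocab file is affected by this kind of unicode corruption. The helper below is hypothetical, not part of t2t; it simply flags lines that are not valid UTF-8:

```python
def find_bad_lines(vocab_path):
    """Return 1-based line numbers in a vocab file that fail to decode as UTF-8."""
    bad = []
    with open(vocab_path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(lineno)
    return bad

# If this returns any line numbers, delete the vocab and data files and rerun
# t2t-datagen so they are rebuilt from scratch.
```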
Re-generation makes sense. Thanks very much.
I created a new summarization problem with a new dataset of my own. I followed the new-problem training walkthrough, but whenever I start training, all I get in the train directory is:
After that, no model or checkpoint is saved. In other words, my problem is not training on the data generated using t2t-datagen. Can anyone please guide me here? @lukaszkaiser @aviczhl2
Hi @mainakchain -- Just to make sure I understand you correctly: you are not saying that the tutorial doesn't train -- you implemented your own problem class and that fails to train? Also, what do the logs of t2t-trainer say -- maybe the training is just slow?
Hi @afrozenator, I have been at this for some days and am not able to train my summarization model. At first I created a new problem and tried registering and training with it. More recently, I shaped my data into the summarize_cnn_dailymail32k problem format and tried to train with the predefined cnn_dailymail problem. But it never outputs more than the five files above. (When I train other problems, I get files like checkpoint and model.ckpt.* some time after training starts with t2t-trainer, as expected.) To my surprise, training on my data (even with the predefined dailymail problem) consumes all the cores of my system and takes up most of my GPU RAM, without producing any info either on screen or as output files. To dig deeper, I pointed TensorBoard at the output_dir; all it showed was a graph of the transformer architecture, and the projector, scalars, and other curves were not populated at all. Any kind of help is highly appreciated. @lukaszkaiser @aviczhl2
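One way to separate "training is just slow" from "no checkpoints are ever written" is to poll the train directory for checkpoint files. This is a hypothetical helper, not part of t2t; the `model.ckpt-*` pattern matches TensorFlow's default checkpoint naming:

```python
import glob
import os
import time

def has_checkpoint(train_dir):
    """True once TensorFlow has written any model.ckpt-* file to train_dir."""
    return bool(glob.glob(os.path.join(train_dir, "model.ckpt-*")))

def wait_for_checkpoint(train_dir, timeout_s=600, poll_s=30):
    """Poll train_dir; return True if a checkpoint appears before the timeout."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if has_checkpoint(train_dir):
            return True
        time.sleep(poll_s)
    return False
```

If this never returns True even after a long timeout, the trainer is stuck before its first save, which points at the input pipeline or problem definition rather than slow training.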
The example model is supposed to get good results, but it translates badly when I follow the exact steps. Why might the model not reach the expected performance?
INFO:tensorflow:Inference results INPUT: Protesters on Black Friday demanded a salary increase and complained that the cost of medical insurance provided by the corporation went from 30 to 100 dollars a month.
INFO:tensorflow:Inference results OUTPUT: Das bedeutet, dass die meisten unserer Mitarbeiter und Mitarbeiter in der Lage sein werden, ihre Aufgaben zu erfüllen. [English: "This means that most of our employees and employees will be able to carry out their tasks."]
INFO:tensorflow:Inference results INPUT: Among these projects, he mentioned that five are in Peru and are located in the transverse axes of its territory, between the coast and Brazil, and two focus on increased connection with Ecuador, although he gave no further details.
INFO:tensorflow:Inference results OUTPUT: Das Hotel befindet sich in der Nähe des Hotels, nur wenige Minuten von der U-Bahnstation entfernt. [English: "The hotel is located near the hotel, just a few minutes from the subway station."]