This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

using ensemble models improves? #473

Closed
weitaizhang opened this issue Dec 14, 2017 · 12 comments

Comments

@weitaizhang

hi guys,
Did you try ensembling models in translation? Does it improve results (and by roughly how much)? I would appreciate it if someone could share their experimental results.

@weitaizhang
Author

I am trying the code in "trainer_utils_test.py" and will post my conclusions later.

@liesun1994

I am using the Transformer. When using avg_checkpoints, my results did not actually improve much. As @edunov mentioned, it gave +1 BLEU, but my experiments only get a 0.1 improvement. I am using the same script (avg_checkpoints, averaging the last five checkpoints). Can @martinpopel give me some useful suggestions?

@martinpopel
Contributor

@liesun1994: Note that this issue is about ensembling (several models decoding in parallel, voting on each token), not about checkpoint averaging.

As for checkpoint averaging, my experience is that it helps more in the early training stages, but even after weeks of training it still helps (about 0.3 BLEU on average). It depends a lot on how frequent your checkpoints are - I prefer 1-hour intervals, as I get better averaging results than with the default 10-minute intervals. I usually use the last 8 or 16 checkpoints. As you can see in the following graph, no averaging (orange curve) is almost always worse than 8-checkpoint (red) or 16-checkpoint (blue) averaging, although the size of the improvement varies because the no-avg curve fluctuates a lot (while the avg curves are more stable, as expected):
[graph "averaging-no-8-16": BLEU over training time for no averaging vs. 8- and 16-checkpoint averaging]
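(For context: checkpoint averaging just takes the element-wise mean of every model parameter over the last N checkpoints. A minimal NumPy sketch of the idea - not the actual `avg_checkpoints` script, which operates on TensorFlow checkpoint files:)

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Element-wise mean of each parameter over a list of checkpoint dicts."""
    return {name: np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
            for name in checkpoints[0]}

# Two toy "checkpoints", each mapping a variable name to its value:
ckpts = [{"w": np.array([1.0, 2.0])},
         {"w": np.array([3.0, 4.0])}]
print(average_checkpoints(ckpts)["w"])  # [2. 3.]
```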

@liesun1994

@martinpopel wow, I did not know the difference between ensembling and checkpoint averaging until now 😆 . The paper used the last five checkpoints and got 27.30 BLEU on newstest2014. I am using the code you mentioned in #458 (BPE, etc.); my latest result is 26.47 on newstest2014 (single model). The gap is smaller, but I still cannot reach 27.30 BLEU. If we modify the code, our baseline is lower than the paper reports. Have you achieved 27.3 BLEU on newstest2014? REALLY THANKS.

@weitaizhang
Author

weitaizhang commented Dec 18, 2017

@liesun1994 @martinpopel
I tried checkpoint averaging and did not get much improvement - maybe a little, but not as much as we got with RNNs. That's upsetting.
I am now trying ensemble models, i.e., as @martinpopel says, decoding in parallel with several models. Have you tried that before? Thanks.

@weitaizhang
Author

@liesun1994 @martinpopel
Actually, I found that the decoding results are not exactly the same if I use a different batch size; some sentences come out differently.
Have you noticed that, and do you know why?

@martinpopel
Contributor

@weitaizhang: If you mean batch_size within training, then yes, it affects the results, as discussed e.g. in #444 (comment).
If you mean --decode_batch_size, then it should probably be reported as an issue (I think I saw this in some older version, but now I cannot replicate it).
And answering your previous question: I haven't tried ensembling in T2T. I think it would be great if you could make it work and send a PR.

@weitaizhang
Author

weitaizhang commented Dec 19, 2017

My code is v1.2.8, checked out on Nov. 13, and yes, I mean that with different decode_batch_size values the decoded results are not exactly the same.
Maybe it's a bug that was fixed in later versions. I am trying to figure it out.

@weitaizhang
Author

hi, guys.
I have completed my translation tasks with ensemble models. Sorry, I cannot send a PR because my GPU machines are not connected to the internet. On my tasks, ensembling gives a 1-2 BLEU improvement. Hope that helps.
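(To illustrate the idea - this is not weitaizhang's code, which was never posted: in a token-level ensemble, at each decoding step the models' per-token probability distributions are averaged and the next token is chosen from the combined distribution. A minimal NumPy sketch:)

```python
import numpy as np

def ensemble_next_token(model_probs):
    """Average per-token distributions from several models, pick the argmax."""
    return int(np.argmax(np.mean(model_probs, axis=0)))

# Two models disagree on the next token; the ensemble averages them out:
p1 = np.array([0.1, 0.6, 0.3])   # model 1 prefers token 1
p2 = np.array([0.2, 0.3, 0.5])   # model 2 prefers token 2
print(ensemble_next_token([p1, p2]))  # 1  (mean probs: [0.15, 0.45, 0.4])
```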

@liesun1994

Wow, it would be nice to send a PR once your machine is available.

@cshanbo
Contributor

cshanbo commented Mar 9, 2018

Hi @weitaizhang, great work! Is the ensemble ready for a PR? I think lots of people would like to see the progress.

@tan-xu

tan-xu commented Mar 7, 2019

@weitaizhang Could you kindly share the code for ensembling models? Thanks!
