Hi,
Your curves are not incorrect; you have just pushed the training too far! 😄 If you stop the training at the 13th epoch ($\approx 200k$ steps), you will obtain the same model as the one in the project.
The real-time and on-device constraints forced us to use far fewer generator parameters than discriminator parameters (1.9M vs 27.8M). From that point, it was hard to reach a Nash equilibrium during training, BUT this does not prevent obtaining a performing generator.
If you want to go further than a simple reproduction of the results, we tried two interesting techniques from the Encodec paper that helped stabilize training and improve the results:

We did not include those techniques, as they were not part of our original paper.
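The "stop at the 13th epoch" advice can be sketched as a training loop with a fixed step budget instead of running until a (possibly elusive) Nash equilibrium. This is only an illustrative sketch: `train_step`, `STEPS_PER_EPOCH`, and the checkpointing logic are hypothetical stand-ins, not the project's actual API or epoch size.

```python
# Sketch: cap GAN training at a fixed step budget and keep the
# generator checkpoint from the target epoch. All names are illustrative.

MAX_STEPS = 200_000       # roughly the ~200k steps mentioned above
STEPS_PER_EPOCH = 15_000  # hypothetical epoch size (13 epochs <= 200k steps)

def train_step(step):
    """Placeholder for one generator/discriminator update."""
    return {"g_loss": 1.0 / (1 + step)}  # dummy metric, no real training

def train(max_steps=MAX_STEPS):
    checkpoints = []
    step = 0
    while step < max_steps:
        train_step(step)
        step += 1
        if step % STEPS_PER_EPOCH == 0:
            # Save a generator checkpoint at each epoch boundary;
            # with these numbers the last one is the 13th epoch.
            checkpoints.append((step // STEPS_PER_EPOCH, step))
    return checkpoints

epochs_saved = train()
```

The point is simply that the stopping criterion is a step count chosen from the curves, not a convergence test on the adversarial losses.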