Is there any document stating how many training steps were used to obtain the pretrained models? Some of the pretrained models perform far below the state of the art. For instance, the DQN models on BeamRider and Qbert only achieve 948.0 and 550.0, whereas other algorithms (e.g., PPO2 and ACKTR) reach 10,000+ on the same games.
It would be helpful if you could provide these pretrained models as a trustworthy baseline for benchmarking.
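For anyone trying to verify the reported numbers, here is a minimal evaluation sketch (not the repo's official script): it assumes the Stable Baselines v2 API, the BeamRiderNoFrameskip-v4 environment, and a hypothetical model path `dqn_beamrider.pkl`, and computes the mean episode reward of a pretrained DQN agent over 10 episodes.

```python
# Minimal evaluation sketch, assuming the Stable Baselines v2 API.
# The model path "dqn_beamrider.pkl" is hypothetical; use whatever
# filename the pretrained model actually ships under.
import numpy as np

from stable_baselines import DQN
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.vec_env import VecFrameStack

# Atari agents are typically trained with 4-frame stacking, so the
# evaluation env is wrapped the same way (an assumption here).
env = VecFrameStack(
    make_atari_env("BeamRiderNoFrameskip-v4", num_env=1, seed=0), n_stack=4
)
model = DQN.load("dqn_beamrider.pkl")  # hypothetical path

episode_rewards, total = [], 0.0
obs = env.reset()
while len(episode_rewards) < 10:  # average over 10 episodes
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, dones, _infos = env.step(action)
    total += float(rewards[0])
    if dones[0]:  # the VecEnv auto-resets, so just record and continue
        episode_rewards.append(total)
        total = 0.0

print("mean reward: %.1f +/- %.1f"
      % (np.mean(episode_rewards), np.std(episode_rewards)))
```

A script like this, run against the published models, would make it easy to confirm whether the 948.0 / 550.0 figures come from too few training steps or from the evaluation setup.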