Using the Montezuma's Revenge environment, you will train an agent with the Random Network Distillation (RND) algorithm.
In this environment, the observation is an RGB image of the screen, an array of shape (210, 160, 3). From this observation, the agent learns to select the actions that maximize the score.
18 discrete actions are available (see `get_action_meanings()`), such as

- 0: 'NOOP'
- 1: 'FIRE'
- 2: 'RIGHT'
- 3: 'LEFT'
- 4: 'RIGHTFIRE'
- 5: 'LEFTFIRE'
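The idea behind RND can be sketched in a few lines: a fixed, randomly initialized target network embeds each observation, a predictor network is trained to match that embedding, and the prediction error serves as the intrinsic exploration bonus (large for novel states, shrinking for familiar ones). Below is a minimal NumPy sketch; the linear networks, sizes, and learning rate are illustrative assumptions, not this repository's actual architecture or hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, EMB_DIM = 84 * 84, 32  # flattened warped frame -> embedding size

# Fixed random target network (never trained)
W_target = rng.normal(0.0, 0.1, (OBS_DIM, EMB_DIM))

# Predictor network, trained to match the target's embedding
W_pred = np.zeros((OBS_DIM, EMB_DIM))

def intrinsic_reward(obs_flat):
    """Mean squared prediction error = exploration bonus."""
    err = obs_flat @ W_pred - obs_flat @ W_target
    return float((err ** 2).mean())

def train_predictor(obs_flat, lr=1e-3):
    """One gradient step of the predictor toward the target embedding."""
    global W_pred
    err = obs_flat @ W_pred - obs_flat @ W_target      # shape (EMB_DIM,)
    # Gradient of the mean squared error w.r.t. W_pred
    W_pred -= lr * (2.0 / EMB_DIM) * np.outer(obs_flat, err)

obs = rng.random(OBS_DIM)           # stand-in for a flattened 84x84 frame
before = intrinsic_reward(obs)      # large: the state is novel
for _ in range(200):
    train_predictor(obs)
after = intrinsic_reward(obs)       # small: the state is now familiar
```

In the full algorithm this bonus is added (after normalization) to the environment reward, which is what drives the agent to explore rooms it has not seen before.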
## Requirements

- Python 3.6
- PyTorch 1.1.0
- OpenCV Python
- OpenAI Gym (for installation, see here)
## Folders and files

- `parallel_envs/`
  - `atari_wrappers.py`: Taken from openai with minor edits for PyTorch (FrameStack, ClipReward, etc.)
  - `env_eval.py`: Wrapper for obtaining the original RGB frame and the frame warped to 84x84; mainly for visualization (see `test_env.ipynb`)
  - `monitor.py`: Records rewards, episode lengths, and so on
  - `make_atari.py`: Creates a wrapped, monitored SubprocVecEnv for Atari
## References

- Y. Burda, et al., "Exploration by Random Network Distillation"
- OpenAI blog: Reinforcement Learning with Prediction-Based Rewards
- OpenAI code
- Y. Burda, et al., "Large-Scale Study of Curiosity-Driven Learning" (issues)
- On "solving" Montezuma's Revenge (Medium)
- RND-PyTorch (1) / (2)
- Obstacle Tower Environment / Alex Nichol blog