
Normalized Advantage Functions (NAF) in TensorFlow

TensorFlow implementation of Continuous Deep Q-Learning with Model-based Acceleration (Gu et al., 2016).

[Figure: the NAF algorithm from the paper]
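
NAF makes the greedy action tractable by restricting the advantage to a quadratic in the action: Q(s, a) = V(s) + A(s, a) with A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)) and P(s) = L(s) L(s)^T, where L is lower triangular with a positive diagonal, so Q is maximized exactly at a = mu(s). Below is a minimal NumPy sketch of this parameterization; naf_q_value and its inputs are stand-ins, not this repo's TensorFlow graph.

import numpy as np

def naf_q_value(v, mu, l_entries, action):
    """Q(s, a) = V(s) - 1/2 (a - mu)^T P (a - mu), with P = L L^T."""
    dim = mu.shape[0]
    L = np.zeros((dim, dim))
    L[np.tril_indices(dim)] = l_entries      # fill the lower triangle
    diag = np.diag_indices(dim)
    L[diag] = np.exp(L[diag])                # positive diagonal => P is positive definite
    P = L @ L.T
    diff = action - mu
    return v - 0.5 * diff @ P @ diff         # maximized at action == mu

# Toy call: 2-D action, so the 2x2 lower triangle needs 3 entries.
print(naf_q_value(v=1.0, mu=np.zeros(2),
                  l_entries=np.array([0.1, 0.0, 0.2]),
                  action=np.array([0.3, -0.1])))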

Environments:

  • InvertedPendulum-v1
  • InvertedDoublePendulum-v1
  • Reacher-v1
  • HalfCheetah-v1
  • Swimmer-v1
  • Hopper-v1
  • Walker2d-v1
  • Ant-v1
  • HumanoidStandup-v1

Installation and Usage

The code depends on outdated software. Until it is updated to work with current versions of gym, TensorFlow, and MuJoCo, set up a dedicated virtualenv (e.g. with conda) and run setup.sh:

$ conda create --name naf python=2.7
$ source activate naf
$ ./setup.sh

To train a model for an environment with a continuous action space (pass --display=True to render while training):

$ python main.py --env=InvertedPendulum-v1 --is_train=True
$ python main.py --env=InvertedPendulum-v1 --is_train=True --display=True

To test a trained model, optionally recording the rollout with gym's monitor:

$ python main.py --env=InvertedPendulum-v1 --is_train=False
$ python main.py --env=InvertedPendulum-v1 --is_train=False --monitor=True
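
With --monitor=True the run presumably wraps the environment in gym's recorder so videos and episode statistics are written to disk. A minimal sketch of that wrapping, assuming a gym version that ships gym.wrappers.Monitor alongside the -v1 MuJoCo IDs; the output directory and the random policy are placeholders, not this repo's code:

import gym
from gym import wrappers

env = gym.make('InvertedPendulum-v1')
env = wrappers.Monitor(env, './monitor_log', force=True)  # write videos + stats

observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for the trained NAF policy
    observation, reward, done, info = env.step(action)
env.close()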

Results

Training curves for Pendulum-v0 under different hyperparameters; the colors refer to the figure below.

$ python main.py --env=Pendulum-v0 # dark green
$ python main.py --env=Pendulum-v0 --action_fn=tanh # light green
$ python main.py --env=Pendulum-v0 --use_batch_norm=True # yellow
$ python main.py --env=Pendulum-v0 --use_seperate_networks=True # green

[Figure: Pendulum-v0 training curves, 2016-07-15]
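
The --action_fn=tanh variant presumably squashes the policy network's raw output with tanh and rescales it to the environment's action bounds, keeping actions inside the valid range. A minimal sketch of that idea; squash_action and the rescaling are assumptions about what the flag does, not this repo's code:

import numpy as np

def squash_action(raw_output, low, high):
    """Map an unbounded network output into [low, high] via tanh."""
    squashed = np.tanh(raw_output)                 # now in (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)

# Pendulum-v0 has a single torque bounded by [-2, 2].
print(squash_action(np.array([3.0, -0.2]), low=-2.0, high=2.0))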

References

  • Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine. Continuous Deep Q-Learning with Model-based Acceleration. ICML 2016. arXiv:1603.00748.

Original Author

Taehoon Kim / @carpedm20 (original repository: carpedm20/NAF-tensorflow)
