###Introduction
NeuralNetworkDeterministicPolicyGradient is a Julia implementation of the Deterministic Policy Gradient algorithm (Silver et al., 2014, http://jmlr.org/proceedings/papers/v32/silver14.pdf) for solving MDPs. It uses neural networks for both the actor and the critic, eliminating the need for feature engineering.
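For reference, a sketch of the gradient the actor follows, as stated in the cited paper (here μ_θ is the actor and Q^μ the critic; in this package both are neural networks):

```latex
\nabla_\theta J(\mu_\theta) =
  \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
    \nabla_\theta \mu_\theta(s)\,
    \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)}
  \right]
```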
###Installation
To install, simply run

```julia
Pkg.clone("https://github.com/cho3/NeuralNetworkDeterministicPolicyGradient")
```

Note: if you are behind a proxy server, you will need to configure git differently. The following commands should do it:

```julia
run(`git config --global http.proxy $http_proxy`) # where $http_proxy is your proxy server
run(`git config --global url."https://".insteadOf git://`) # forces git to use https
```
###Usage
Once installed, a generic use case is as follows:

```julia
using NeuralNetworkDeterministicPolicyGradient

# define the problem
gm = GenerativeModel(init,getNext,isEnd,reward)

# initialize the solver components
actor,critic,param,solver,updater = easyInit(n,ub,lb,mem_size,cv,cw=0.,cth=0.,ActorLayers=[5.],CriticLayers=[5.],neuron_type="relu")

# train a policy
policy, qs = train(gm,trainRNG,actor,critic,param,solver,updater,time_horizon=500,num_episodes=10,eps=0.5,alpha=[alpha_th;alpha_w;alpha_v],gamma=0.99,natural=true,verbose=false,experience_replay=false)

# evaluate the policy
R_avg, hists = runSim(gm,simRNG,actRNG,policy,time_horizon=500,recordHist=false,nSims=100,verbose=true)
```
Below is a brief overview of the less self-explanatory parameters:
#####GenerativeModel

```julia
init(AbstractRNG) -> State
getNext(AbstractRNG, State, Action) -> State
isEnd(State) -> Bool
reward(State, Action) -> Float
```
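As a rough illustration only (the dynamics, termination condition, and reward below are hypothetical and not part of the package), these four functions could be defined for a toy one-dimensional problem like so:

```julia
# Hypothetical toy problem: drive a scalar state toward the origin.
# States and actions are plain vectors of floats, per the recommendation below;
# rng is the AbstractRNG supplied by the solver.
init(rng) = [2.0*rand(rng) - 1.0]      # start somewhere in [-1, 1]
getNext(rng, s, a) = s + 0.1*a         # simple (here deterministic) dynamics
isEnd(s) = abs(s[1]) < 0.01            # stop once close enough to zero
reward(s, a) = -abs(s[1])              # penalize distance from the origin

gm = GenerativeModel(init, getNext, isEnd, reward)
```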
#####easyInit
- `n`: dimensionality of the state space
- `ub`, `lb`: upper and lower bounds of each action dimension (Float arrays)
- `mem_size`: number of experience tuples to maintain in memory
- `cv`, `cw`, `cth`: L2 regularization terms on the value critic, the advantage critic, and the actor, respectively
- `ActorLayers`, `CriticLayers`: denote the number of hidden layers and the size of each hidden layer as a factor of `n`
- `neuron_type`: activation function used in the input and hidden layers
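As a sketch only (the dimensions, bounds, and regularization values below are made up for illustration), a call for a problem with a 4-dimensional state and a single action bounded in [-1, 1] might look like:

```julia
actor, critic, param, solver, updater = easyInit(
    4,                    # n: state space is 4-dimensional
    [1.0], [-1.0],        # ub, lb: one action dimension, bounded in [-1, 1]
    10000,                # mem_size: keep up to 10000 experience tuples
    1e-3,                 # cv: L2 penalty on the value critic
    cw=1e-3, cth=1e-3,    # L2 penalties on the advantage critic and the actor
    ActorLayers=[5.],     # one hidden layer of size 5*n in the actor
    CriticLayers=[5.],    # one hidden layer of size 5*n in the critic
    neuron_type="relu")   # ReLU activations in the input and hidden layers
```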
It is recommended that the states and actions be represented as vectors of floats.
A more complete, ready-to-run example can be found in `tests/DPGTest.ipynb`.
More extensive documentation may be found in the `/tex` folder.