Python implementation of policy search and model training based on:
Depeweg, Stefan, et al. "Learning and policy search in stochastic dynamical systems with bayesian neural networks." arXiv preprint arXiv:1605.07127 (2016).
Requires the standard libraries for Theano-based models, plus Lasagne (tested with 0.2.dev).
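An optional sanity check that the stack is importable (a minimal sketch; nothing below is required by the code itself):
import numpy
import theano
import lasagne
print('theano', theano.__version__)
print('lasagne', lasagne.__version__)   # 0.2.dev was used here
print('numpy', numpy.__version__)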
Insert industrialbenchmark_python in environment/:
- Download the Python version of industrialbenchmark.
- Move it to environment/industrialbenchmark
Generate a batch of state transitions:
cd environment/
python make_data.py
This will generate a training and a test set stored in environment/out.
X: Setpoint, A(t-14), ..., A(t+1), R(t-15), ..., R(t)
Y: R(t+1)
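A minimal sketch of how one might slice a row of X according to the layout above. The column order and the 16 scalar reward columns follow the description; the per-step action dimensionality and the file names under environment/out are assumptions, so check make_data.py for the real names:
import numpy as np

def split_state(X):
    # Assumes column order Setpoint | A(t-14)..A(t+1) | R(t-15)..R(t),
    # with one scalar column per reward; the action block is simply
    # everything between the setpoint and the reward history.
    n_rewards = 16                      # R(t-15), ..., R(t) are 16 values
    setpoint = X[:, 0:1]                # Setpoint
    actions = X[:, 1:-n_rewards]        # A(t-14), ..., A(t+1)
    rewards = X[:, -n_rewards:]         # reward history
    return setpoint, actions, rewards

# Hypothetical usage; the actual file names are defined by make_data.py:
# X = np.load('environment/out/train_X.npy')
# setpoint, actions, rewards = split_state(X)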
Model Training:
cd experiments/
python train_model.py 0.5
Will train a BNN using bb-alpha with alpha=0.5.
After training, the model will be stored in experiments/models as a pickle file.
The code will run on GPU or CPU. Parameters are chosen conservatively for GPU use; consider decreasing the sample size to 25 for CPU use.
Expected training time (i5-6600K CPU @ 4.0 GHz, GTX 1060):
CPU:
- 50 samples: 21.5 hours
- 25 samples: 10.5 hours
GPU:
- 50 samples: 3.5 hours
- 25 samples: 2.0 hours
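Once training has finished, the stored model can be unpickled for later use. A minimal sketch; the file name below is only a guess at the naming pattern, so check experiments/models/ for the actual file:
import pickle

# Hypothetical file name; train_model.py determines the actual naming in experiments/models/.
with open('experiments/models/model_0.5.p', 'rb') as f:
    model = pickle.load(f)

# The loaded object exposes the helpers described in the tips further down,
# e.g. model.bb_alpha.network.update_randomness(...) and model.predict(...).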
Policy Training:
cd experiments/
python train_controller.py 0.5
Will train a policy using the model from the Model Training step (requires that a model exists in models/).
After training, the policy will be stored in experiments/controller as a pickle file.
The code will run on CPU. For GPU use one should pass only indexes to train_func using givens; a generic sketch of that pattern is given below. In our experiments no speedup was obtained from GPU use.
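The givens remark refers to the standard Theano pattern of keeping the data in shared variables on the device and compiling the training function over a minibatch index, so that only the index is transferred per call. A generic sketch of that pattern with a toy cost (not the repo's actual train_func or objective):
import numpy as np
import theano
import theano.tensor as T

# Toy stand-in for the training data, stored on the device as shared variables.
X_np = np.random.randn(1000, 33).astype(theano.config.floatX)
Y_np = np.random.randn(1000, 1).astype(theano.config.floatX)
X_shared = theano.shared(X_np, borrow=True)
Y_shared = theano.shared(Y_np, borrow=True)

x = T.matrix('x')
y = T.matrix('y')
W = theano.shared(np.zeros((33, 1), dtype=theano.config.floatX), name='W')
cost = T.mean((T.dot(x, W) - y) ** 2)      # placeholder cost standing in for the policy objective
updates = [(W, W - 0.01 * T.grad(cost, W))]

index = T.lscalar('index')                 # only this index is passed per call
batch_size = 50
train_func = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={x: X_shared[index * batch_size:(index + 1) * batch_size],
            y: Y_shared[index * batch_size:(index + 1) * batch_size]})

for i in range(1000 // batch_size):
    train_func(i)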
Policy Evaluation:
An example policy evaluation script is given in environment/eval_pol.py:
cd environment/
ipython
from eval_pol import evaluate
results = evaluate('../experiments/controller/AD_1.0.p')
Some helpful tips:
model.bb_alpha.network.update_randomness(n_samples)
will sample n_samples from q(W) and resample the input noise.
For prediction use:
m,v = model.predict(np.tile(X,[n_samples,1,1]))
where X is n x d
m is n_samples x n x d
v is n_samples x n x d (constant output noise variance)
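Putting the two tips together, a minimal sketch of drawing fresh samples, predicting, and collapsing the per-sample outputs into a single predictive mean and variance. The mean/variance split below is the usual law-of-total-variance summary and is my own choice, not something prescribed by the repo; the model file name and the placeholder input are hypothetical:
import pickle
import numpy as np

with open('experiments/models/model_0.5.p', 'rb') as f:   # hypothetical file name, as above
    model = pickle.load(f)

n_samples = 50
model.bb_alpha.network.update_randomness(n_samples)   # fresh draws from q(W) and fresh input noise

X = np.random.randn(10, 33)                            # placeholder n x d batch; d must match the model input
m, v = model.predict(np.tile(X, [n_samples, 1, 1]))    # both n_samples x n x d, as described above

pred_mean = m.mean(axis=0)                   # average over the sampled networks
pred_var = v.mean(axis=0) + m.var(axis=0)    # output-noise part + spread across samples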