Chainer implementation of Averaged-DQN. This code is partly based on the implementation here.
By averaging the Q-function estimates computed from the latest k sets of learned parameters, Averaged-DQN stabilizes performance (see the sketch below). If k is 1, this is essentially the same as standard DQN.
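The core idea can be sketched roughly as follows. This is an illustrative, framework-agnostic sketch, not the code in this repository, and names such as `q_snapshots`, `gamma`, and `td_target` are my own placeholders:

```python
from collections import deque
import numpy as np

K = 5                          # number of past Q-networks to average over
gamma = 0.99                   # discount factor
q_snapshots = deque(maxlen=K)  # holds the last K learned Q-functions (callables)

def averaged_q(state):
    """Average the Q-value estimates of the last K learned networks."""
    return np.mean([q(state) for q in q_snapshots], axis=0)

def td_target(reward, next_state, done):
    """Averaged-DQN style target: r + gamma * max_a (1/K) * sum_i Q_i(s', a)."""
    if done:
        return reward
    return reward + gamma * averaged_q(next_state).max()

# After each parameter update/copy, push a snapshot of the current Q-network;
# the deque automatically keeps only the most recent K of them, e.g.:
# q_snapshots.append(copy.deepcopy(current_q_network))
```

Because the target is an average over the k most recent Q-functions, its variance is reduced compared to using a single network, which is what stabilizes learning.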
```
python averaged_dqn.py --K=k --Episode=episode
```
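For example, to run the k = 5 setting (the episode count here is only a placeholder value, not one prescribed by the repository):

```
python averaged_dqn.py --K=5 --Episode=300
```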
I checked the estimation error of the Q-function while varying the value of k.
k=1 | k=2 | k=3 | k=5 | k=10 |
---|---|---|---|---|
53.98 | 10.27 | 1.43 | 1.42 | 0.69 |
Increasing the value of k reduces the estimation error.
Next, I checked the average reward per episode.
k=1 | k=2 | k=3 | k=5 | k=10 |
---|---|---|---|---|
152.36 | 151.85 | 149.69 | 165.04 | 130.29 |
Setting k to 5 gives the best performance.
The details are described in averaged_dqn_analysis.ipynb.