Deep-Reinforcement-Learning-Snake-AI

Requirements

Python 3.6.2
TensorFlow 1.3.0
Pygame
OpenCV

How to run

Run python play.py to play the game.
Run python train.py to train the model.

Deep Q-network algorithm

Initialize replay memory D to size N
Initialize action-value function Q with random weights
for episode = 1, M do
    Initialize state s_1
    for t = 1, T do
        With probability ϵ select random action a_t
        otherwise select a_t = argmax_a Q(s_t, a; θ_i)
        Execute action a_t in emulator and observe r_t and s_(t+1)
        Store transition (s_t,a_t,r_t,s_(t+1)) in D
        Sample a minibatch of transitions (s_j,a_j,r_j,s_(j+1)) from D
        Set y_j :=
            r_j for terminal s_(j+1)
            r_j + γ * max_a' Q(s_(j+1), a'; θ_i) for non-terminal s_(j+1)
        Perform a gradient descent step on (y_j - Q(s_j, a_j; θ_i))^2 with respect to θ_i
    end for
end for
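
The three non-network pieces of the loop above (the replay memory D, the ϵ-greedy action choice, and the target y_j) can be sketched in plain Python. This is a minimal illustration and not the code in train.py: the names ReplayMemory, select_action and td_targets are hypothetical, and q_fn stands in for whatever function returns the Q-values of a state (in this repository that would be the TensorFlow network).

```python
import random
from collections import deque


class ReplayMemory:
    """Replay memory D of capacity N; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, terminal):
        # One transition (s_t, a_t, r_t, s_(t+1)) plus a terminal flag.
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        # Uniform minibatch; never asks for more items than are stored.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def select_action(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise argmax_a Q(s, a)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(batch, q_fn, gamma):
    """y_j = r_j for terminal s_(j+1), else r_j + gamma * max_a' Q(s_(j+1), a')."""
    targets = []
    for _state, _action, reward, next_state, terminal in batch:
        if terminal:
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(q_fn(next_state)))
    return targets
```

The gradient step itself is left out because it depends on the network and optimizer; the targets returned by td_targets are what the squared error (y_j - Q(s_j, a_j; θ_i))^2 would be computed against.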

CNN architecture

Results

Frames 0 (0h)

Frames 1500000 (+8h)

Frames 3000000 (+20h)

Frames 11000000 (+56h)

Hugo1030/Deep-Reinforcement-Learning-Snake-AI
