Q-Learning

a) A simple tabular Q-learning app for the OpenAI Gym CartPole-v1 environment
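
As a rough illustration of the tabular approach, the sketch below discretizes the continuous CartPole observation into buckets and applies the standard Q-learning update. It is not the repository's code: the hyperparameters, the clipping ranges in BOUNDS, the bucket count, and the function names are all assumptions chosen for the example, and the environment loop is left out.

```python
import random
from collections import defaultdict

# Assumed hyperparameters (not taken from the repository).
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
N_ACTIONS = 2  # CartPole: push left (0) or push right (1)

# Assumed clipping ranges for (cart position, cart velocity,
# pole angle, pole angular velocity).
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
BINS = 6  # buckets per dimension (assumption)


def discretize(obs):
    """Map a continuous 4-value observation to a tuple of bucket indices."""
    state = []
    for value, (low, high) in zip(obs, BOUNDS):
        clipped = min(max(value, low), high)
        ratio = (clipped - low) / (high - low)
        state.append(min(int(ratio * BINS), BINS - 1))
    return tuple(state)


# Q-table: discrete state -> list of action values, created lazily as zeros.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)


def choose_action(state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    values = q_table[state]
    return values.index(max(values))


def update(state, action, reward, next_state, done):
    """One Q-learning step: Q(s, a) += alpha * (target - Q(s, a))."""
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (target - q_table[state][action])
```

In a training loop, each environment step would call discretize on the old and new observations, then update with the reward and the termination flag.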

b) A simple deep Q-learning app with one layer for the OpenAI Gym CartPole-v1 environment

Once trained, it keeps the pole balanced for at least 400000 steps.
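
A one-layer Q-network can be sketched as a single linear map from the observation to one Q-value per action. The example below assumes that "one layer" means exactly that (no hidden layer) and writes the forward pass and the semi-gradient TD update by hand in plain Python. The learning rate, discount factor, and function names are assumptions for illustration, not values from the repository.

```python
N_FEATURES, N_ACTIONS = 4, 2        # CartPole observation size and action count
LEARNING_RATE, GAMMA = 0.01, 0.99   # assumed values

# One linear layer: a weight row and a bias per action.
weights = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]
biases = [0.0] * N_ACTIONS


def q_values(obs):
    """Forward pass: Q(s, a) = w_a . s + b_a for every action a."""
    return [
        sum(w * x for w, x in zip(weights[a], obs)) + biases[a]
        for a in range(N_ACTIONS)
    ]


def train_step(obs, action, reward, next_obs, done):
    """Semi-gradient update toward the one-step TD target.

    Returns the TD error computed before the update.
    """
    target = reward if done else reward + GAMMA * max(q_values(next_obs))
    td_error = target - q_values(obs)[action]
    # Gradient of Q(s, a) w.r.t. w_a is the observation; w.r.t. b_a it is 1.
    for i, x in enumerate(obs):
        weights[action][i] += LEARNING_RATE * td_error * x
    biases[action] += LEARNING_RATE * td_error
    return td_error
```

Only the parameters of the action that was actually taken are changed by a step, which mirrors how the loss in a deep Q-network is computed on the selected action's output.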

c) A simple deep Q-learning attempt with PyTorch for the OpenAI Gym CartPole-v1 environment

It does not work with more than one layer; in that configuration the environment cannot be solved.

d) A simple policy gradient app for the OpenAI Gym CartPole-v1 environment

It generally converges, and when it does it works for at least 5000 steps (the most I tried), and probably indefinitely. If it does not converge, run it again.
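
For reference, a minimal REINFORCE-style sketch of the policy gradient idea is shown below. It assumes a linear softmax policy and plain discounted returns without a baseline, which may differ from what the repository actually does. The discount factor, learning rate, and function names are hypothetical.

```python
import math

GAMMA = 0.99  # assumed discount factor


def discounted_returns(rewards, gamma=GAMMA):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(weights, episode, learning_rate=0.01):
    """Apply one REINFORCE pass over an episode, updating weights in place.

    weights: one list of feature weights per action (linear softmax policy).
    episode: list of (observation, action, reward) tuples.
    Weights are adjusted after every step of the episode.
    """
    returns = discounted_returns([r for _, _, r in episode])
    for (obs, action, _), g in zip(episode, returns):
        logits = [sum(w * x for w, x in zip(row, obs)) for row in weights]
        probs = softmax(logits)
        for a, row in enumerate(weights):
            # d log pi(action | obs) / d logit_a = 1[a == action] - pi(a | obs)
            grad_logit = (1.0 if a == action else 0.0) - probs[a]
            for i, x in enumerate(obs):
                row[i] += learning_rate * g * grad_logit * x
```

Because the gradient estimate comes from sampled episodes, results vary from run to run, which is consistent with an occasional run failing to converge.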