a) A simple tabular Q-learning app for the OpenAI Gym CartPole-v1 environment
b) A simple deep Q-learning app with a one-layer network for the OpenAI Gym CartPole-v1 environment
- It works for at least 400,000 steps.
c) A simple deep Q-learning attempt with PyTorch for the OpenAI Gym CartPole-v1 environment
- It does not work with more than one layer: the environment cannot be solved in that configuration.
d) A simple policy gradient application for the OpenAI Gym CartPole-v1 environment
- It generally converges, and when it does, it works for at least 5,000 steps (the most I tried) and probably indefinitely. If it does not converge, run it again.
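
As a rough illustration of the idea behind item a), tabular Q-learning on CartPole needs the four continuous observation values mapped to discrete bins before a table lookup is possible. The sketch below is not the code from this repository: the bin counts, the clipping bounds, and the names `discretize` and `q_update` are all assumptions chosen for illustration, and it uses only the Python standard library, so no Gym install is needed to run it.

```python
from collections import defaultdict

# Assumed clipping bounds for (cart position, cart velocity, pole angle,
# pole angular velocity). Position and angle follow CartPole's termination
# limits; the velocity bounds are arbitrary illustrative choices.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5)]
BINS = [6, 6, 12, 12]  # assumed number of bins per dimension
N_ACTIONS = 2          # CartPole has two actions: push left (0), push right (1)


def discretize(obs):
    """Map a 4-value continuous observation to a tuple of bin indices."""
    state = []
    for value, (low, high), n in zip(obs, BOUNDS, BINS):
        clipped = min(max(value, low), high)
        ratio = (clipped - low) / (high - low)
        # ratio == 1.0 would give index n, so cap it at the last bin.
        state.append(min(int(ratio * n), n - 1))
    return tuple(state)


def q_update(q, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Apply one Q-learning update in place and return the new Q-value."""
    # No bootstrapping from the next state once the episode has ended.
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# The Q-table maps each discrete state to a list of per-action values.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

s = discretize([0.0, 0.1, 0.02, -0.3])
s_next = discretize([0.01, 0.2, 0.015, -0.5])
q_update(q_table, s, action=1, reward=1.0, next_state=s_next, done=False)
```

In a full training loop, `obs` would come from the environment's reset and step calls and the action would be picked epsilon-greedily from `q_table[s]`; those parts are omitted here.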