Revise the unattainable reward_threshold to an attainable value (#2205)
**Issues:** The current `reward_threshold` for `FrozenLake-v0` and `FrozenLake8x8-v0` is too high to be attained. Commit: df515de @joschu

**Solution:** Reduce the `reward_threshold` values so that they are attainable.

**Reference:** Code to compute the theoretical optimal expected return for each environment by backward induction over the finite horizon:

```python
import gym
import numpy as np

env = gym.make('FrozenLake-v0')
print(env.observation_space.n)     # 16
print(env.action_space.n)          # 4
print(env.spec.reward_threshold)   # 0.78, should be smaller
print(env.spec.max_episode_steps)  # 100

# Finite-horizon dynamic programming, swept backward from the last step.
v = np.zeros((101, 16), dtype=float)     # optimal state values per step
q = np.zeros((101, 16, 4), dtype=float)  # optimal action values per step
pi = np.zeros((101, 16), dtype=float)    # greedy policy per step
for t in range(99, -1, -1):  # backward
    for s in range(16):
        for a in range(4):
            for p, next_s, r, d in env.P[s][a]:
                q[t, s, a] += p * (r + (1. - float(d)) * v[t + 1, next_s])
        v[t, s] = q[t, s].max()
        pi[t, s] = q[t, s].argmax()
print(v[0, 0])  # ~0.74 < 0.78
```

```python
import gym
import numpy as np

env = gym.make('FrozenLake8x8-v0')
print(env.observation_space.n)     # 64
print(env.action_space.n)          # 4
print(env.spec.reward_threshold)   # 0.99, should be smaller
print(env.spec.max_episode_steps)  # 200

# Same backward sweep on the 8x8 map with a 200-step horizon.
v = np.zeros((201, 64), dtype=float)
q = np.zeros((201, 64, 4), dtype=float)
pi = np.zeros((201, 64), dtype=float)
for t in range(199, -1, -1):  # backward
    for s in range(64):
        for a in range(4):
            for p, next_s, r, d in env.P[s][a]:
                q[t, s, a] += p * (r + (1. - float(d)) * v[t + 1, next_s])
        v[t, s] = q[t, s].max()
        pi[t, s] = q[t, s].argmax()
print(v[0, 0])  # ~0.91 < 0.99
```
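As a sanity check (not part of the original commit), the computed optimum can also be verified empirically: rolling out the time-dependent greedy policy `pi` should yield a mean return approaching `v[0, 0]`. A minimal sketch, assuming `env`, `pi`, and `np` from either script above are still in scope:

```python
# Illustrative sanity check (not from the commit): follow the greedy
# time-dependent policy pi and compare the empirical mean return with
# the dynamic-programming optimum v[0, 0].
returns = []
for _ in range(10000):
    s = env.reset()
    total = 0.
    for t in range(env.spec.max_episode_steps):
        s, r, done, _ = env.step(int(pi[t, s]))
        total += r
        if done:
            break
    returns.append(total)
print(np.mean(returns))  # approaches v[0, 0], which sits below the registered threshold
```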
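For context, the thresholds are set in the `register(...)` calls in `gym/envs/__init__.py`. A hypothetical sketch of the fix follows; the new threshold values are illustrative placeholders chosen below the computed optima (~0.74 and ~0.91), not necessarily the exact numbers merged in #2205:

```python
# Hypothetical sketch of gym/envs/__init__.py after the fix; the new
# thresholds are illustrative values below the computed optima, not
# necessarily the exact numbers merged in #2205.
from gym.envs.registration import register

register(
    id='FrozenLake-v0',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4'},
    max_episode_steps=100,
    reward_threshold=0.70,  # was 0.78, above the optimum ~0.74
)

register(
    id='FrozenLake8x8-v0',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '8x8'},
    max_episode_steps=200,
    reward_threshold=0.85,  # was 0.99, above the optimum ~0.91
)
```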