
Commit

more reward thresholds
joschu committed May 25, 2016
1 parent 2853521 commit df515de
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions gym/envs/__init__.py
@@ -114,13 +114,15 @@
     entry_point='gym.envs.toy_text:FrozenLakeEnv',
     kwargs={'map_name' : '4x4'},
     timestep_limit=100,
+    reward_threshold=0.78, # optimum = .8196
 )
 
 register(
     id='FrozenLake8x8-v0',
     entry_point='gym.envs.toy_text:FrozenLakeEnv',
     kwargs={'map_name' : '8x8'},
     timestep_limit=200,
+    reward_threshold=0.99, # optimum = 1
 )
 
 register(
@@ -139,6 +141,7 @@
     id='Taxi-v1',
     entry_point='gym.envs.toy_text.taxi:TaxiEnv',
     timestep_limit=200,
+    reward_threshold=9.7, # optimum = 10.2
 )
 
 # Mujoco
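For reference, the reward_threshold being registered here is the score at which an environment is conventionally treated as solved, judged as the average per-episode reward over 100 consecutive episodes. A minimal sketch of such a check (assuming the classic gym API where reset() returns an observation and step() returns a 4-tuple, and using a random policy purely for illustration):

import gym
import numpy as np

env = gym.make('FrozenLake-v0')
returns = []
for _ in range(100):
    obs = env.reset()
    done, total = False, 0.
    while not done:
        obs, r, done, _ = env.step(env.action_space.sample())  # random policy, illustration only
        total += r
    returns.append(total)
print(np.mean(returns) >= env.spec.reward_threshold)  # True would count as "solved"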

1 comment on commit df515de

@ZhiqingXiao
Contributor

@joschu I do not think the optimum for FrozenLake-v0 is correct. According to my calculation, the optimal average episode reward is around 0.74, which is smaller than the threshold you have designated.

My calculation:

import gym
import numpy as np
env = gym.make('FrozenLake-v0')
print(env.observation_space.n) # 16
print(env.action_space.n) # 4
print(env.spec.reward_threshold) # 0.78, should be smaller
print(env.spec.max_episode_steps) # 100

# Finite-horizon backward induction over the 100-step episode.
v = np.zeros((101, 16), dtype=float)    # state values; v[100] = 0 at the horizon
q = np.zeros((101, 16, 4), dtype=float) # action values
pi = np.zeros((101, 16), dtype=int)     # greedy time-dependent policy
for t in range(99, -1, -1): # backward in time
    for s in range(16):
        for a in range(4):
            # env.P[s][a] is a list of (probability, next_state, reward, done) transitions
            for p, next_s, r, d in env.P[s][a]:
                q[t, s, a] += p * (r + (1. - float(d)) * v[t+1, next_s])
        v[t, s] = q[t, s].max()
        pi[t, s] = q[t, s].argmax()
print(v[0, 0]) # ~0.74 < 0.78
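A quick Monte Carlo cross-check of that figure (a sketch, assuming the classic gym step API): roll out the greedy time-dependent policy pi computed above and average the episode returns, which should land near the same 0.74.

returns = []
for _ in range(10000):
    s = env.reset()
    total = 0.
    for t in range(100):
        s, r, done, _ = env.step(int(pi[t, s]))  # follow the greedy policy for step t
        total += r
        if done:
            break
    returns.append(total)
print(np.mean(returns))  # empirically close to v[0, 0] ~ 0.74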
