RedispReward returns reward greater than the reward_max #187

Closed
mlanden opened this issue Apr 13, 2021 · 3 comments · Fixed by #190
Labels
bug Something isn't working

Comments


mlanden commented Apr 13, 2021

Environment

  • Grid2op version: 1.4.0
  • System: Red Hat Enterprise Linux Server 7.9 (Maipo)

Bug description

RedispReward is returning rewards that are greater than the maximum reward it calculates.

How to reproduce

Create an environment with the default reward and compare the rewards returned to the max reward computed in the initialize function.


Code snippet

```python
import grid2op

env = grid2op.make()
_, reward, _, _ = env.step(env.action_space({}))

# compare the reward to the maximum advertised by the environment
print(f"reward: {reward}, reward_max: {env.reward_range[1]}")
```
The output of the code snippet above

The reward returned is greater than the reward_max computed in initialize. 
@mlanden mlanden added the bug Something isn't working label Apr 13, 2021

BDonnot commented Apr 13, 2021

Hello,
Thanks for noticing, I will try to update it as soon as possible.

It should not be a problem; in the meantime, you can have a look at other environments, which I hope behave more normally.

The other environment closest to the default one is:

```python
import grid2op

env_name = "l2rpn_case14_realistic"
env = grid2op.make(env_name)
```


mlanden commented Apr 13, 2021

Unfortunately, that does not seem to be the case. With the IEEE 14-bus environment, I still see:

```
Max: 706.4000244140625, Reward: 1085.8973388671875 redispach 1085.8973388671875, overflow 0.0
```

where the overflow reward is a term I am adding myself; it is not relevant here because it is 0. I am interpolating this reward with np.interp, so the advertised max needs to be accurate for the results to be correct.
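For context, a minimal sketch of that kind of normalization, assuming the reward is mapped into [0, 1] with np.interp (the environment name is taken from the comment above; the rest is illustrative, not grid2op API):

```python
import numpy as np
import grid2op

env = grid2op.make("l2rpn_case14_realistic")
obs, reward, done, info = env.step(env.action_space({}))

reward_min, reward_max = env.reward_range  # bounds advertised by the env

# np.interp clamps inputs outside [reward_min, reward_max] to the endpoints,
# so a raw reward above a wrongly computed reward_max silently saturates at 1.0
normalized = np.interp(reward, [reward_min, reward_max], [0.0, 1.0])
print(f"raw: {reward:.2f}, max: {reward_max:.2f}, normalized: {normalized:.2f}")
```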


BDonnot commented Apr 14, 2021

For me, for later reuse:

```python
import grid2op

env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space())
print(f"obtained reward: {reward:.2f}")
print(f"max reward: {env.reward_range[1]:.2f}")
```

So indeed there is a problem with the way the reward max is computed for this specific case.
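To make the violation easy to spot over a whole episode, a minimal sketch of such a check, assuming the environment's default RedispReward and using do-nothing actions as in the snippets above, could look like this:

```python
import grid2op

env = grid2op.make("l2rpn_case14_sandbox")
obs = env.reset()
_, reward_max = env.reward_range  # (min, max) advertised by the env

done = False
while not done:
    # step with a do-nothing action
    obs, reward, done, info = env.step(env.action_space({}))
    if reward > reward_max:
        # this should never trigger; with the bug reported here, it does
        print(f"reward {reward:.2f} exceeds reward_max {reward_max:.2f}")
```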

@BDonnot BDonnot mentioned this issue Apr 14, 2021
BDonnot added a commit that referenced this issue Apr 15, 2021
Proposing a fix for issue #187
Adding the doc for issue #179
Adding other doc for issue #184 (the documentation of the opponent)

See changelog for more information
@BDonnot BDonnot linked a pull request Apr 15, 2021 that will close this issue
BDonnot added a commit that referenced this issue Mar 1, 2024