Getting different results from 'simulate' and 'step' #147
Comments
Hello, thanks for notifying us. I could not reproduce this bug with pandapower on Windows. I'll try with lightsim when I have access to a Linux machine. For now, the "easy fix" I would recommend is to switch back to using PandaPowerBackend :-(
I also found a similar problem in some cases. Please help solve it.
Hello,
I also run it from the
I'm a bit confused as to why this problem happens on your side. Can you try to uninstall grid2op and install it from scratch again?
Hello, thank you for your response. Did you make sure that you updated the env with the following code?
Oh, you are right, my envs on this test machine were not updated. It would have saved me some time if you had put this code in your script :-) Now I can reproduce the bug and start working on it. Thanks
Hello, I found the problem and even have an "easy" solution that you can use with grid2op 1.2.2 (which will stay the default version for the codalab competition).
Each time you use
This would give, in your example (only the end of the code), something like:

    obs = env.get_obs()
    steps = 0
    action_space = env.action_space
    do_nothing = action_space({})
    while steps < 1381:
        if steps in actions_steps:
            action = actions.pop(0)
        else:
            action = do_nothing
        # if we remove the following 2 lines, the bug disappears.
        _, _, _, info_simulate = obs.simulate(action)
        assert not info_simulate['is_illegal'] and not info_simulate['is_ambiguous']
        obs, reward, done, info = env.step(action)
        assert not info['is_illegal'] and not info['is_ambiguous']
        steps += 1
        if steps >= MAX_TIMESTEP:
            break

    # here
    obs._obs_env._reset_to_orig_state()
    obs_simulate, reward_simulate, done_simulate, info_simulate = obs.simulate(act12)
    print('simulate --> is_illegal: {}, is_ambiguous: {}'.format(info_simulate['is_illegal'], info_simulate['is_ambiguous']))
    # and if you want to do another "simulate" you need to do:
    obs._obs_env._reset_to_orig_state()  # <- redo that for "safety"
    obs_simulate, reward_simulate, done_simulate, info_simulate = obs.simulate(act11)

    obs, reward, done, info = env.step(act12)
    print('step --> is_illegal: {}, is_ambiguous: {}'.format(info['is_illegal'], info['is_ambiguous']))

I know it's calling private functions and attributes, but exceptionally you have the right to use them. I am working on a fix that will do this automatically :-) and that will be ready in version 1.3.0 of grid2op (work in progress). Note that this 1.3.0 version will NOT be used to rank your agents in NeurIPS, so prefer using the method described here.
Thanks for your help!
Few bug fixes for the gym_compat module
Environment
1.2.2
0.2.4
CentOS
Bug description
simulate and step
Code snippet
Current output
Expected output
Note
In these two cases (simulate during interacting or not), not only info_simulate but also the returned obs_simulate are different.
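One way to make the mismatch between simulate and step easier to spot is to compare the flags of the two `info` dictionaries directly. The sketch below is an illustration only: `diff_info_flags` is a hypothetical helper, and the example dictionaries contain made-up values, not output from the run described in this issue. It assumes only that both dictionaries carry the `is_illegal` and `is_ambiguous` keys used in the snippets of this thread.

```python
def diff_info_flags(info_simulate, info_step, keys=("is_illegal", "is_ambiguous")):
    """Return {key: (simulate_value, step_value)} for every flag that disagrees."""
    return {
        key: (info_simulate.get(key), info_step.get(key))
        for key in keys
        if info_simulate.get(key) != info_step.get(key)
    }


# Made-up values for illustration only.
example_simulate = {"is_illegal": True, "is_ambiguous": False}
example_step = {"is_illegal": False, "is_ambiguous": False}
print(diff_info_flags(example_simulate, example_step))
```

An empty dictionary means the two calls agree on the listed flags; it says nothing about differences in the returned observations.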