Nan Values on Observation and Action #389
Comments
Another run output that took some more time to crash:
|
Hello,
Thanks for reaching out. Did you use the default pandapower backend or the faster one, lightsim2grid? I suspect it's an error there. I'll have a look, probably in the first week of 2023, and see if I can reproduce this behavior. This does not look right.
In the meantime, you can replace the NaNs (for example by 0.) in the environment you are using when you pass the observation to the agent. You overload the GymEnv class and customize the "reset" and "step" methods to replace the NaNs, like this:
from grid2op.gym_compat import GymEnv

class CustomGymEnv(GymEnv):
    def step(self, act):
        obs, reward, done, info = super().step(act)
        # put code here to remove the NaNs from the observation
        obs_modif = ...
        return obs_modif, reward, done, info

    def reset(self):
        obs = super().reset()
        # same as above
        obs_modif = ...
        return obs_modif
Thanks for spotting this bug
Benjamin |
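One possible way to fill in the placeholder lines of the wrapper above is NumPy's `nan_to_num`. The sketch below is only an illustration: `FakeEnv` is a hypothetical stand-in for `GymEnv` so the example runs without grid2op, and it assumes the observation is a flat NumPy vector (not a dictionary) and that the old 4-tuple `step` API is in use, as in the snippet above.

```python
import numpy as np


class FakeEnv:
    """Hypothetical stand-in for GymEnv; it deliberately emits NaN / inf values."""

    def reset(self):
        return np.array([1.0, np.nan, 3.0])

    def step(self, act):
        obs = np.array([np.nan, 2.0, np.inf])
        return obs, 1.0, False, {}


class NanSafeEnv(FakeEnv):
    """Same structure as CustomGymEnv, with the placeholder replaced by nan_to_num."""

    @staticmethod
    def _clean(obs):
        # NaN becomes 0.0; +/-inf become the largest finite floats (NumPy default).
        return np.nan_to_num(obs, nan=0.0)

    def step(self, act):
        obs, reward, done, info = super().step(act)
        return self._clean(obs), reward, done, info

    def reset(self):
        return self._clean(super().reset())


env = NanSafeEnv()
first_obs = env.reset()
next_obs, reward, done, info = env.step(None)
```

With a real `GymEnv`, the same `_clean` helper would be called in `step` and `reset`. Note that this hides the symptom rather than the cause, so it is best combined with logging to find where the NaNs come from.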
Good morning, sorry for the late response. I think I am indeed not using lightsim2grid. I'll try with that and with replacing the NaNs. Merry Christmas! |
Thanks for the update. It should be working with the default backend, so this is definitely something I'll have a look at. If it's in the pandapower backend, it's likely in the way grid2op handles the observation. I'll try to see where this comes from. It's probably an attribute that is not updated when "done=True" (and if I remember correctly, a library should not be using anything when "done=True", but I guess some frameworks, Stable Baselines for example, still use them, which causes the issue...). Merry Christmas to you too 😊 |
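To illustrate the "done=True" point, here is a minimal sketch of a rollout loop that never feeds the terminal observation to the policy. `StubEnv` is a hypothetical environment that mimics the suspected behavior (invalid values in the observation returned together with `done=True`); it is not grid2op code.

```python
class StubEnv:
    """Hypothetical environment whose observation is invalid once done=True."""

    def __init__(self, episode_len=3):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0]

    def step(self, act):
        self.t += 1
        done = self.t >= self.episode_len
        # Mimic the suspected bug: NaN attributes in the terminal observation.
        obs = [float("nan")] * 2 if done else [float(self.t), 1.0]
        return obs, 1.0, done, {}


def collect(env, n_steps):
    """Return the observations a policy would actually see over n_steps."""
    seen = []
    obs = env.reset()
    for _ in range(n_steps):
        seen.append(obs)  # the policy would act on this observation
        obs, reward, done, info = env.step(None)
        if done:
            # Discard the terminal observation and start a fresh episode.
            obs = env.reset()
    return seen


observations = collect(StubEnv(), n_steps=7)
```

A framework that instead stores or processes the terminal observation would see the NaNs, which is consistent with the behavior suspected in this thread.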
If SB3 is doing that I would need to have a word with them... xD
It has been executing for 40 minutes with no crash. I'll add a log to see when the NaNs occur, and leave the experiment running for 5 days. If after 5 days it does not crash, I think it's safe to say it's stable. |
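A log like the one mentioned here could look roughly as follows. This is a sketch using only the standard library; `NanLogger` and `nan_indices` are hypothetical helper names, and the observations in the demo loop are made up.

```python
import logging
import math

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("nan_watch")


def nan_indices(values):
    """Return the positions of NaN entries in a flat sequence of floats."""
    return [i for i, v in enumerate(values) if math.isnan(v)]


class NanLogger:
    """Counts steps and remembers the first step at which a NaN shows up."""

    def __init__(self):
        self.step_count = 0
        self.first_nan_step = None

    def check(self, name, values):
        bad = nan_indices(values)
        if bad:
            if self.first_nan_step is None:
                self.first_nan_step = self.step_count
            logger.warning("step %d: NaN in %s at indices %s",
                           self.step_count, name, bad)
        return bad

    def tick(self):
        self.step_count += 1


# Demo on made-up observations: the NaN first appears at step 1.
watcher = NanLogger()
for obs in ([1.0, 2.0], [1.0, float("nan")], [float("nan"), float("nan")]):
    watcher.check("observation", obs)
    watcher.tick()
```

Calling `check` on both the observation and the action at every step makes it possible to tell which of the two becomes NaN first.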
I think they do it but I'm not sure 😉 so better check before ^^
I modified the code a bit, because I'm not sure with your fix that the |
OK, with the following GymEnv and using the pandapower backend:
I get this output:
So I'm thinking the NaNs do not come from the GymEnv, because the log.debug calls are never executed... |
Oh, something comes to mind. Are you sure you use observations represented as vectors? You can consult the last notebook of the tutorial to help you convert the observation space and retrieve vectors instead of dictionaries. |
Hi, I have changed the observation space of the GymEnv with this:
Is this what you mean? |
Yes, if you did that, then the fix above should fix the NaNs from the grid2op side at least. Maybe they come from the converter 🤔 I'll have a look when I can. |
There is no hurry. I'll keep posting my findings here to have them logged, or else I'll forget them, but this can be solved next year with no problem :) Thanks a lot for all the help. |
After executing for more than a week, both examples with the lightsim backend failed:
But I think this is totally unrelated to Grid2Op. The only difference in the code is this:
So the problem only happened with the normal pandapower backend (I think). |
I am rerunning the experiment with LightSimBackend, but only one run (with no custom GymEnv), to see what happens. |
It is still running 👍 |
Ok great to see that the problem is solved. I'll try to check where it arises in pandapower. Thanks |
Thanks to you for the suggestion. :) |
I'm executing more complex code, still using the LightSimBackend, and it crashes again:
:( |
I have rerun the same code from my previous response, but with CloseToOverflowReward on the environment, and it has been running for 2 days with no issue. |
In the experiment that crashes, can you tell me which reward you were using? Because indeed, if you got "nan" as a reward, then afterwards you might get "nan" pretty much everywhere. Thanks for investigating |
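The way a single "nan" reward spreads can be seen on a toy example. The sketch below uses an exponential moving average as a stand-in for a learning update (an assumption made for illustration, not what SB3 actually computes), and shows a simple `math.isfinite` guard.

```python
import math


def update_value(value, reward, lr=0.1):
    """One step of an exponential moving average, a toy stand-in for a learning update."""
    return value + lr * (reward - value)


def guarded_update(value, reward, lr=0.1):
    """Same update, but rewards that are NaN or infinite are skipped."""
    if not math.isfinite(reward):
        return value
    return update_value(value, reward, lr)


rewards = [1.0, float("nan"), 1.0, 1.0]

naive = 0.0
guarded = 0.0
for r in rewards:
    naive = update_value(naive, r)        # becomes NaN at the second reward and stays NaN
    guarded = guarded_update(guarded, r)  # ignores the NaN reward
```

After the loop, `naive` is NaN even though the last two rewards were valid, while `guarded` stays finite. The same contamination happens to network weights once a NaN enters a gradient step.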
I had the same problem, and while debugging I found that the theta node parameter was NaN after converting the observation to a graph in the simplest environment of all:
I solved it by forcing the NaNs to be 0, but since it is an angle of the voltage phases, I don't know if I am biasing the agent too much. Does it make sense that theta is NaN, or is it a bug? |
Hello, no, it's not normal at all. Can you file an issue (bug) for that? In reality theta (actually it's theta_or - theta_ex) is really closely linked with the active flow (p_or and p_ex), so it should not be 0. But I'm not sure using "theta" in a neural network is a good idea: it might be, but it might be terrible (even without the bug). |
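Regarding the worry about biasing the agent when NaNs are forced to 0: a common general workaround (not something proposed in this thread, and not a fix for the underlying bug) is to add a binary "missing" flag next to each filled value, so the agent can tell a genuine 0 angle from a replaced NaN. A minimal sketch, with the hypothetical helper name `fill_with_mask` and made-up theta values:

```python
import math


def fill_with_mask(thetas, fill=0.0):
    """Replace NaNs by `fill` and return a parallel list of 0/1 'was missing' flags."""
    filled = [fill if math.isnan(t) else t for t in thetas]
    missing = [1.0 if math.isnan(t) else 0.0 for t in thetas]
    return filled, missing


# Made-up node angles; the second one is NaN.
thetas = [0.12, float("nan"), 0.0]
filled, missing = fill_with_mask(thetas)
```

Here the third entry is a real 0 (flag 0.0) while the second is a replaced NaN (flag 1.0), and both lists can be concatenated into the feature vector.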
With your help, I finally managed to find the cause of the issue, which was caused by the I will try to address it as soon as I can, and it will be part of the next release |
@pablo-ta can you try to install the development version: |
I was using the default one (I did not specify any reward for the environment). |
I'm on it. I'll post my results in a day or two (to be sure that it is working). |
Thanks a lot :-) I ran code similar to yours all night and it did not crash with either pandapower or lightsim2grid, but unfortunately my laptop is not really made for that... So it's best if you can :-) Thanks! |
It looks to be running now. It hasn't crashed in a week (the simple code). |
Thanks a lot :-) And glad to hear it's finally working :-) |
I'm closing this issue as it appears to have been fixed |
Environment
1.8.0
Windows 10.
Installed libraries on python:
Bug description
I am trying to make stable and dynamic SB3<->Grid2op connection code (that will work with CityLearn too).
During training of the Stable Baselines agent, NaN values start to appear in the observation and action spaces until all the actions are NaN values and it crashes.
The crash time is random: sometimes it's 5 minutes, other times it's 5 hours, or anything in between.
How to reproduce
Execute the code snippet
Code snippet
Current output
Expected output
The NaN values should not appear.