
Nan Values on Observation and Action #389

Closed
pablo-ta opened this issue Dec 19, 2022 · 29 comments
Labels
bug Something isn't working

Comments

@pablo-ta

Environment

  • Grid2op version: 1.8.0
  • System: Windows 10.
  • Python version: 3.9.12
  • Additional system information:
    Installed libraries on python:
absl-py==1.3.0
astunparse==1.6.3
attr==0.3.2
cachetools==5.2.0
certifi==2022.9.24
charset-normalizer==2.1.1
CityLearn==1.4.0
click==8.1.3
cloudpickle==2.2.0
colorama==0.4.6
contourpy==1.0.6
cycler==0.11.0
deepdiff==6.2.1
docker-pycreds==0.4.0
flatbuffers==22.11.23
fonttools==4.38.0
gast==0.4.0
gitdb==4.0.10
GitPython==3.1.29
google-auth==2.14.1
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
Grid2Op==1.7.2
grpcio==1.51.1
gym==0.24.1
gym-notices==0.0.8
h5py==3.7.0
idna==3.4
imageio-ffmpeg==0.4.7
importlib-metadata==4.13.0
keras==2.11.0
kiwisolver==1.4.4
l2rpn-baselines==0.6.0.post1
libclang==14.0.6
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.6.2
networkx==2.8.8
numpy==1.21.6
oauthlib==3.2.2
opt-einsum==3.3.0
ordered-set==4.1.0
packaging==21.3
pandapower==2.10.1
pandas==1.3.5
pathtools==0.1.2
patsy==0.5.3
Pillow==9.2.0
promise==2.3
protobuf==3.19.6
psutil==5.9.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
pygame==2.1.0
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2022.6
PyYAML==6.0
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
scipy==1.9.3
sentry-sdk==1.11.1
setproctitle==1.3.2
shortuuid==1.0.11
simplejson==3.17.6
six==1.16.0
smmap==5.0.0
stable-baselines3==1.6.2
statsmodels==0.13.5
tensorboard==2.11.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.11.0
tensorflow-estimator==2.11.0
tensorflow-intel==2.11.0
tensorflow-io-gcs-filesystem==0.28.0
termcolor==2.1.1
torch==1.13.0+cu116
tqdm==4.64.1
typing_extensions==4.4.0
urllib3==1.26.13
wandb==0.13.6
Werkzeug==2.2.2
wrapt==1.14.1
zipp==3.11.0

Bug description

I am trying to write a stable and dynamic SB3 <-> Grid2op connection code (that will also work with CityLearn).

During training of the Stable-Baselines3 agent, NaN values start to appear in the observation and action space until all the actions are NaN values and it crashes.
The crash time is random: sometimes it is 5 minutes, other times 5 hours, or anything in between.

How to reproduce

Execute the code snippet

Code snippet

import grid2op as grid2op
from grid2op.gym_compat import GymEnv, BoxGymObsSpace, DiscreteActSpace
from l2rpn_baselines.utils import GymAgent
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy

import logging

log = logging.getLogger(__name__)
log.setLevel(logging.DEBUG)


from typing import Optional


class SB3Agent(GymAgent):

    def __init__(self,
                 g2op_action_space,
                 gym_act_space,
                 gym_obs_space,
                 nn_type,
                 nn_path=None,
                 nn_kwargs=None,
                 custom_load_dict=None,
                 gymenv=None,
                 iter_num=None,
                 ):
        self._nn_type = nn_type
        if custom_load_dict is not None:
            self.custom_load_dict = custom_load_dict
        else:
            self.custom_load_dict = {}
        self._iter_num: Optional[int] = iter_num
        super().__init__(g2op_action_space, gym_act_space, gym_obs_space,
                         nn_path=nn_path, nn_kwargs=nn_kwargs,
                         gymenv=gymenv
                         )

    def get_act(self, gym_obs, reward, done):
        action, _ = self.nn_model.predict(gym_obs, deterministic=False)
        return action

    def load(self):
        custom_objects = {"action_space": self._gym_act_space,
                          "observation_space": self._gym_obs_space}
        for key, val in self.custom_load_dict.items():
            custom_objects[key] = val
        path_load = self._nn_path
        if self._iter_num is not None:
            path_load = path_load + f"_{self._iter_num}_steps"
        log.debug(F"loading agent from [{path_load}]")
        self.nn_model = self._nn_type.load(path_load,
                                           custom_objects=custom_objects,
                                           env=self.gymenv)

    def build(self):
        self.nn_model = self._nn_type(**self._nn_kwargs)

    def learn(self,
              total_timesteps=1,
              save_path=None,
              **learn_kwargs):

        if learn_kwargs is None:
            learn_kwargs = {}
        # train it
        self.nn_model.learn(total_timesteps=total_timesteps,
                            eval_env=self.gymenv,
                            **learn_kwargs)

        # save it
        if save_path is not None:
            self.nn_model.save(save_path)


if __name__ == "__main__":


    log_format = {
        "fmt": "{asctime} | {levelname:7s} | {name:24s} | {lineno:<4n} | {message}",
        "style": "{",
        "datefmt": '%m-%d %H:%M'
    }
    logFormatter = logging.Formatter(**log_format)
    rootLogger = logging.getLogger()
    rootLogger.handlers = []

    log_config = {
        "level": logging.DEBUG,
    }
    consoleHandler = logging.StreamHandler()
    consoleHandler.setLevel(log_config.get("level"))
    consoleHandler.setFormatter(logFormatter)
    rootLogger.addHandler(consoleHandler)
    log.debug("Loger Configured")


    import torch as th

    th.autograd.set_detect_anomaly(True)

    env = grid2op.make("l2rpn_wcci_2022", difficulty="competition")

    gymenv = GymEnv(env)

    if gymenv.observation_space:
        gymenv.observation_space.close()
    gymenv.observation_space = BoxGymObsSpace(env.observation_space)

    if gymenv.action_space:
        gymenv.action_space.close()
    gymenv.action_space = DiscreteActSpace(env.action_space)

    nn_kwargs = {
        "env": gymenv,
        "verbose": True,
        "policy": MlpPolicy,
        "policy_kwargs": {
            "net_arch": [25, 25, 25, 25]
        }
    }

    agent = SB3Agent(
        env.action_space,
        gymenv.action_space,
        gymenv.observation_space,
        nn_type=PPO,
        nn_kwargs=nn_kwargs,
    )

    agent.learn(total_timesteps=1000000)

Current output

\Repositorios\Minimal_playground\venv\Scripts\python.exe \Repositorios\Minimal_playground\grid2op_sb3.py 
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\Backend\PandaPowerBackend.py:31: UserWarning: Numba cannot be loaded. You will gain possibly massive speed if installing it by 
	\Repositorios\Minimal_playground\venv\Scripts\python.exe -m pip install numba

  warnings.warn(
Warning: Gym version v0.24.1 has a number of critical issues with `gym.make` such that environment observation and action spaces are incorrectly evaluated, raising incorrect errors and warning . It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
\Repositorios\Minimal_playground\venv\lib\site-packages\torch\utils\tensorboard\__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if not hasattr(tensorboard, "__version__") or LooseVersion(
12-19 10:57 | DEBUG   | __main__                 | 97   | Loger Configured
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:171: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "set_storage" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:171: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "curtail" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:171: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "redispatch" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 25.2     |
|    ep_rew_mean     | 2.7e+03  |
| time/              |          |
|    fps             | 14       |
|    iterations      | 1        |
|    time_elapsed    | 143      |
|    total_timesteps | 2048     |
---------------------------------
Traceback (most recent call last):
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 134, in <module>
    agent.learn(total_timesteps=1000000)
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 68, in learn
    self.nn_model.learn(total_timesteps=total_timesteps,
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 172, in collect_rollouts
    actions, values, log_probs = self.policy(obs_tensor)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 590, in forward
    distribution = self._get_action_dist_from_latent(latent_pi)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 609, in _get_action_dist_from_latent
    return self.action_dist.proba_distribution(action_logits=mean_actions)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\distributions.py", line 275, in proba_distribution
    self.distribution = Categorical(logits=action_logits)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 147833)) of distribution Categorical(logits: torch.Size([1, 147833])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0')

Process finished with exit code 1

Expected output

The NaN values should not appear.

@pablo-ta added the bug label on Dec 19, 2022
@pablo-ta
Author

Another run output that took some more time to crash:


\Repositorios\Minimal_playground\venv\Scripts\python.exe \Repositorios\Minimal_playground\grid2op_sb3.py 
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\Backend\PandaPowerBackend.py:31: UserWarning: Numba cannot be loaded. You will gain possibly massive speed if installing it by 
	\Repositorios\Minimal_playground\venv\Scripts\python.exe -m pip install numba

  warnings.warn(
Warning: Gym version v0.24.1 has a number of critical issues with `gym.make` such that environment observation and action spaces are incorrectly evaluated, raising incorrect errors and warning . It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
\Repositorios\Minimal_playground\venv\lib\site-packages\torch\utils\tensorboard\__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if not hasattr(tensorboard, "__version__") or LooseVersion(
12-19 11:53 | DEBUG   | __main__                 | 97   | Loger Configured
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "set_storage" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "redispatch" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "curtail" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 28.4     |
|    ep_rew_mean     | 3.21e+03 |
| time/              |          |
|    fps             | 14       |
|    iterations      | 1        |
|    time_elapsed    | 140      |
|    total_timesteps | 2048     |
---------------------------------
-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 28.4          |
|    ep_rew_mean          | 3.31e+03      |
| time/                   |               |
|    fps                  | 14            |
|    iterations           | 2             |
|    time_elapsed         | 285           |
|    total_timesteps      | 4096          |
| train/                  |               |
|    approx_kl            | 2.1710235e-05 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -11.9         |
|    explained_variance   | -3.16e-05     |
|    learning_rate        | 0.0003        |
|    loss                 | 1.08e+06      |
|    n_updates            | 10            |
|    policy_gradient_loss | -0.00281      |
|    value_loss           | 2.18e+06      |
-------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 27.6         |
|    ep_rew_mean          | 3.16e+03     |
| time/                   |              |
|    fps                  | 14           |
|    iterations           | 3            |
|    time_elapsed         | 432          |
|    total_timesteps      | 6144         |
| train/                  |              |
|    approx_kl            | 3.085169e-05 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -11.9        |
|    explained_variance   | 0            |
|    learning_rate        | 0.0003       |
|    loss                 | 1.17e+06     |
|    n_updates            | 20           |
|    policy_gradient_loss | -0.00379     |
|    value_loss           | 2.34e+06     |
------------------------------------------
Traceback (most recent call last):
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 134, in <module>
    agent.learn(total_timesteps=1000000)
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 68, in learn
    self.nn_model.learn(total_timesteps=total_timesteps,
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 172, in collect_rollouts
    actions, values, log_probs = self.policy(obs_tensor)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 590, in forward
    distribution = self._get_action_dist_from_latent(latent_pi)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 609, in _get_action_dist_from_latent
    return self.action_dist.proba_distribution(action_logits=mean_actions)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\distributions.py", line 275, in proba_distribution
    self.distribution = Categorical(logits=action_logits)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 147833)) of distribution Categorical(logits: torch.Size([1, 147833])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0')

Process finished with exit code 1

@BDonnot
Collaborator

BDonnot commented Dec 19, 2022

Hello,

Thanks for reaching out.

Did you use the default pandapower backend or the faster one, lightsim2grid?

I suspect the error is there.

I'll have a look, probably the first week of 2023, and see if I can reproduce this behavior. This does not look right.

In the meantime, you can replace the NaNs (for example with 0.) in the environment you are using when you pass the observation to the agent (you overload the GymEnv class and customize the "reset" and "step" methods to replace the NaNs), like this:

class CustomGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        # here put some code to remove the nans from the observation
        obs_modif = ...
        return obs_modif, reward, done, info

    def reset(self):
        obs = super().reset()
        # same as above
        obs_modif = ...
        return obs_modif

Thanks for spotting this bug

Benjamin

@pablo-ta
Author

pablo-ta commented Dec 22, 2022

Good morning, sorry for the late response. I think I am indeed not using lightsim2grid. I'll try with that and with replacing the NaNs.
I'll post the results of the 3 experiments here.

Merry Christmas!

@BDonnot
Collaborator

BDonnot commented Dec 22, 2022

Thanks for the update.

It should be working with the default backend, so this is definitely something I'll have a look at.

If it's in the pandapower backend, it's likely in the way grid2op handles the observation. I'll try to see where this comes from. It's probably an attribute that is not updated when "done=True" (and if I remember correctly, libraries should not be using anything when "done=True", but I guess some frameworks (stable baselines for example) still use it, which causes the issue)...

Merry Christmas to you too 😊

@pablo-ta
Author

If SB3 is doing that I would need to have a word with them... xD
With the custom env:


import numpy as np
class CustomGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        # here put a code to remove the nans, of the observation
        obs_modif = obs[np.isnan(obs)] = 0
        return obs_modif, reward, done, info
    def reset(self) :
        obs = super().reset()
        # same as above
        obs_modif = obs[np.isnan(obs)] = 0
        return obs_modif

It has been executing for 40 minutes with no crash. I'll add a log to see when the NaNs occur, and leave the experiment running for 5 days. If after 5 days it does not crash, I think it's safe to say it's stable.

@BDonnot
Collaborator

BDonnot commented Dec 22, 2022

If SB3 is doing that I would need to have a word with them... xD

I think they do it but I'm not sure 😉 so better check before ^^

import numpy as np
class CustomGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        # here put some code to remove the nans from the observation
        obs[np.isnan(obs)] = 0
        return obs, reward, done, info

    def reset(self):
        obs = super().reset()
        # same as above
        obs[np.isnan(obs)] = 0
        return obs

I modified the code a bit, because I'm not sure that with your fix obs_modif is the thing you want it to be (obs is modified in-place in your code; I'm not sure the assignment returns something and, if it does, what it returns... Numpy shenanigans...).
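
For reference, a minimal standalone snippet (nothing beyond numpy) showing what the chained assignment does: Python evaluates the right-hand side 0 once and binds it to each target from left to right, so obs_modif ends up being the scalar 0 while obs is cleaned in place.

import numpy as np

obs = np.array([1.0, np.nan, 3.0])
obs_modif = obs[np.isnan(obs)] = 0

print(obs)        # [1. 0. 3.]  -> obs was fixed in place
print(obs_modif)  # 0           -> the scalar 0, not the cleaned array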

@pablo-ta
Author

OK, with the following GymEnv subclass and using the pandapower backend:

import numpy as np
class CustomGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        # here put some code to remove the nans from the observation
        if np.any(np.isnan(obs)):
            log.debug(f"Nans found during STEP: {obs}")
        obs[np.isnan(obs)] = 0
        return obs, reward, done, info

    def reset(self):
        obs = super().reset()
        # same as above
        if np.any(np.isnan(obs)):
            log.debug(f"Nans found during RESET: {obs}")
        obs[np.isnan(obs)] = 0
        return obs

I get this output:


\Repositorios\Minimal_playground\venv\Scripts\python.exe \Repositorios\Minimal_playground\grid2op_sb3.py 
Console output is saving to: C:\Program Files\JetBrains\PyCharm 2022.2.2\jbr\bin\grid2op_sb3.log
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\Backend\PandaPowerBackend.py:31: UserWarning: Numba cannot be loaded. You will gain possibly massive speed if installing it by 
	\Repositorios\Minimal_playground\venv\Scripts\python.exe -m pip install numba

  warnings.warn(
12-22 12:49 | DEBUG   | __main__                 | 114  | Loger Configured
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "curtail" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "set_storage" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "redispatch" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 26.3     |
|    ep_rew_mean     | 2.95e+03 |
| time/              |          |
|    fps             | 15       |
|    iterations      | 1        |
|    time_elapsed    | 129      |
|    total_timesteps | 2048     |
---------------------------------
-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 29.5          |
|    ep_rew_mean          | 3.4e+03       |
| time/                   |               |
|    fps                  | 15            |
|    iterations           | 2             |
|    time_elapsed         | 258           |
|    total_timesteps      | 4096          |
| train/                  |               |
|    approx_kl            | 2.3974513e-05 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -11.9         |
|    explained_variance   | -4.66e-05     |
|    learning_rate        | 0.0003        |
|    loss                 | 9.91e+05      |
|    n_updates            | 10            |
|    policy_gradient_loss | -0.0029       |
|    value_loss           | 2.15e+06      |
-------------------------------------------
Traceback (most recent call last):
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 151, in <module>
    agent.learn(total_timesteps=1000000)
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 68, in learn
    self.nn_model.learn(total_timesteps=total_timesteps,
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 172, in collect_rollouts
    actions, values, log_probs = self.policy(obs_tensor)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 590, in forward
    distribution = self._get_action_dist_from_latent(latent_pi)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 609, in _get_action_dist_from_latent
    return self.action_dist.proba_distribution(action_logits=mean_actions)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\distributions.py", line 275, in proba_distribution
    self.distribution = Categorical(logits=action_logits)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 147833)) of distribution Categorical(logits: torch.Size([1, 147833])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0')

Process finished with exit code 1

So I'm thinking the NaNs do not come from the gymenv, because the log.debug calls never get triggered...

@BDonnot
Collaborator

BDonnot commented Dec 22, 2022

Oh, something comes to mind. Are you sure your observations are represented as vectors?
Or are they represented as dictionaries (the default)?

You can consult the last notebook of the tutorial to help you convert the observation space and retrieve vectors instead of dictionaries.

@pablo-ta
Author

Hi, I have changed the observation space of the gymenv with this:

gymenv.observation_space = BoxGymObsSpace(env.observation_space)

and with this line:

print(gymenv.observation_space.sample())

I get this output:

[4.6165291e-02 3.3105618e-01 3.5263571e-01 ... 1.2340667e+00 4.8806679e-02
 6.2379614e+02]

Is this what you mean?

@BDonnot
Collaborator

BDonnot commented Dec 22, 2022

Yes, if you did that then the fix above addresses the NaNs from the grid2op side, at least.

It might come from the converter maybe 🤔

I'll have a look when I can.

@pablo-ta
Author

There is no hurry. I'll keep posting my findings here to have them logged, or else I'll forget them, but this can be solved next year with no problem :) Thanks a lot for all the help.
I'm now running with the lightsim backend, both with the CustomGymEnv and without. I'll post the results here.

@pablo-ta
Author

pablo-ta commented Jan 3, 2023

After running for more than a week, both examples with the lightsim backend failed:

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 244          |
|    ep_rew_mean          | 3.6e+04      |
| time/                   |              |
|    fps                  | 91           |
|    iterations           | 17064        |
|    time_elapsed         | 383730       |
|    total_timesteps      | 34947072     |
| train/                  |              |
|    approx_kl            | 2.652334e-05 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.09        |
|    explained_variance   | -1.19e-07    |
|    learning_rate        | 0.0003       |
|    loss                 | 1.56e+06     |
|    n_updates            | 170630       |
|    policy_gradient_loss | -0.000148    |
|    value_loss           | 3.23e+06     |
------------------------------------------
Traceback (most recent call last):
  File "\Repositorios\Minimal_playground\grid2op_sb3_lightsim2grid.py", line 152, in <module>
    agent.learn(total_timesteps=100000000)
  File "\Repositorios\Minimal_playground\grid2op_sb3_lightsim2grid.py", line 69, in learn
    self.nn_model.learn(total_timesteps=total_timesteps,
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 210, in collect_rollouts
    rollout_buffer.add(self._last_obs, actions, rewards, self._last_episode_starts, values, log_probs)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\buffers.py", line 443, in add
    self.values[self.pos] = value.clone().cpu().numpy().flatten()
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

But I think this is totally unrelated to Grid2Op.

The only difference in the code is this:

env = grid2op.make("l2rpn_wcci_2022", difficulty="competition", backend=LightSimBackend())

So the problem only happens with the default pandapower backend (I think).

@pablo-ta
Author

pablo-ta commented Jan 4, 2023

I am rerunning the experiment with LightSimBackend, but only one run (with no custom gymenv), to see what happens.

@pablo-ta
Author

It's still running 👍
total_timesteps | 62978048

@BDonnot
Collaborator

BDonnot commented Jan 10, 2023

Ok great to see that the problem is solved.

I'll try to check where it arises in pandapower.

Thanks

@pablo-ta
Author

Thanks to you for the suggestion. :)

@pablo-ta
Author

I'm running a more complex piece of code, but using the LightSimBackend, and it crashes again:


venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1, 69)) of distribution Normal(loc: torch.Size([1, 69]), scale: torch.Size([1, 69])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])

:(

@pablo-ta
Author

I have re-run the same code from my previous response, but with CloseToOverflowReward on the environment, and it has been running for 2 days with no issue.

@BDonnot
Collaborator

BDonnot commented Jan 23, 2023

In the experiment that crashes, can you tell me which reward you were using? Because indeed, if you got "nan" as a reward, then afterwards you might get "nan" pretty much everywhere.

Thanks for investigating
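
A minimal sketch, reusing the GymEnv subclassing pattern from earlier in this thread, that would surface (and zero out) a NaN reward as soon as it appears; the class name NanSafeGymEnv is only illustrative, not an official grid2op or SB3 helper:

import logging

import numpy as np
from grid2op.gym_compat import GymEnv

log = logging.getLogger(__name__)


class NanSafeGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        if np.any(np.isnan(obs)):
            log.debug(f"NaNs found in the observation during STEP: {obs}")
            obs[np.isnan(obs)] = 0
        if np.isnan(reward):
            # a NaN reward would otherwise poison the whole rollout buffer
            log.debug(f"NaN reward during STEP, info: {info}")
            reward = 0.0
        return obs, reward, done, info

    def reset(self):
        obs = super().reset()
        if np.any(np.isnan(obs)):
            log.debug(f"NaNs found in the observation during RESET: {obs}")
            obs[np.isnan(obs)] = 0
        return obs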

@EloyAnguiano

I had the same problem, and while debugging I found that the theta node parameter was NaN after converting the observation to a graph, in the simplest environment of all:

(Pdb) G.nodes
NodeView((0, 1, 2, 3, 4, 5, 6))
(Pdb) G.nodes[6]
{'p': -7.2, 'q': -5.2, 'v': 0.0, 'sub_id': 2, 'theta': nan, 'cooldown': 0}

I solved it by forcing the NaNs to be 0, but since it is an angle on the voltage phases, I don't know if I am biasing the agent too much. Does it make sense for that theta to be NaN, or is it a bug?
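
For reference, a sketch of the workaround described above, assuming the graph was built with obs.as_networkx() (the exact conversion call and environment are not shown in the comment, so "rte_case5_example" is just a placeholder):

import math

import grid2op

env = grid2op.make("rte_case5_example")  # placeholder for the environment used above
obs = env.reset()
G = obs.as_networkx()

for node_id, data in G.nodes(data=True):
    # replace a NaN voltage angle with 0. (crude fix, see the caveat in the next comment)
    if math.isnan(data.get("theta", 0.0)):
        data["theta"] = 0.0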

@BDonnot
Collaborator

BDonnot commented Jan 23, 2023

Hello,

No, it's not normal at all. Can you file an issue (bug) for that?

In reality theta (actually it's theta_or - theta_ex) is really closely linked with the active flow (p_or and p_ex), so it should not be 0.

But I'm not sure using "theta" in a neural network is a good idea; it might be, but it might be terrible (even without the bug).
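
If "theta" should not be fed to the network at all, one option is to build the gym observation space with an explicit attr_to_keep list so the theta_* vectors are simply left out. A sketch with an illustrative subset of attributes (adjust the list to your needs):

import grid2op
from grid2op.gym_compat import GymEnv, BoxGymObsSpace

env = grid2op.make("l2rpn_wcci_2022", difficulty="competition")
gymenv = GymEnv(env)

gymenv.observation_space.close()
gymenv.observation_space = BoxGymObsSpace(
    env.observation_space,
    # illustrative subset: injections, flows, loading and topology, no theta_*
    attr_to_keep=["gen_p", "load_p", "p_or", "p_ex", "rho",
                  "line_status", "topo_vect"],
)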

@BDonnot
Collaborator

BDonnot commented Jan 24, 2023

With your help, I finally managed to find the cause of the issue, which was indeed the theta vectors, as pointed out above.

I will try to address it as soon as I can and it will be part of the next release.

@BDonnot
Collaborator

BDonnot commented Jan 24, 2023

@pablo-ta can you try to install the development version:
pip install Grid2Op==1.8.2.dev0
And re-run your experiment with the pandapower backend? Just to make sure everything is fixed and I did not miss another bug somewhere else :-)

@pablo-ta
Author

In the experiment that crashes, can you tell me which reward you were using? Because indeed, if you got "nan" as a reward, then afterwards you might get "nan" pretty much everywhere.

Thanks for investigating

I was using the default one (I did not specify any reward for the environment).

@pablo-ta
Author

@pablo-ta can you try to install the development version: pip install Grid2Op==1.8.2.dev0 And re-run your experiment with the pandapower backend? Just to make sure everything is fixed and I did not miss another bug somewhere else :-)

I'm on it. I'll post my results in a day or two (to be sure that it is working).

@BDonnot
Collaborator

BDonnot commented Jan 25, 2023

Thanks a lot :-) I ran code similar to yours all night and it did not crash with either pandapower or lightsim2grid, but unfortunately my laptop is not really made for that... So it's best if you can :-)

Thanks !

@pablo-ta
Author

It looks to be running now; it hasn't crashed in a week (the simple code).
I'm going to run the complex one using that version and the LightSimBackend.

@BDonnot
Collaborator

BDonnot commented Jan 31, 2023

Thanks a lot :-) And glad to hear it's finally working :-)

@BDonnot
Collaborator

BDonnot commented Jun 6, 2023

I'm closing this issue as it appears to have been fixed.

@BDonnot BDonnot closed this as completed Jun 6, 2023