
Nan Values on Observation and Action #389

Closed
pablo-ta opened this issue Dec 19, 2022 · 29 comments
Labels
bug Something isn't working

Comments

@pablo-ta

Environment

  • Grid2op version: 1.8.0
  • System: Windows 10.
  • Python version: 3.9.12
  • Additional system information:
    Installed libraries on python:
absl-py==1.3.0
astunparse==1.6.3
attr==0.3.2
cachetools==5.2.0
certifi==2022.9.24
charset-normalizer==2.1.1
CityLearn==1.4.0
click==8.1.3
cloudpickle==2.2.0
colorama==0.4.6
contourpy==1.0.6
cycler==0.11.0
deepdiff==6.2.1
docker-pycreds==0.4.0
flatbuffers==22.11.23
fonttools==4.38.0
gast==0.4.0
gitdb==4.0.10
GitPython==3.1.29
google-auth==2.14.1
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
Grid2Op==1.7.2
grpcio==1.51.1
gym==0.24.1
gym-notices==0.0.8
h5py==3.7.0
idna==3.4
imageio-ffmpeg==0.4.7
importlib-metadata==4.13.0
keras==2.11.0
kiwisolver==1.4.4
l2rpn-baselines==0.6.0.post1
libclang==14.0.6
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.6.2
networkx==2.8.8
numpy==1.21.6
oauthlib==3.2.2
opt-einsum==3.3.0
ordered-set==4.1.0
packaging==21.3
pandapower==2.10.1
pandas==1.3.5
pathtools==0.1.2
patsy==0.5.3
Pillow==9.2.0
promise==2.3
protobuf==3.19.6
psutil==5.9.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
pygame==2.1.0
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2022.6
PyYAML==6.0
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
scipy==1.9.3
sentry-sdk==1.11.1
setproctitle==1.3.2
shortuuid==1.0.11
simplejson==3.17.6
six==1.16.0
smmap==5.0.0
stable-baselines3==1.6.2
statsmodels==0.13.5
tensorboard==2.11.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.11.0
tensorflow-estimator==2.11.0
tensorflow-intel==2.11.0
tensorflow-io-gcs-filesystem==0.28.0
termcolor==2.1.1
torch==1.13.0+cu116
tqdm==4.64.1
typing_extensions==4.4.0
urllib3==1.26.13
wandb==0.13.6
Werkzeug==2.2.2
wrapt==1.14.1
zipp==3.11.0

Bug description

I am trying to write a stable and dynamic SB3 <-> Grid2op connection code (that will also work with CityLearn).

During training of the Stable-Baselines3 agent, NaN values start to appear in the observation and action space until all the actions are NaN values and it crashes.
The crash time is random: sometimes it is 5 minutes, other times 5 hours, or anything in between.

How to reproduce

Execute the code snippet

Code snippet

import grid2op as grid2op
from grid2op.gym_compat import GymEnv, BoxGymObsSpace, DiscreteActSpace
from l2rpn_baselines.utils import GymAgent
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy

import logging

log = logging.getLogger(__name__)
log.setLevel(logging.DEBUG)


from typing import Optional


class SB3Agent(GymAgent):

    def __init__(self,
                 g2op_action_space,
                 gym_act_space,
                 gym_obs_space,
                 nn_type,
                 nn_path=None,
                 nn_kwargs=None,
                 custom_load_dict=None,
                 gymenv=None,
                 iter_num=None,
                 ):
        self._nn_type = nn_type
        if custom_load_dict is not None:
            self.custom_load_dict = custom_load_dict
        else:
            self.custom_load_dict = {}
        self._iter_num: Optional[int] = iter_num
        super().__init__(g2op_action_space, gym_act_space, gym_obs_space,
                         nn_path=nn_path, nn_kwargs=nn_kwargs,
                         gymenv=gymenv
                         )

    def get_act(self, gym_obs, reward, done):
        action, _ = self.nn_model.predict(gym_obs, deterministic=False)
        return action

    def load(self):
        custom_objects = {"action_space": self._gym_act_space,
                          "observation_space": self._gym_obs_space}
        for key, val in self.custom_load_dict.items():
            custom_objects[key] = val
        path_load = self._nn_path
        if self._iter_num is not None:
            path_load = path_load + f"_{self._iter_num}_steps"
        log.debug(F"loading agent from [{path_load}]")
        self.nn_model = self._nn_type.load(path_load,
                                           custom_objects=custom_objects,
                                           env=self.gymenv)

    def build(self):
        self.nn_model = self._nn_type(**self._nn_kwargs)

    def learn(self,
              total_timesteps=1,
              save_path=None,
              **learn_kwargs):

        if learn_kwargs is None:
            learn_kwargs = {}
        # train it
        self.nn_model.learn(total_timesteps=total_timesteps,
                            eval_env=self.gymenv,
                            **learn_kwargs)

        # save it
        if save_path is not None:
            self.nn_model.save(save_path)


if __name__ == "__main__":


    log_format = {
        "fmt": "{asctime} | {levelname:7s} | {name:24s} | {lineno:<4n} | {message}",
        "style": "{",
        "datefmt": '%m-%d %H:%M'
    }
    logFormatter = logging.Formatter(**log_format)
    rootLogger = logging.getLogger()
    rootLogger.handlers = []

    log_config = {
        "level": logging.DEBUG,
    }
    consoleHandler = logging.StreamHandler()
    consoleHandler.setLevel(log_config.get("level"))
    consoleHandler.setFormatter(logFormatter)
    rootLogger.addHandler(consoleHandler)
    log.debug("Loger Configured")


    import torch as th

    th.autograd.set_detect_anomaly(True)

    env = grid2op.make("l2rpn_wcci_2022", difficulty="competition")

    gymenv = GymEnv(env)

    if gymenv.observation_space:
        gymenv.observation_space.close()
    gymenv.observation_space = BoxGymObsSpace(env.observation_space)

    if gymenv.action_space:
        gymenv.action_space.close()
    gymenv.action_space = DiscreteActSpace(env.action_space)

    nn_kwargs = {
        "env": gymenv,
        "verbose": True,
        "policy": MlpPolicy,
        "policy_kwargs": {
            "net_arch": [25, 25, 25, 25]
        }
    }

    agent = SB3Agent(
        env.action_space,
        gymenv.action_space,
        gymenv.observation_space,
        nn_type=PPO,
        nn_kwargs=nn_kwargs,
    )

    agent.learn(total_timesteps=1000000)

Current output

\Repositorios\Minimal_playground\venv\Scripts\python.exe \Repositorios\Minimal_playground\grid2op_sb3.py 
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\Backend\PandaPowerBackend.py:31: UserWarning: Numba cannot be loaded. You will gain possibly massive speed if installing it by 
	\Repositorios\Minimal_playground\venv\Scripts\python.exe -m pip install numba

  warnings.warn(
Warning: Gym version v0.24.1 has a number of critical issues with `gym.make` such that environment observation and action spaces are incorrectly evaluated, raising incorrect errors and warning . It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
\Repositorios\Minimal_playground\venv\lib\site-packages\torch\utils\tensorboard\__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if not hasattr(tensorboard, "__version__") or LooseVersion(
12-19 10:57 | DEBUG   | __main__                 | 97   | Loger Configured
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:171: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "set_storage" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:171: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "curtail" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:171: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "redispatch" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 25.2     |
|    ep_rew_mean     | 2.7e+03  |
| time/              |          |
|    fps             | 14       |
|    iterations      | 1        |
|    time_elapsed    | 143      |
|    total_timesteps | 2048     |
---------------------------------
Traceback (most recent call last):
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 134, in <module>
    agent.learn(total_timesteps=1000000)
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 68, in learn
    self.nn_model.learn(total_timesteps=total_timesteps,
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 172, in collect_rollouts
    actions, values, log_probs = self.policy(obs_tensor)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 590, in forward
    distribution = self._get_action_dist_from_latent(latent_pi)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 609, in _get_action_dist_from_latent
    return self.action_dist.proba_distribution(action_logits=mean_actions)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\distributions.py", line 275, in proba_distribution
    self.distribution = Categorical(logits=action_logits)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 147833)) of distribution Categorical(logits: torch.Size([1, 147833])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0')

Process finished with exit code 1

Expected output

The NaN values should not appear.

@pablo-ta added the bug label on Dec 19, 2022
@pablo-ta
Author

Another run output that took some more time to crash:


\Repositorios\Minimal_playground\venv\Scripts\python.exe \Repositorios\Minimal_playground\grid2op_sb3.py 
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\Backend\PandaPowerBackend.py:31: UserWarning: Numba cannot be loaded. You will gain possibly massive speed if installing it by 
	\Repositorios\Minimal_playground\venv\Scripts\python.exe -m pip install numba

  warnings.warn(
Warning: Gym version v0.24.1 has a number of critical issues with `gym.make` such that environment observation and action spaces are incorrectly evaluated, raising incorrect errors and warning . It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
\Repositorios\Minimal_playground\venv\lib\site-packages\torch\utils\tensorboard\__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if not hasattr(tensorboard, "__version__") or LooseVersion(
12-19 11:53 | DEBUG   | __main__                 | 97   | Loger Configured
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "set_storage" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "redispatch" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "curtail" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 28.4     |
|    ep_rew_mean     | 3.21e+03 |
| time/              |          |
|    fps             | 14       |
|    iterations      | 1        |
|    time_elapsed    | 140      |
|    total_timesteps | 2048     |
---------------------------------
-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 28.4          |
|    ep_rew_mean          | 3.31e+03      |
| time/                   |               |
|    fps                  | 14            |
|    iterations           | 2             |
|    time_elapsed         | 285           |
|    total_timesteps      | 4096          |
| train/                  |               |
|    approx_kl            | 2.1710235e-05 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -11.9         |
|    explained_variance   | -3.16e-05     |
|    learning_rate        | 0.0003        |
|    loss                 | 1.08e+06      |
|    n_updates            | 10            |
|    policy_gradient_loss | -0.00281      |
|    value_loss           | 2.18e+06      |
-------------------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 27.6         |
|    ep_rew_mean          | 3.16e+03     |
| time/                   |              |
|    fps                  | 14           |
|    iterations           | 3            |
|    time_elapsed         | 432          |
|    total_timesteps      | 6144         |
| train/                  |              |
|    approx_kl            | 3.085169e-05 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -11.9        |
|    explained_variance   | 0            |
|    learning_rate        | 0.0003       |
|    loss                 | 1.17e+06     |
|    n_updates            | 20           |
|    policy_gradient_loss | -0.00379     |
|    value_loss           | 2.34e+06     |
------------------------------------------
Traceback (most recent call last):
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 134, in <module>
    agent.learn(total_timesteps=1000000)
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 68, in learn
    self.nn_model.learn(total_timesteps=total_timesteps,
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 172, in collect_rollouts
    actions, values, log_probs = self.policy(obs_tensor)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 590, in forward
    distribution = self._get_action_dist_from_latent(latent_pi)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 609, in _get_action_dist_from_latent
    return self.action_dist.proba_distribution(action_logits=mean_actions)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\distributions.py", line 275, in proba_distribution
    self.distribution = Categorical(logits=action_logits)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 147833)) of distribution Categorical(logits: torch.Size([1, 147833])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0')

Process finished with exit code 1

@BDonnot
Collaborator

BDonnot commented Dec 19, 2022

Hello,

Thanks for reaching out.

Did you use the default pandapower backend or the faster one, lightsim2grid?

I suspect the error is there.

I'll have a look, probably the first week of 2023, and see if I can reproduce this behavior. This does not look right.

In the meantime, you can replace the NaNs (for example with 0.) in the environment you are using when you pass the observation to the agent (you overload the GymEnv class and customize the "reset" and "step" methods to replace the NaNs), like this:

class CustomGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        # here put some code to remove the nans from the observation
        obs_modif = ...
        return obs_modif, reward, done, info

    def reset(self):
        obs = super().reset()
        # same as above
        obs_modif = ...
        return obs_modif

Thanks for spotting this bug

Benjamin

@pablo-ta
Author

pablo-ta commented Dec 22, 2022

Good morning, sorry for the late response. I think I am indeed not using lightsim2grid. I'll try with that and with replacing the NaNs.
I'll post the results of the 3 experiments here.

Merry Christmas!

@BDonnot
Collaborator

BDonnot commented Dec 22, 2022

Thanks for the update.

It should be working with the default backend, so this is definitely something I'll have a look at.

If it's in the pandapower backend, it's likely in the way grid2op handles the observation. I'll try to see where this comes from. It's probably an attribute that is not updated when "done=True" (and if I remember correctly, libraries should not be using anything when "done=True", but I guess some frameworks (stable baselines for example) still use it, which causes the issue)...

Merry Christmas to you too 😊

@pablo-ta
Author

If SB3 is doing that I would need to have a word with them... xD
With the custom env:


import numpy as np
class CustomGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        # here put a code to remove the nans, of the observation
        obs_modif = obs[np.isnan(obs)] = 0
        return obs_modif, reward, done, info
    def reset(self) :
        obs = super().reset()
        # same as above
        obs_modif = obs[np.isnan(obs)] = 0
        return obs_modif

It has been executing for 40 minutes with no crash. I'll add a log to see when the NaNs occur, and leave the experiment running for 5 days. If after 5 days it does not crash, I think it's safe to say it's stable.

@BDonnot
Collaborator

BDonnot commented Dec 22, 2022

If SB3 is doing that I would need to have a word with them... xD

I think they do it but I'm not sure 😉 so better check before ^^

import numpy as np
class CustomGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        # here put some code to remove the nans from the observation
        obs[np.isnan(obs)] = 0
        return obs, reward, done, info

    def reset(self):
        obs = super().reset()
        # same as above
        obs[np.isnan(obs)] = 0
        return obs

I modified the code a bit, because I'm not sure that with your fix obs_modif is the thing you want it to be (obs is modified in-place in your code; I'm not sure the assignment returns something and, if it does, what it returns... Numpy shenanigans...).
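
For reference, a minimal standalone snippet (nothing beyond numpy) showing what the chained assignment does: Python evaluates the right-hand side 0 once and binds it to each target from left to right, so obs_modif ends up being the scalar 0 while obs is cleaned in place.

import numpy as np

obs = np.array([1.0, np.nan, 3.0])
obs_modif = obs[np.isnan(obs)] = 0

print(obs)        # [1. 0. 3.]  -> obs was fixed in place
print(obs_modif)  # 0           -> the scalar 0, not the cleaned array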

@pablo-ta
Author

OK, with the following GymEnv subclass and using the pandapower backend:

import numpy as np
class CustomGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        # here put some code to remove the nans from the observation
        if np.any(np.isnan(obs)):
            log.debug(f"Nans found during STEP: {obs}")
        obs[np.isnan(obs)] = 0
        return obs, reward, done, info

    def reset(self):
        obs = super().reset()
        # same as above
        if np.any(np.isnan(obs)):
            log.debug(f"Nans found during RESET: {obs}")
        obs[np.isnan(obs)] = 0
        return obs

I get this output:


\Repositorios\Minimal_playground\venv\Scripts\python.exe \Repositorios\Minimal_playground\grid2op_sb3.py 
Console output is saving to: C:\Program Files\JetBrains\PyCharm 2022.2.2\jbr\bin\grid2op_sb3.log
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\Backend\PandaPowerBackend.py:31: UserWarning: Numba cannot be loaded. You will gain possibly massive speed if installing it by 
	\Repositorios\Minimal_playground\venv\Scripts\python.exe -m pip install numba

  warnings.warn(
12-22 12:49 | DEBUG   | __main__                 | 114  | Loger Configured
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "curtail" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "set_storage" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
\Repositorios\Minimal_playground\venv\lib\site-packages\grid2op\gym_compat\discrete_gym_actspace.py:231: UserWarning: The class "DiscreteActSpace" should mainly be used to consider only discrete actions (eg. set_line_status, set_bus or change_bus). Though it is possible to use "redispatch" when building it, be aware that this continuous action will be treated as discrete by splitting it into bins. Consider using the "BoxGymActSpace" for these attributes.
  warnings.warn(
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 26.3     |
|    ep_rew_mean     | 2.95e+03 |
| time/              |          |
|    fps             | 15       |
|    iterations      | 1        |
|    time_elapsed    | 129      |
|    total_timesteps | 2048     |
---------------------------------
-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 29.5          |
|    ep_rew_mean          | 3.4e+03       |
| time/                   |               |
|    fps                  | 15            |
|    iterations           | 2             |
|    time_elapsed         | 258           |
|    total_timesteps      | 4096          |
| train/                  |               |
|    approx_kl            | 2.3974513e-05 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -11.9         |
|    explained_variance   | -4.66e-05     |
|    learning_rate        | 0.0003        |
|    loss                 | 9.91e+05      |
|    n_updates            | 10            |
|    policy_gradient_loss | -0.0029       |
|    value_loss           | 2.15e+06      |
-------------------------------------------
Traceback (most recent call last):
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 151, in <module>
    agent.learn(total_timesteps=1000000)
  File "\Repositorios\Minimal_playground\grid2op_sb3.py", line 68, in learn
    self.nn_model.learn(total_timesteps=total_timesteps,
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 172, in collect_rollouts
    actions, values, log_probs = self.policy(obs_tensor)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 590, in forward
    distribution = self._get_action_dist_from_latent(latent_pi)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\policies.py", line 609, in _get_action_dist_from_latent
    return self.action_dist.proba_distribution(action_logits=mean_actions)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\distributions.py", line 275, in proba_distribution
    self.distribution = Categorical(logits=action_logits)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 147833)) of distribution Categorical(logits: torch.Size([1, 147833])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0')

Process finished with exit code 1

So I'm thinking the NaNs do not come from the gymenv, because the log.debug calls never get triggered...

@BDonnot
Collaborator

BDonnot commented Dec 22, 2022

Oh, something comes to mind. Are you sure your observations are represented as vectors?
Or are they represented as dictionaries (the default)?

You can consult the last notebook of the tutorial to help you convert the observation space and retrieve vectors instead of dictionaries.

@pablo-ta
Author

Hi, I have changed the observation space of the gymenv with this:

gymenv.observation_space = BoxGymObsSpace(env.observation_space)

and with this line:

print(gymenv.observation_space.sample())

I get this output:

[4.6165291e-02 3.3105618e-01 3.5263571e-01 ... 1.2340667e+00 4.8806679e-02
 6.2379614e+02]

Is this what you mean?

@BDonnot
Collaborator

BDonnot commented Dec 22, 2022

Yes, if you did that then the fix above addresses the NaNs from the grid2op side, at least.

It might come from the converter maybe 🤔

I'll have a look when I can.

@pablo-ta
Author

There is no hurry. I'll keep posting my findings here to have them logged, or else I'll forget them, but this can be solved next year with no problem :) Thanks a lot for all the help.
I'm now running with the lightsim backend, both with the CustomGymEnv and without. I'll post the results here.

@pablo-ta
Author

pablo-ta commented Jan 3, 2023

After running for more than a week, both examples with the lightsim backend failed:

------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 244          |
|    ep_rew_mean          | 3.6e+04      |
| time/                   |              |
|    fps                  | 91           |
|    iterations           | 17064        |
|    time_elapsed         | 383730       |
|    total_timesteps      | 34947072     |
| train/                  |              |
|    approx_kl            | 2.652334e-05 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.09        |
|    explained_variance   | -1.19e-07    |
|    learning_rate        | 0.0003       |
|    loss                 | 1.56e+06     |
|    n_updates            | 170630       |
|    policy_gradient_loss | -0.000148    |
|    value_loss           | 3.23e+06     |
------------------------------------------
Traceback (most recent call last):
  File "\Repositorios\Minimal_playground\grid2op_sb3_lightsim2grid.py", line 152, in <module>
    agent.learn(total_timesteps=100000000)
  File "\Repositorios\Minimal_playground\grid2op_sb3_lightsim2grid.py", line 69, in learn
    self.nn_model.learn(total_timesteps=total_timesteps,
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\ppo\ppo.py", line 317, in learn
    return super().learn(
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 262, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 210, in collect_rollouts
    rollout_buffer.add(self._last_obs, actions, rewards, self._last_episode_starts, values, log_probs)
  File "\Repositorios\Minimal_playground\venv\lib\site-packages\stable_baselines3\common\buffers.py", line 443, in add
    self.values[self.pos] = value.clone().cpu().numpy().flatten()
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

But I think this is totally unrelated to Grid2Op.

The only difference in the code is this:

env = grid2op.make("l2rpn_wcci_2022", difficulty="competition", backend=LightSimBackend())

So the problem only happens with the default pandapower backend (I think).

@pablo-ta
Author

pablo-ta commented Jan 4, 2023

I am rerunning the experiment with LightSimBackend, but only one run (with no custom gymenv), to see what happens.

@pablo-ta
Author

It's still running 👍
total_timesteps | 62978048

@BDonnot
Collaborator

BDonnot commented Jan 10, 2023

Ok great to see that the problem is solved.

I'll try to check where it arises in pandapower.

Thanks

@pablo-ta
Author

Thanks to you for the suggestion. :)

@pablo-ta
Author

I'm running a more complex piece of code, but using the LightSimBackend, and it crashes again:


venv\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1, 69)) of distribution Normal(loc: torch.Size([1, 69]), scale: torch.Size([1, 69])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])

:(

@pablo-ta
Author

I have re-run the same code from my previous response, but with CloseToOverflowReward on the environment, and it has been running for 2 days with no issue.

@BDonnot
Collaborator

BDonnot commented Jan 23, 2023

In the experiment that crashes, can you tell me which reward you were using? Because indeed, if you got "nan" as a reward, then afterwards you might get "nan" pretty much everywhere.

Thanks for investigating
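
A minimal sketch, reusing the GymEnv subclassing pattern from earlier in this thread, that would surface (and zero out) a NaN reward as soon as it appears; the class name NanSafeGymEnv is only illustrative, not an official grid2op or SB3 helper:

import logging

import numpy as np
from grid2op.gym_compat import GymEnv

log = logging.getLogger(__name__)


class NanSafeGymEnv(GymEnv):

    def step(self, act):
        obs, reward, done, info = super().step(act)
        if np.any(np.isnan(obs)):
            log.debug(f"NaNs found in the observation during STEP: {obs}")
            obs[np.isnan(obs)] = 0
        if np.isnan(reward):
            # a NaN reward would otherwise poison the whole rollout buffer
            log.debug(f"NaN reward during STEP, info: {info}")
            reward = 0.0
        return obs, reward, done, info

    def reset(self):
        obs = super().reset()
        if np.any(np.isnan(obs)):
            log.debug(f"NaNs found in the observation during RESET: {obs}")
            obs[np.isnan(obs)] = 0
        return obs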

@EloyAnguiano

I had the same problem, and while debugging I found that the theta node parameter was NaN after converting the observation to a graph, in the simplest environment of all:

(Pdb) G.nodes
NodeView((0, 1, 2, 3, 4, 5, 6))
(Pdb) G.nodes[6]
{'p': -7.2, 'q': -5.2, 'v': 0.0, 'sub_id': 2, 'theta': nan, 'cooldown': 0}

I solved it by forcing the NaNs to be 0, but since it is an angle on the voltage phases, I don't know if I am biasing the agent too much. Does it make sense for that theta to be NaN, or is it a bug?
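
For reference, a sketch of the workaround described above, assuming the graph was built with obs.as_networkx() (the exact conversion call and environment are not shown in the comment, so "rte_case5_example" is just a placeholder):

import math

import grid2op

env = grid2op.make("rte_case5_example")  # placeholder for the environment used above
obs = env.reset()
G = obs.as_networkx()

for node_id, data in G.nodes(data=True):
    # replace a NaN voltage angle with 0. (crude fix, see the caveat in the next comment)
    if math.isnan(data.get("theta", 0.0)):
        data["theta"] = 0.0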

@BDonnot
Collaborator

BDonnot commented Jan 23, 2023

Hello,

No, it's not normal at all. Can you file an issue (bug) for that?

In reality theta (actually it's theta_or - theta_ex) is really closely linked with the active flow (p_or and p_ex), so it should not be 0.

But I'm not sure using "theta" in a neural network is a good idea; it might be, but it might be terrible (even without the bug).
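
If "theta" should not be fed to the network at all, one option is to build the gym observation space with an explicit attr_to_keep list so the theta_* vectors are simply left out. A sketch with an illustrative subset of attributes (adjust the list to your needs):

import grid2op
from grid2op.gym_compat import GymEnv, BoxGymObsSpace

env = grid2op.make("l2rpn_wcci_2022", difficulty="competition")
gymenv = GymEnv(env)

gymenv.observation_space.close()
gymenv.observation_space = BoxGymObsSpace(
    env.observation_space,
    # illustrative subset: injections, flows, loading and topology, no theta_*
    attr_to_keep=["gen_p", "load_p", "p_or", "p_ex", "rho",
                  "line_status", "topo_vect"],
)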

@BDonnot
Collaborator

BDonnot commented Jan 24, 2023

With your help, I finally managed to find the cause of the issue, which was indeed the theta vectors, as pointed out above.

I will try to address it as soon as I can and it will be part of the next release.

@BDonnot
Collaborator

BDonnot commented Jan 24, 2023

@pablo-ta can you try to install the development version:
pip install Grid2Op==1.8.2.dev0
And re-run your experiment with the pandapower backend? Just to make sure everything is fixed and I did not miss another bug somewhere else :-)

@pablo-ta
Author

In the experiment that crashes, can you tell me which reward you were using? Because indeed, if you got "nan" as a reward, then afterwards you might get "nan" pretty much everywhere.

Thanks for investigating

I was using the default one (I did not specify any reward for the environment).

@pablo-ta
Author

@pablo-ta can you try to install the development version: pip install Grid2Op==1.8.2.dev0 And re-run your experiment with the pandapower backend? Just to make sure everything is fixed and I did not miss another bug somewhere else :-)

I'm on it. I'll post my results in a day or two (to be sure that it is working).

@BDonnot
Collaborator

BDonnot commented Jan 25, 2023

Thanks a lot :-) I ran code similar to yours all night and it did not crash with either pandapower or lightsim2grid, but unfortunately my laptop is not really made for that... So it's best if you can :-)

Thanks !

@pablo-ta
Author

It looks to be running now; it hasn't crashed in a week (the simple code).
I'm going to run the complex one using that version and the LightSimBackend.

@BDonnot
Collaborator

BDonnot commented Jan 31, 2023

Thanks a lot :-) And glad to hear it's finally working :-)

@BDonnot
Collaborator

BDonnot commented Jun 6, 2023

I'm closing this issue as it appears to have been fixed.

@BDonnot BDonnot closed this as completed Jun 6, 2023