Merge pull request #153 from BDonnot/bd_dev

fix a bug when limiting the action automatically in the env
BDonnot authored Apr 22, 2022
2 parents d468241 + a07e87b commit 87f305d
Showing 19 changed files with 107 additions and 94 deletions.
10 changes: 10 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -15,6 +15,16 @@ assignees: ''
## Bug description
<!--A clear and concise description of what the bug is.-->


<!--A good method to find and fix bugs is explained here https://adv-r.hadley.nz/debugging.html#debugging-strategy
(it's written for R, but this section is generic for most computer languages)
-->
<!--We cannot do steps 1, 2 and 3 for you, but the closer you get to a concise piece of code highlighting the bug,
the less time we'll spend understanding and fixing it. The fix will also be more robust, as we'll most likely
write a unit test to make sure the bug does not reappear in the future. This is why we insist on having
"A clear and concise description of what the bug is"-->


## How to reproduce
<!--Explain in detail how to reproduce your issue. The easier it will be for us to
reproduce it, the faster we will be able to work on this.-->
78 changes: 62 additions & 16 deletions getting_started/04_TrainingAnAgent.ipynb
@@ -12,7 +12,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It is recommended to have a look at the [00_basic_functionalities](00_basic_functionalities.ipynb), [02_Observation](02_Observation.ipynb) and [03_Action](03_Action.ipynb) notebooks before getting into this one."
"It is recommended to have a look at the [00_SmallExample](00_SmallExample.ipynb), [02_Observation](02_Observation.ipynb) and [03_Action](03_Action.ipynb) notebooks before getting into this one."
]
},
{
@@ -22,16 +22,17 @@
"**Objectives**\n",
"\n",
"In this notebook we will expose :\n",
"* how to use the \"converters\": these allow to link several different representations of the actions (for example as `Action` objects or integers).\n",
"* how to make grid2op compatible with *gym* RL framework (short introduction to *gym_compat* module)\n",
"* how to transform grid2op actions / observations with gym \"spaces\" (https://gym.openai.com/docs/#spaces)\n",
"* how to train a (naive) Agent using reinforcement learning.\n",
"* how to inspect (rapidly) the action taken by the Agent.\n",
"* how to inspect (rapidly) the actions taken by the Agent.\n",
"\n",
"**NB** In this tutorial, we train an Agent inspired from this blog post: [deep-reinforcement-learning-tutorial-with-open-ai-gym](https://towardsdatascience.com/deep-reinforcement-learning-tutorial-with-open-ai-gym-c0de4471f368). Many other different reinforcement learning tutorials exist. The code presented in this notebook only aims at demonstrating how to use the Grid2Op functionalities to train a Deep Reinforcement learning Agent and inspect its behaviour, but not at building a very smart agent. Nothing about the performance, training strategy, type of Agent, meta parameters, etc, should be retained as a common practice.\n",
"**NB** In this tutorial, we will use the \n",
"\n",
"<font size=\"3\" color=\"red\">This notebook do not cover the use of existing RL frameworks. Please consult the [11_IntegrationWithExistingRLFrameworks](11_IntegrationWithExistingRLFrameworks.ipynb) for such information! </font>\n",
"\n",
"\n",
"**Don't hesitate to check the grid2op module grid2op.gym_compat for a closer integration between grid2op and openAI gym.** This topic is not covered in this notebook.\n",
"**Don't hesitate to check the grid2op module grid2op.gym_compat for a closer integration between grid2op and openAI gym. This module is documented at https://grid2op.readthedocs.io/en/latest/gym.html** \n",
"\n"
]
},
@@ -44,7 +44,7 @@
"\n",
"Cell will look like:\n",
"```python\n",
"!pip install grid2op[optional] # for use with google colab (grid2Op is not installed by default)\n",
"!pip install grid2op[optional] # for use with google colab (grid2op is not installed by default)\n",
"```\n",
"<img src=\"https://colab.research.google.com/assets/colab-badge.svg\" width=\"200\">"
]
@@ -60,7 +61,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -71,9 +72,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Impossible to automatically add a menu / table of content to this notebook.\n",
"You can download \"jyquickhelper\" package with: \n",
"\"pip install jyquickhelper\"\n"
]
}
],
"source": [
"res = None\n",
"try:\n",
@@ -90,10 +101,12 @@
"source": [
"## 0) Good practice\n",
"\n",
"As in other machine learning tasks, we highly recommend, before even trying to train an agent, to split the \"chronics\" (ie the episode data) into 3 datasets:\n",
"### A. defining a training, validation and test sets\n",
"\n",
"As in other machine learning tasks, we highly recommend, before even trying to train an agent, to split the the \"episode data\" (*eg* what are the loads / generations for each load / generator) into 3 datasets:\n",
"- \"train\" use to train the agent\n",
"- \"val\" use to validate the hyper parameters\n",
"- \"test\" at which you would look only once to report the agent performance in a scientific paper (for example)\n",
"- \"test\" at which you would look **only once** to report the agent performance in a scientific paper (for example)\n",
"\n",
"Grid2op lets you do that with relative ease:\n",
"\n",
@@ -125,14 +138,40 @@
"env = grid2op.make(env_name+\"_train\")\n",
"```\n",
"\n",
"Be carefull, on windows you might run into issues. Don't hesitate to have a look at the documentation of this funciton if this the case (see https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split and https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split_random)"
"Be carefull, on windows you might run into issues. Don't hesitate to have a look at the documentation of this funciton if this the case (see https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split and https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split_random)\n",
"\n",
"### B. Not spending all of your time loading data...\n",
"\n",
"In most grid2op environment, the \"data\" are loaded from the hard drive.\n",
"\n",
"From experience, what happens (especially at the beginning of training) is that your agent survives a few steps (so taking a few milliseconds) before a game over. At this stage you will call `env.reset()` which will load the data of the next scenario.\n",
"\n",
"This is the default behaviour and it is far from \"optimal\" (more time is spent loading data than performing actual useful computation). To that end, we encourage you:\n",
"- to use a \"caching\" mechanism, for example with `MultifolderWithCache` class\n",
"- to read the data by small \"chunk\" (`env.chronics_handler.set_chunk_size(...)`). \n",
"\n",
"More information is provided in https://grid2op.readthedocs.io/en/latest/environment.html#optimize-the-data-pipeline\n",
"\n",
"### C. Use a fast simulator\n",
"\n",
"Grid2op will use a \"backend\" to compute the powerflows and be able to return the next observation (after `env.step(...)`). These \"backends\" can be faster. For example, we strongly encourage you to use the \"lightsim2grid\" backend.\n",
"\n",
"You can install it with `pip install lightsim2grid`\n",
"\n",
"And use it with:\n",
"```python\n",
"import grid2op\n",
"from lightsim2grid import LightSimBackend\n",
"env_name = \"l2rpn_case14_sandbox\"\n",
"env = grid2op.make(env_name+\"_train\", backend=LightSimBackend(), ...)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## I) Manipulating action representation"
"## I) Action representation"
]
},
{
@@ -143,11 +182,18 @@
"\n",
"The downside of this approach is that machine learning methods, especially in deep learning, often prefer to deal with vectors rather than with \"complex\" objects. Indeed, as we covered in the previous tutorials on the platform, we saw that building our own actions can be tedious and can sometime require important knowledge of the powergrid.\n",
"\n",
"On the contrary, in most of the standard Reinforcement Learning environments, actions have a higher representation. For example in pacman, there are 4 different types of actions: turn left, turn right, go up and do down. This allows for easy sampling (if you need to achieve an uniform sampling, you simply need to randomly pick a number between 0 and 3 included) and an easy representation: each action can be represented as a different component of a vector of dimension 4 [because there are 4 actions]. \n",
"On the contrary, in most of the standard Reinforcement Learning environments, actions have a higher level representation. For example in pacman, there are 4 different types of actions: \"turn left\", \"turn right\", \"go up\" and \"go down\". This allows for easy sampling (if you need to achieve an uniform sampling, you simply need to randomly pick a number between 0 and 3 included) and an easy representation: each action can be represented as a different component of a vector of dimension 4 [because there are 4 actions]. \n",
"\n",
"On the other hand, this representation is not \"human friendly\". It is quite convenient in the case of pacman because the action space is rather small, making it possible to remember which action corresponds to which component, but in the case of the grid2op package, there are hundreds or even thousands of actions. We suppose that we do not really care about this here, as tutorials on Reinforcement Learning with discrete action space often assume that actions are labeled with integers (such as in pacman for example).\n",
"\n",
"Converting grid2op actions into \"machine readable\" ones is the major difficulty as there is no unique ways to do so. In grid2op we offer some pre defined \"functions\" to do so:\n",
"\n",
"On the other hand, this representation is not \"human friendly\". It is quite convenient in the case of pacman because the action space is rather small, making it possible to remember which action corresponds to which component, but in the case of the grid2op package, there are hundreds or even thousands of actions, making it impossible to remember which component corresponds to which action. We suppose that we do not really care about this here, as tutorials on Reinforcement Learning with discrete action space often assume that actions are labeled with integers (such as in pacman for example).\n",
"- `BoxGymObsSpace` will convert the action space into a gym \"Box\". It is rather straightforward, especially for **continuous** type of actions (such as *redispatching*, *curtailment* or actions on *storage units*). Representing the discrete actions (on powerlines and on substation) is not an easy task with them. We would not recommend to use them if your focus is on topology. More information on https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.BoxGymActSpace\n",
"- `MultiDiscreteActSpace` is similar to `BoxGymObsSpace` but mainly focused on the **discrete** actions (*lines status* and *substation reconfiguration*). Actions are represented with a gym \"MultiDiscrete\" space. It allows to perform any number of actions you want (which might be illegal) but comes with little restrictions. It handles continuous actions through \"binning\" (which is not ideal but doable). We recommend using this transformation if the algorithm you want to use is able to deal with \"MultiDiscrete\" gym action type. More information is given at https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.MultiDiscreteActSpace\n",
"- `DiscreteActSpace` is similar to `MultiDiscreteActSpace` in the sense that it focuses on **discrete** actions. It comes with a main restriction though: you can only do one action. For example, you cannot \"modify a substation\" AND \"disconnect a powerline\" with the same action. More information is provided at https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.DiscreteActSpace. We recommend to use it if you want to focus on **discrete** actions and the algorithm you want to use is not able to deal with `MultiDiscreteActSpace`.\n",
"- You can also fully customize the way you \"represent\" the action. More information is given in the notebook [11_IntegrationWithExistingRLFrameworks](11_IntegrationWithExistingRLFrameworks.ipynb)\n",
"\n",
"However, to allow RL agent to train more easily, we allow to make some \"[Converters](https://grid2op.readthedocs.io/en/latest/converters.html)\" whose roles are to allow an agent to deal with a custom representation of the action space. The class [AgentWithConverter](https://grid2op.readthedocs.io/en/latest/agent.html#grid2op.Agent.AgentWithConverter) is perfect for such usage."
"In the next section we will show an agent working with `DiscreteActSpace`. The code showed can be easily adapted with the other type of actions."
]
},
{
13 changes: 11 additions & 2 deletions grid2op/Environment/BaseEnv.py
@@ -2332,7 +2332,17 @@ def _aux_readjust_storage_after_limiting(self, total_storage):
# cause a problem right now)
new_act_storage = 1.0 * self._storage_power_prev
sum_this_step = new_act_storage.sum()
modif_storage = new_act_storage * total_storage / sum_this_step
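# rescale the previous storage setpoints proportionally so that their sum
# matches the allowed total; guard against dividing by an (almost) zero sum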
if abs(sum_this_step) > 1e-1:
modif_storage = new_act_storage * total_storage / sum_this_step
else:
# TODO: this is not covered by any test :-(
# it happens when you perform an action that is too strong, then a do nothing,
# then you decrease the limit too rapidly
# (the game over would happen only after at least one do nothing)

# In this case do I reset it completely, or not? I don't really
# know what to do!
modif_storage = new_act_storage  # or self._storage_power ???

# handle self._storage_power and self._storage_current_charge
coeff_p_to_E = (
@@ -2732,7 +2742,6 @@ def step(self, action: BaseAction) -> Tuple[BaseObservation, float, bool, dict]:
Actually, it will be in a "game over" state (see :class:`grid2op.Observation.BaseObservation.set_game_over`).
"""

if self.__closed:
raise EnvError("This environment is closed. You cannot use it anymore.")

2 changes: 1 addition & 1 deletion grid2op/gym_compat/continuous_to_discrete.py
@@ -43,7 +43,7 @@ class ContinuousToDiscreteConverter(BaseGymAttrConverter):
- 1 encodes all numbers in [-6, -2)
- 2 encodes all numbers in [-2, 2)
- 3 encodes all numbers in [2, 6)
- 3 encode all numbers in [6, 10]
- 4 encodes all numbers in [6, 10]
And reciprocally, this action with:
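A minimal sketch of how this converter might be used (hedged: the environment name and `nb_bins=11` are illustrative assumptions; an odd number of bins keeps 0 representable):

```python
import grid2op
from grid2op.gym_compat import GymEnv, ContinuousToDiscreteConverter

env = grid2op.make("l2rpn_case14_sandbox")
gym_env = GymEnv(env)
# encode the continuous "redispatch" attribute with 11 bins per generator
gym_env.action_space = gym_env.action_space.reencode_space(
    "redispatch", ContinuousToDiscreteConverter(nb_bins=11)
)
```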
23 changes: 11 additions & 12 deletions grid2op/gym_compat/multidiscrete_gym_actspace.py
@@ -39,15 +39,14 @@ class MultiDiscreteActSpace(MultiDiscrete):
or "CONNECT TO BUSBAR 2" and affecting to which busbar an object is connected
- "change_bus": `dim_topo` dimensions, each containing 2 choices: "CHANGE", "DONT CHANGE" and affect
to which busbar an element is connected
- "redispatch": `n_gen` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["redispatch"]` (by default 7) and will be 1 for non dispatchable generator
- "curtail": `n_gen` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["curtail"]` (by default 7) and will be 1 for non renewable generator. This is
- "redispatch": `sum(env.gen_redispatchable)` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["redispatch"]` (by default 7).
- "curtail": `sum(env.gen_renewable)` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["curtail"]` (by default 7). This is
the "conversion to discrete action"
of the curtailment action.
- "curtail_mw": completely equivalent to "curtail" for this representation. This is the "conversion to
discrete action"
of the curtailment action.
- "curtail_mw": `sum(env.gen_renewable)` dimensions, completely equivalent to "curtail" for this representation.
This is the "conversion to discrete action" of the curtailment action.
- "set_storage": `n_storage` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["set_storage"]` (by default 7). This is the "conversion to discrete action"
of the action on storage units.
@@ -72,16 +71,16 @@
"line_change_status", "one_sub_change" or "change_bus".
Combining a "set" and "change" on the same element will most likely lead to an "ambiguous action". Indeed
what grid2op can do if you "tell element A to go to bus 1" and "tell element A2 to go to bus 2 if it was
to 1 and to move to bus 1 if it was on bus 2". It's not clear at all.
what should grid2op do if you "tell element A to go to bus 1" and "tell the same element A to switch to bus 2 if it was
on bus 1 and to move to bus 1 if it was on bus 2"? It's not clear at all (hence the "ambiguous").
No error will be thrown if you mix these: this is your absolute right. Be aware, though, that it might not
lead to the result you expect.
.. warning::
.. note::
The arguments "set_bus", "sub_set_bus" and "one_sub_set" will all perform "set_bus" action. The only
difference is "how you represent this action":
The arguments "set_bus", "sub_set_bus" and "one_sub_set" will all perform "set_bus" actions. The only
difference is "how you represent these actions":
- In "set_bus" each component represents a single element of the grid. When you sample an action
with this keyword you will possibly change all the elements of the grid at once (this is likely to
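A minimal sketch of the usage this docstring describes (hedged: the environment name, the `attr_to_keep` list and the `nb_bins` value are illustrative assumptions):

```python
import grid2op
from grid2op.gym_compat import GymEnv, MultiDiscreteActSpace

env = grid2op.make("l2rpn_case14_sandbox")
gym_env = GymEnv(env)
# one dimension per grid element for "change_bus", plus 7 bins per dispatchable generator
gym_env.action_space = MultiDiscreteActSpace(env.action_space,
                                             attr_to_keep=["change_bus", "redispatch"],
                                             nb_bins={"redispatch": 7})
vect = gym_env.action_space.sample()  # one integer per dimension
```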
2 changes: 0 additions & 2 deletions grid2op/tests/test_GridObjects.py
@@ -17,8 +17,6 @@
from grid2op.Backend.EducPandaPowerBackend import EducPandaPowerBackend
from grid2op.Exceptions import EnvError

import pdb


class TestAuxFunctions(unittest.TestCase):
def setUp(self) -> None:
2 changes: 0 additions & 2 deletions grid2op/tests/test_MakeEnv.py
@@ -7,10 +7,8 @@
# This file is part of Grid2Op, Grid2Op a testbed platform to model sequential decision making in power systems.

import os
import sys
import unittest
import warnings
import time
import numpy as np
import pdb

1 change: 0 additions & 1 deletion grid2op/tests/test_MultiProcess.py
@@ -6,7 +6,6 @@
# SPDX-License-Identifier: MPL-2.0
# This file is part of Grid2Op, Grid2Op a testbed platform to model sequential decision making in power systems.

import pdb
import warnings
from grid2op.tests.helper_path_test import *

7 changes: 2 additions & 5 deletions grid2op/tests/test_ObsPlusAct.py
@@ -5,8 +5,7 @@
# you can obtain one at http://mozilla.org/MPL/2.0/.
# SPDX-License-Identifier: MPL-2.0
# This file is part of Grid2Op, Grid2Op a testbed platform to model sequential decision making in power systems.
import copy
import re

import warnings
from grid2op.tests.helper_path_test import *

@@ -16,9 +15,7 @@
from grid2op.Exceptions import *
from grid2op.Action import *
from grid2op.Parameters import Parameters
from grid2op.Rules import RulesChecker, AlwaysLegal
from grid2op.Space.space_utils import save_to_dict
from grid2op.tests.test_Action import _get_action_grid_class
from grid2op.Rules import AlwaysLegal

import pdb
