Merge pull request #153 from BDonnot/bd_dev

fix a bug when limiting the action automatically in the env
BDonnot authored Apr 22, 2022
2 parents d468241 + a07e87b commit 87f305d
Showing 19 changed files with 107 additions and 94 deletions.
10 changes: 10 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -15,6 +15,16 @@ assignees: ''
## Bug description
<!--A clear and concise description of what the bug is.-->


<!--A good method to find and fix bugs is explained here https://adv-r.hadley.nz/debugging.html#debugging-strategy
(it's written for R, but this section is generic for most computer languages)
-->
<!--We cannot do steps 1, 2 and 3 for you, but the closer you get to a concise piece of code highlighting the bug,
the less time we'll spend understanding and fixing it. The fix will also be more robust, as we'll most likely
write a unit test to make sure the bug does not reappear in the future. This is why we insist on having
"A clear and concise description of what the bug is"-->


## How to reproduce
<!--Explain in detail how to reproduce your issue. The easier it will be for us to
reproduce it, the faster we will be able to work on this.-->
78 changes: 62 additions & 16 deletions getting_started/04_TrainingAnAgent.ipynb
@@ -12,7 +12,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It is recommended to have a look at the [00_basic_functionalities](00_basic_functionalities.ipynb), [02_Observation](02_Observation.ipynb) and [03_Action](03_Action.ipynb) notebooks before getting into this one."
"It is recommended to have a look at the [00_SmallExample](00_SmallExample.ipynb), [02_Observation](02_Observation.ipynb) and [03_Action](03_Action.ipynb) notebooks before getting into this one."
]
},
{
@@ -22,16 +22,17 @@
"**Objectives**\n",
"\n",
"In this notebook we will expose :\n",
"* how to use the \"converters\": these allow to link several different representations of the actions (for example as `Action` objects or integers).\n",
"* how to make grid2op compatible with *gym* RL framework (short introduction to *gym_compat* module)\n",
"* how to transform grid2op actions / observations with gym \"spaces\" (https://gym.openai.com/docs/#spaces)\n",
"* how to train a (naive) Agent using reinforcement learning.\n",
"* how to inspect (rapidly) the action taken by the Agent.\n",
"* how to inspect (rapidly) the actions taken by the Agent.\n",
"\n",
"**NB** In this tutorial, we train an Agent inspired from this blog post: [deep-reinforcement-learning-tutorial-with-open-ai-gym](https://towardsdatascience.com/deep-reinforcement-learning-tutorial-with-open-ai-gym-c0de4471f368). Many other different reinforcement learning tutorials exist. The code presented in this notebook only aims at demonstrating how to use the Grid2Op functionalities to train a Deep Reinforcement learning Agent and inspect its behaviour, but not at building a very smart agent. Nothing about the performance, training strategy, type of Agent, meta parameters, etc, should be retained as a common practice.\n",
"**NB** In this tutorial, we will use the \n",
"\n",
"<font size=\"3\" color=\"red\">This notebook do not cover the use of existing RL frameworks. Please consult the [11_IntegrationWithExistingRLFrameworks](11_IntegrationWithExistingRLFrameworks.ipynb) for such information! </font>\n",
"\n",
"\n",
"**Don't hesitate to check the grid2op module grid2op.gym_compat for a closer integration between grid2op and openAI gym.** This topic is not covered in this notebook.\n",
"**Don't hesitate to check the grid2op module grid2op.gym_compat for a closer integration between grid2op and openAI gym. This module is documented at https://grid2op.readthedocs.io/en/latest/gym.html** \n",
"\n"
]
},
@@ -44,7 +44,7 @@
"\n",
"Cell will look like:\n",
"```python\n",
"!pip install grid2op[optional] # for use with google colab (grid2Op is not installed by default)\n",
"!pip install grid2op[optional] # for use with google colab (grid2op is not installed by default)\n",
"```\n",
"<img src=\"https://colab.research.google.com/assets/colab-badge.svg\" width=\"200\">"
]
@@ -60,7 +61,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -71,9 +72,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Impossible to automatically add a menu / table of content to this notebook.\n",
"You can download \"jyquickhelper\" package with: \n",
"\"pip install jyquickhelper\"\n"
]
}
],
"source": [
"res = None\n",
"try:\n",
@@ -90,10 +101,12 @@
"source": [
"## 0) Good practice\n",
"\n",
"As in other machine learning tasks, we highly recommend, before even trying to train an agent, to split the \"chronics\" (ie the episode data) into 3 datasets:\n",
"### A. defining a training, validation and test sets\n",
"\n",
"As in other machine learning tasks, we highly recommend, before even trying to train an agent, to split the the \"episode data\" (*eg* what are the loads / generations for each load / generator) into 3 datasets:\n",
"- \"train\" use to train the agent\n",
"- \"val\" use to validate the hyper parameters\n",
"- \"test\" at which you would look only once to report the agent performance in a scientific paper (for example)\n",
"- \"test\" at which you would look **only once** to report the agent performance in a scientific paper (for example)\n",
"\n",
"Grid2op lets you do that with relative ease:\n",
"\n",
@@ -125,14 +138,40 @@
"env = grid2op.make(env_name+\"_train\")\n",
"```\n",
"\n",
"Be carefull, on windows you might run into issues. Don't hesitate to have a look at the documentation of this funciton if this the case (see https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split and https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split_random)"
"Be carefull, on windows you might run into issues. Don't hesitate to have a look at the documentation of this funciton if this the case (see https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split and https://grid2op.readthedocs.io/en/latest/environment.html#grid2op.Environment.Environment.train_val_split_random)\n",
"\n",
"### B. Not spending all of your time loading data...\n",
"\n",
"In most grid2op environment, the \"data\" are loaded from the hard drive.\n",
"\n",
"From experience, what happens (especially at the beginning of training) is that your agent survives a few steps (so taking a few milliseconds) before a game over. At this stage you will call `env.reset()` which will load the data of the next scenario.\n",
"\n",
"This is the default behaviour and it is far from \"optimal\" (more time is spent loading data than performing actual useful computation). To that end, we encourage you:\n",
"- to use a \"caching\" mechanism, for example with `MultifolderWithCache` class\n",
"- to read the data by small \"chunk\" (`env.chronics_handler.set_chunk_size(...)`). \n",
"\n",
"More information is provided in https://grid2op.readthedocs.io/en/latest/environment.html#optimize-the-data-pipeline\n",
"\n",
"### C. Use a fast simulator\n",
"\n",
"Grid2op will use a \"backend\" to compute the powerflows and be able to return the next observation (after `env.step(...)`). These \"backends\" can be faster. For example, we strongly encourage you to use the \"lightsim2grid\" backend.\n",
"\n",
"You can install it with `pip install lightsim2grid`\n",
"\n",
"And use it with:\n",
"```python\n",
"import grid2op\n",
"from lightsim2grid import LightSimBackend\n",
"env_name = \"l2rpn_case14_sandbox\"\n",
"env = grid2op.make(env_name+\"_train\", backend=LightSimBackend(), ...)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## I) Manipulating action representation"
"## I) Action representation"
]
},
{
@@ -143,11 +182,18 @@
"\n",
"The downside of this approach is that machine learning methods, especially in deep learning, often prefer to deal with vectors rather than with \"complex\" objects. Indeed, as we covered in the previous tutorials on the platform, we saw that building our own actions can be tedious and can sometime require important knowledge of the powergrid.\n",
"\n",
"On the contrary, in most of the standard Reinforcement Learning environments, actions have a higher representation. For example in pacman, there are 4 different types of actions: turn left, turn right, go up and do down. This allows for easy sampling (if you need to achieve an uniform sampling, you simply need to randomly pick a number between 0 and 3 included) and an easy representation: each action can be represented as a different component of a vector of dimension 4 [because there are 4 actions]. \n",
"On the contrary, in most of the standard Reinforcement Learning environments, actions have a higher level representation. For example in pacman, there are 4 different types of actions: \"turn left\", \"turn right\", \"go up\" and \"go down\". This allows for easy sampling (if you need to achieve an uniform sampling, you simply need to randomly pick a number between 0 and 3 included) and an easy representation: each action can be represented as a different component of a vector of dimension 4 [because there are 4 actions]. \n",
"\n",
"On the other hand, this representation is not \"human friendly\". It is quite convenient in the case of pacman because the action space is rather small, making it possible to remember which action corresponds to which component, but in the case of the grid2op package, there are hundreds or even thousands of actions. We suppose that we do not really care about this here, as tutorials on Reinforcement Learning with discrete action space often assume that actions are labeled with integers (such as in pacman for example).\n",
"\n",
"Converting grid2op actions into \"machine readable\" ones is the major difficulty as there is no unique ways to do so. In grid2op we offer some pre defined \"functions\" to do so:\n",
"\n",
"On the other hand, this representation is not \"human friendly\". It is quite convenient in the case of pacman because the action space is rather small, making it possible to remember which action corresponds to which component, but in the case of the grid2op package, there are hundreds or even thousands of actions, making it impossible to remember which component corresponds to which action. We suppose that we do not really care about this here, as tutorials on Reinforcement Learning with discrete action space often assume that actions are labeled with integers (such as in pacman for example).\n",
"- `BoxGymObsSpace` will convert the action space into a gym \"Box\". It is rather straightforward, especially for **continuous** type of actions (such as *redispatching*, *curtailment* or actions on *storage units*). Representing the discrete actions (on powerlines and on substation) is not an easy task with them. We would not recommend to use them if your focus is on topology. More information on https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.BoxGymActSpace\n",
"- `MultiDiscreteActSpace` is similar to `BoxGymObsSpace` but mainly focused on the **discrete** actions (*lines status* and *substation reconfiguration*). Actions are represented with a gym \"MultiDiscrete\" space. It allows to perform any number of actions you want (which might be illegal) but comes with little restrictions. It handles continuous actions through \"binning\" (which is not ideal but doable). We recommend using this transformation if the algorithm you want to use is able to deal with \"MultiDiscrete\" gym action type. More information is given at https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.MultiDiscreteActSpace\n",
"- `DiscreteActSpace` is similar to `MultiDiscreteActSpace` in the sense that it focuses on **discrete** actions. It comes with a main restriction though: you can only do one action. For example, you cannot \"modify a substation\" AND \"disconnect a powerline\" with the same action. More information is provided at https://grid2op.readthedocs.io/en/latest/gym.html#grid2op.gym_compat.DiscreteActSpace. We recommend to use it if you want to focus on **discrete** actions and the algorithm you want to use is not able to deal with `MultiDiscreteActSpace`.\n",
"- You can also fully customize the way you \"represent\" the action. More information is given in the notebook [11_IntegrationWithExistingRLFrameworks](11_IntegrationWithExistingRLFrameworks.ipynb)\n",
"\n",
"However, to allow RL agent to train more easily, we allow to make some \"[Converters](https://grid2op.readthedocs.io/en/latest/converters.html)\" whose roles are to allow an agent to deal with a custom representation of the action space. The class [AgentWithConverter](https://grid2op.readthedocs.io/en/latest/agent.html#grid2op.Agent.AgentWithConverter) is perfect for such usage."
"In the next section we will show an agent working with `DiscreteActSpace`. The code showed can be easily adapted with the other type of actions."
]
},
{
13 changes: 11 additions & 2 deletions grid2op/Environment/BaseEnv.py
@@ -2332,7 +2332,17 @@ def _aux_readjust_storage_after_limiting(self, total_storage):
# cause a problem right now)
new_act_storage = 1.0 * self._storage_power_prev
sum_this_step = new_act_storage.sum()
modif_storage = new_act_storage * total_storage / sum_this_step
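# rescale the previous storage setpoints proportionally so that their sum
# matches the allowed total; guard against dividing by an (almost) zero sum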
if abs(sum_this_step) > 1e-1:
modif_storage = new_act_storage * total_storage / sum_this_step
else:
# TODO: this is not covered by any test :-(
# it happens when you perform an action that is too strong, then a do nothing,
# then you decrease the limit too rapidly
# (the game over would happen only after at least one do nothing)

# In this case do I reset it completely, or not? I don't really
# know what to do!
modif_storage = new_act_storage  # or self._storage_power ???

# handle self._storage_power and self._storage_current_charge
coeff_p_to_E = (
@@ -2732,7 +2742,6 @@ def step(self, action: BaseAction) -> Tuple[BaseObservation, float, bool, dict]:
Actually, it will be in a "game over" state (see :class:`grid2op.Observation.BaseObservation.set_game_over`).
"""

if self.__closed:
raise EnvError("This environment is closed. You cannot use it anymore.")

2 changes: 1 addition & 1 deletion grid2op/gym_compat/continuous_to_discrete.py
@@ -43,7 +43,7 @@ class ContinuousToDiscreteConverter(BaseGymAttrConverter):
- 1 encodes all numbers in [-6, -2)
- 2 encodes all numbers in [-2, 2)
- 3 encodes all numbers in [2, 6)
- 3 encode all numbers in [6, 10]
- 4 encodes all numbers in [6, 10]
And reciprocally, this action with:
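A minimal sketch of how this converter might be used (hedged: the environment name and `nb_bins=11` are illustrative assumptions; an odd number of bins keeps 0 representable):

```python
import grid2op
from grid2op.gym_compat import GymEnv, ContinuousToDiscreteConverter

env = grid2op.make("l2rpn_case14_sandbox")
gym_env = GymEnv(env)
# encode the continuous "redispatch" attribute with 11 bins per generator
gym_env.action_space = gym_env.action_space.reencode_space(
    "redispatch", ContinuousToDiscreteConverter(nb_bins=11)
)
```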
23 changes: 11 additions & 12 deletions grid2op/gym_compat/multidiscrete_gym_actspace.py
@@ -39,15 +39,14 @@ class MultiDiscreteActSpace(MultiDiscrete):
or "CONNECT TO BUSBAR 2" and affecting to which busbar an object is connected
- "change_bus": `dim_topo` dimensions, each containing 2 choices: "CHANGE", "DONT CHANGE" and affect
to which busbar an element is connected
- "redispatch": `n_gen` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["redispatch"]` (by default 7) and will be 1 for non dispatchable generator
- "curtail": `n_gen` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["curtail"]` (by default 7) and will be 1 for non renewable generator. This is
- "redispatch": `sum(env.gen_redispatchable)` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["redispatch"]` (by default 7).
- "curtail": `sum(env.gen_renewable)` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["curtail"]` (by default 7). This is
the "conversion to discrete action"
of the curtailment action.
- "curtail_mw": completely equivalent to "curtail" for this representation. This is the "conversion to
discrete action"
of the curtailment action.
- "curtail_mw": `sum(env.gen_renewable)` dimensions, completely equivalent to "curtail" for this representation.
This is the "conversion to discrete action" of the curtailment action.
- "set_storage": `n_storage` dimensions, each containing a certain number of choices depending on the value
of the keyword argument `nb_bins["set_storage"]` (by default 7). This is the "conversion to discrete action"
of the action on storage units.
@@ -72,16 +71,16 @@
"line_change_status", "one_sub_change" or "change_bus".
Combining a "set" and "change" on the same element will most likely lead to an "ambiguous action". Indeed
what grid2op can do if you "tell element A to go to bus 1" and "tell element A2 to go to bus 2 if it was
to 1 and to move to bus 1 if it was on bus 2". It's not clear at all.
what should grid2op do if you "tell element A to go to bus 1" and "tell the same element A to switch to bus 2 if it was
on bus 1 and to move to bus 1 if it was on bus 2"? It's not clear at all (hence the "ambiguous").
No error will be thrown if you mix these: this is your absolute right. Be aware, though, that it might not
lead to the result you expect.
.. warning::
.. note::
The arguments "set_bus", "sub_set_bus" and "one_sub_set" will all perform "set_bus" action. The only
difference is "how you represent this action":
The arguments "set_bus", "sub_set_bus" and "one_sub_set" will all perform "set_bus" actions. The only
difference is "how you represent these actions":
- In "set_bus" each component represents a single element of the grid. When you sample an action
with this keyword you will possibly change all the elements of the grid at once (this is likely to
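A minimal sketch of the usage this docstring describes (hedged: the environment name, the `attr_to_keep` list and the `nb_bins` value are illustrative assumptions):

```python
import grid2op
from grid2op.gym_compat import GymEnv, MultiDiscreteActSpace

env = grid2op.make("l2rpn_case14_sandbox")
gym_env = GymEnv(env)
# one dimension per grid element for "change_bus", plus 7 bins per dispatchable generator
gym_env.action_space = MultiDiscreteActSpace(env.action_space,
                                             attr_to_keep=["change_bus", "redispatch"],
                                             nb_bins={"redispatch": 7})
vect = gym_env.action_space.sample()  # one integer per dimension
```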
2 changes: 0 additions & 2 deletions grid2op/tests/test_GridObjects.py
@@ -17,8 +17,6 @@
from grid2op.Backend.EducPandaPowerBackend import EducPandaPowerBackend
from grid2op.Exceptions import EnvError

import pdb


class TestAuxFunctions(unittest.TestCase):
def setUp(self) -> None:
2 changes: 0 additions & 2 deletions grid2op/tests/test_MakeEnv.py
@@ -7,10 +7,8 @@
# This file is part of Grid2Op, Grid2Op a testbed platform to model sequential decision making in power systems.

import os
import sys
import unittest
import warnings
import time
import numpy as np
import pdb

1 change: 0 additions & 1 deletion grid2op/tests/test_MultiProcess.py
@@ -6,7 +6,6 @@
# SPDX-License-Identifier: MPL-2.0
# This file is part of Grid2Op, Grid2Op a testbed platform to model sequential decision making in power systems.

import pdb
import warnings
from grid2op.tests.helper_path_test import *

7 changes: 2 additions & 5 deletions grid2op/tests/test_ObsPlusAct.py
@@ -5,8 +5,7 @@
# you can obtain one at http://mozilla.org/MPL/2.0/.
# SPDX-License-Identifier: MPL-2.0
# This file is part of Grid2Op, Grid2Op a testbed platform to model sequential decision making in power systems.
import copy
import re

import warnings
from grid2op.tests.helper_path_test import *

@@ -16,9 +15,7 @@
from grid2op.Exceptions import *
from grid2op.Action import *
from grid2op.Parameters import Parameters
from grid2op.Rules import RulesChecker, AlwaysLegal
from grid2op.Space.space_utils import save_to_dict
from grid2op.tests.test_Action import _get_action_grid_class
from grid2op.Rules import AlwaysLegal

import pdb
