[Bug]: PPO doesn't work correctly with MultiDiscrete action spaces with "start" parameter #1718
Status: Closed
Labels: custom gym env, documentation, help wanted
🐛 Bug
Hello!
I am trying to run the PPO algorithm with one of the environments we have created in Sinergym.
The point is that I have defined a MultiDiscrete action space (which, according to the documentation, is supported), but the actions taken do not respect the "start" parameter of the space definition.
As can be seen in the traceback, the last action variable should be an integer between 25 and 35, but it takes values from 0 to 10.
I am not including the Sinergym setup itself, so as not to add complexity to the report; the problem is simpler than that and can be seen in the traceback.
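For illustration, here is a minimal self-contained sketch of the reported behaviour, not the actual Sinergym environment: the environment name and the nvec/start values below are hypothetical, and it assumes a Gymnasium version in which MultiDiscrete accepts a `start` argument.

```python
# Minimal sketch (hypothetical env, not Sinergym): the last action component
# is declared to live in [25, 35] via start=[0, 0, 25], assuming a Gymnasium
# version where MultiDiscrete accepts `start`.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class StartOffsetEnv(gym.Env):
    """Toy environment whose last action component should lie in [25, 35]."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        # Three discrete components; the third is declared to take values 25..35.
        self.action_space = spaces.MultiDiscrete([10, 10, 11], start=[0, 0, 25])

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        last = int(action[-1])
        # With the declared space, `last` should be in [25, 35]; in practice
        # PPO appears to emit values in [0, 10], ignoring the start offset.
        if not 25 <= last <= 35:
            raise ValueError(f"last action component {last} is outside [25, 35]")
        return self.observation_space.sample(), 0.0, False, False, {}


model = PPO("MlpPolicy", StartOffsetEnv(), n_steps=64, verbose=0)
model.learn(total_timesteps=128)  # fails almost immediately during rollout collection
```

Until "start" is supported (or documented as unsupported), one possible workaround is to expose a 0-based MultiDiscrete to the agent and shift the actions back inside an ActionWrapper. This is only a sketch and assumes the space stores its offsets in a `start` attribute:

```python
# Possible workaround sketch: shift PPO's 0-based actions by the declared
# start offsets before they reach the wrapped environment.
import gymnasium as gym
import numpy as np


class AddStartOffset(gym.ActionWrapper):
    def __init__(self, env):
        super().__init__(env)
        self._start = np.asarray(env.action_space.start)
        # Expose a 0-based space to the agent so PPO's samples stay valid.
        self.action_space = gym.spaces.MultiDiscrete(env.action_space.nvec)

    def action(self, action):
        # Map the agent's 0-based action back into the env's offset range.
        return np.asarray(action) + self._start
```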
Code example
No response
Relevant log output / Error message
System Info
Checklist