[Bug Report] classic.chess: Mismatch between observation shape in documentation and in code #922
I'm seeing the same thing, and I also found that the returned observation is not oriented to the current (or specified) agent.

Code example

import numpy as np
from pettingzoo.classic import chess_v5

env = chess_v5.env(render_mode='ansi')
env.reset()
print(env.render())

action_mask = env.observe('player_0')['action_mask']
observation = env.observe('player_0')['observation']
table = observation[:, :, 7]  # The pawn channel
print("In the view of Player 0")
print(table)

possible_actions = np.where(action_mask > 0)[0]
print(possible_actions)
action = 77
print(action)
env.step(action)

# After an action
print("------------------")
print(env.render())
observation = env.observe('player_0')['observation']
table = observation[:, :, 7]
print("In the view of Player 0")
print(table)
observation = env.observe('player_1')['observation']
table = observation[:, :, 7]
print("In the view of Player 1")
print(table)

The return
By the way, the returned observation seems to repeat channels 7-19 eight times, so the observation wrongly ends up with 111 channels.
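A quick way to check the repetition (a sketch; at reset the stacked history frames are identical, so consecutive 13-channel blocks should compare equal):

import numpy as np
from pettingzoo.classic import chess_v5

env = chess_v5.env()
env.reset()
obs = env.observe('player_0')['observation']

# At reset the stacked history frames are identical, so the first two
# 13-channel blocks after the 7 base channels compare equal:
print(np.array_equal(obs[:, :, 7:20], obs[:, :, 20:33]))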
Facing the same issues; it looks like some environments and/or their documentation are outdated.
Hmm. After a quick look, it is probably a problem with both the documentation and the implementation. The documentation is out of date: we decided to switch to AlphaZero-style frame stacking a long time ago, and the documentation didn't catch up. This can be fixed by simply noting that channels 7-19 are repeated 8 times, storing the game history for each player. I think the main error in the implementation is this line.
Now, it is a string, so that condition does not work. The second error is that the
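For context on the frame stacking: the idea is to keep a rolling window of the last 8 positions, 13 planes each (8 * 13 = 104 history channels). A minimal sketch of the mechanism (the names board_history and push_frame here are illustrative, not the exact PettingZoo internals):

import numpy as np

# Rolling history of 8 frames, 13 planes each.
board_history = np.zeros((8, 8, 104), dtype=np.float32)

def push_frame(board_history, new_frame):
    # new_frame: (8, 8, 13) planes for the current position.
    # The newest frame goes in front; the oldest 13 channels fall off the end.
    return np.dstack((new_frame, board_history[:, :, :-13]))

board_history = push_frame(board_history, np.ones((8, 8, 13), dtype=np.float32))
print(board_history.shape)  # (8, 8, 104)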
@Tientjie-san @jacob975 would either of you two be willing to submit a PR fixing these issues? Appreciate the issue and making us aware of the bugs. Otherwise let us know and we can have somebody else do it.
@elliottower I am interested in fixing the orientation issues. @benblack769 thank you for your nice suggestion. Then, I would suggest not applying the mirror where it is currently done, and instead orienting the observation per agent in raw_env.observe:

class raw_env(AECEnv):
    ...
    def observe(self, agent):
        ...
        observation = np.dstack((observation[:, :, :7], self.board_history))  # (8x8x111)
        # We need to swap the 6 white channels with the 6 black channels
        if self.possible_agents.index(agent):
            # Section 1: Mirror the board
            observation = np.flip(observation, axis=0)
            # Section 2: Swap the 6 white channels with the 6 black channels
            for i in range(1, 9):
                tmp = observation[..., 13 * i - 6 : 13 * i].copy()
                observation[..., 13 * i - 6 : 13 * i] = observation[..., 13 * i : 13 * i + 6]
                observation[..., 13 * i : 13 * i + 6] = tmp
        ...
        return {"observation": observation, "action_mask": action_mask}

For
Definitely agree the rendering should be consistent, but personally I don't know enough about how other implementations do chess to say whether swapping the observations is a good idea. The comments in that file, I believe, say the swapping is so self-play agents can learn better because their pieces always start at the bottom. But you could look into other libraries or papers to check (maybe @benblack769 has more insight). OpenSpiel has chess, for example; RLlib has a LeelaChessZero implementation and their own chess implementation, I think.
@jacob975 if you're still interested in fixing this, it would be great if you could join the Discord and shoot me a DM so we can coordinate.
Fixed in #1004
Describe the bug
The documentation (both on the website, and in the code) says that for the Chess environment, the observation shape (by default) is an 8x8x20 tensor that contains a "snapshot" representation of the current board state, without any history of the previous board states.
However, in my observations with PettingZoo 1.22.3 and the chess_v5 environment, the actual observation shape is 8x8x111, and, as far as I understand from the code, it contains the previous board states. It also appears impossible to turn this behavior off and return to the 8x8x20 board representation.
Code example
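A minimal check of the reported mismatch (a sketch, assuming PettingZoo 1.22.3 and chess_v5 as described above):

from pettingzoo.classic import chess_v5

env = chess_v5.env()
env.reset()
obs = env.observe('player_0')['observation']
print(obs.shape)  # documented as (8, 8, 20), but prints (8, 8, 111)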
System info
PettingZoo was installed from pip.
Version of PettingZoo:
1.22.3
OS: Rocky Linux 9 in Docker, kernel
Linux 8dee67faa3f8 6.2.2-x64v3-xanmod1 #0~20230303.0f2ddc7 SMP PREEMPT_DYNAMIC Sat Mar 4 00:56:43 UTC x86_64 x86_64 x86_64 GNU/Linux
Python version:
Python 3.9.14
Additional context
I would love for this issue to be resolved, or for there to be a method to get the 8x8x20 tensors back. For my application, the RL agent should not have access to previous board states; it should learn to play using only the "snapshot" board representation, with no history.
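In the meantime, a possible workaround (a sketch, assuming the 111 channels are 7 base channels followed by 8 stacked 13-channel frames with the most recent frame first, so the first 20 channels form the documented snapshot):

def snapshot_observation(obs):
    # Keep the 7 base channels plus the most recent 13-channel frame,
    # recovering the documented (8, 8, 20) snapshot view.
    return obs[:, :, :20]

obs = env.observe('player_0')['observation']  # (8, 8, 111)
print(snapshot_observation(obs).shape)        # (8, 8, 20)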