
[Bug Report] classic.chess: Mismatch between observation shape in documentation and in code #922

Closed
1 task done
x0wllaar opened this issue Mar 27, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@x0wllaar

Describe the bug

The documentation (both on the website and in the code) says that for the Chess environment, the default observation is an 8x8x20 tensor containing a "snapshot" representation of the current board state, with no history of previous board states.

However, with PettingZoo 1.22.3 and the chess v5 environment, the actual observation shape is 8x8x111, and, as far as I understand from the code, it contains the previous board states. As far as I can tell from the code, it is also impossible to turn this behavior off and go back to the 8x8x20 board representation.

Code example

from pettingzoo.classic import chess_v5

env = chess_v5.env()
env.reset()
obs = env.unwrapped.observe("player_0")["observation"]
print(obs.shape)
# Expected: (8, 8, 20)
# Got: (8, 8, 111)

System info

PettingZoo was installed from pip.

Version of PettingZoo: 1.22.3

OS: Rocky Linux 9 in Docker, kernel Linux 8dee67faa3f8 6.2.2-x64v3-xanmod1 #0~20230303.0f2ddc7 SMP PREEMPT_DYNAMIC Sat Mar 4 00:56:43 UTC x86_64 x86_64 x86_64 GNU/Linux

Python version: Python 3.9.14

Additional context

I would love this issue to be resolved, or to have a method to get the 8x8x20 tensors back: in my application, the RL agent will not have access to previous board states; it should learn to play using only the "snapshot" board representation, with no history.
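
In the meantime, a possible workaround sketch. This assumes the newest 13-channel board frame sits at the front of the history stack, i.e. in channels 7-19; that layout is an assumption, not something the docs confirm:

from pettingzoo.classic import chess_v5

env = chess_v5.env()
env.reset()
obs = env.unwrapped.observe("player_0")["observation"]  # (8, 8, 111)

# Assumption: channels 0-6 are the static planes and channels 7-19 hold the
# most recent 13-channel board frame (newest frame stored first). If the
# newest frame is actually stored last, use obs[:, :, -13:] for the history
# slice instead.
snapshot = obs[:, :, :20]
print(snapshot.shape)  # (8, 8, 20)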

Checklist

  • I have checked that there is no similar issue in the repo
@x0wllaar added the bug (Something isn't working) label on Mar 27, 2023
@jacob975
Contributor

jacob975 commented Apr 3, 2023

Same here. I also found that the returned observation is not oriented to the current (or specified) agent.

Code Example

import numpy as np

from pettingzoo.classic import chess_v5

env = chess_v5.env(render_mode='ansi')
env.reset()
print(env.render())

action_mask = env.observe('player_0')['action_mask']
observation = env.observe('player_0')['observation']
table = observation[:, :, 7]  # the pawn channel
print("In the view of Player 0")
print(table)

possible_actions = np.where(action_mask > 0)[0]
print(possible_actions)
action = 77  # one of the legal opening moves
print(action)
env.step(action)

# After an action
print("------------------")
print(env.render())
observation = env.observe('player_0')['observation']
table = observation[:, :, 7]
print("In the view of Player 0")
print(table)
observation = env.observe('player_1')['observation']
table = observation[:, :, 7]
print("In the view of Player 1")
print(table)

The Output

r n b q k b n r
p p p p p p p p
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
P P P P P P P P
R N B Q K B N R
In the view of Player 0
[[False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]]
[  77   85  643  645  661  669 1245 1253 1829 1837 2413 2421 2997 3005
 3563 3565 3581 3589 4165 4173]
77
------------------
r n b q k b n r
p p p p p p p p
. . . . . . . .
. . . . . . . .
. . . . . . . .
P . . . . . . .
. P P P P P P P
R N B Q K B N R
In the view of Player 0
[[False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False  True]
 [ True  True  True  True  True  True  True False]
 [False False False False False False False False]]
In the view of Player 1
[[False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False False]
 [False False False False False False False  True]
 [ True  True  True  True  True  True  True False]
 [False False False False False False False False]]

By the way, the returned observation seems to repeat channels 7-19 eight times, which is why the observation ends up with 7 + 8 × 13 = 111 channels instead of 20.

@Tientjie-san

Facing the same issue; it looks like some environments and/or their documentation are outdated.

@benblack769
Contributor

Hmm. After a quick look, it is probably a problem with both the documentation and the implementation.

The documentation is out of date: we decided to switch to AlphaZero-style frame stacking a long time ago, and the documentation never caught up. This can be fixed by noting that channels 7-19 are repeated 8 times, storing the game history for each player.

I think the main error in the implementation is this line

https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/classic/chess/chess.py#L266

current_agent used to be an integer, either 0 or 1, so the boolean condition here https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/classic/chess/chess_utils.py#L205 made sense.

Now, it is a string, so that condition does not work.

The second error is that the observe function does not flip the board_history depending on which agent gets the observation, so the off-turn agent gets a wrongly oriented observation.
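
A minimal sketch of that type bug (the variable name comes from the linked lines; the surrounding logic is paraphrased, not copied verbatim from the repo):

# chess_utils.py effectively gates the board flip on the truthiness of the
# agent identifier, roughly: `if current_agent: mirror_board(...)`.

# Old behavior: current_agent was the integer 0 or 1, so only black (1) flipped.
print(bool(0), bool(1))                    # False True

# New behavior: current_agent is "player_0" or "player_1", and any non-empty
# string is truthy, so the flip branch now runs for both agents.
print(bool("player_0"), bool("player_1"))  # True True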

@elliottower
Contributor

@Tientjie-san @jacob975 would either of you be willing to submit a PR fixing these issues? Appreciate you raising this and making us aware of the bugs. Otherwise, let us know and we can have somebody else do it.

@jacob975
Contributor

jacob975 commented May 3, 2023

@elliottower I am interested in fixing the orientation issues. @benblack769 thank you for your suggestion. I would suggest not applying the mirror in raw_env.step, because it might mess up the board history. Instead, we can add a few lines in raw_env.observe to mirror and transform the board history:

class raw_env(AECEnv):
    ...
    def observe(self, agent):
        ...
        observation = np.dstack((observation[:, :, :7], self.board_history))  # (8, 8, 111)
        # For player_1 we need to reorient the observation:
        if self.possible_agents.index(agent):
            # Section 1: mirror the board vertically
            observation = np.flip(observation, axis=0)
            # Section 2: in each of the 8 stacked frames, swap the 6 white
            # piece channels with the 6 black piece channels
            for i in range(1, 9):
                tmp = observation[..., 13 * i - 6 : 13 * i].copy()
                observation[..., 13 * i - 6 : 13 * i] = observation[..., 13 * i : 13 * i + 6]
                observation[..., 13 * i : 13 * i + 6] = tmp
        ...
        return {"observation": observation, "action_mask": action_mask}

As for raw_env.render(), it always renders the board from player 0's view. Although this is not consistent with the documentation, I think it is better to leave it as is, because the output of this function is for humans, not RL agents. Maybe we should add more description of this to the documentation.

@elliottower
Contributor

Definitely agree the rendering should be consistent, but I personally don't know enough about how other implementations do chess to say whether swapping the observations is a good idea. I believe the comments in that file say the swapping is there so self-play agents can learn better, because the pieces always start at the bottom. You could look into other libraries or papers to check (maybe @benblack769 has more insight): OpenSpiel has chess, for example, and RLlib has a LeelaChessZero implementation and their own chess implementation, I think.

@elliottower
Contributor

@jacob975 if you're still interested in fixing this, it would be great if you could join the Discord and shoot me a DM so we can coordinate.

@elliottower
Contributor

Fixed in #1004
