Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix chess to white perspective, fix observation bug, add documentation #1004

Merged
merged 7 commits into from
Jul 6, 2023

Conversation

jacob975
Copy link
Contributor

@jacob975 jacob975 commented Jun 20, 2023

Description

Fix the issue # 992. In this pull request, the observations in env.board_history is now always toward the white player. However, users can still get observations orientated toward the black player by env.observe('player_1").

Fix the issue where the env.observe("player_x")["observation"] is left-side right mirroring compare to env.render().
This issue is contributed by two bugs. Firstly, in the pettingzoo/classic/chess/chess_utils.py#L286, It should be OURS=1 but OURS=0 in the last version. Vice versa forTHEIRS. Secondly, the function boards_to_ndarray in pettingzoo/classic/chess/chess_utils.py#L5 output an array boardimage but incorrectly with a 180 degree rotation. We use np.flip to fix it.

Corresponding updates on pettingzoo/classic/chess/chess_utils.py#L330-#L344 to keep the en passant flag working.

Add a few type hints for functions in pettingzoo/classic/chess/chess_utils.py and pettingzoo/classic/chess/chess.py

Revise comments in pettingzoo/classic/chess/chess.py to match the current situation, especially on the board orientation and env.observe().

Fixes # (issue), Depends on # (pull request)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.
To upload images to a PR -- simply drag and drop or copy paste.

Checklist:

  • [x ] I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • [x ] I have run pytest -v and no errors are present.
  • [x ] I have commented my code, particularly in hard-to-understand areas
  • [x ] I have made corresponding changes to the documentation
  • [x ] I solved any possible warnings that pytest -v has generated that are related to my code to the best of my knowledge.
  • [x ] I have added tests that prove my fix is effective or that my feature works
  • [x ] New and existing unit tests pass locally with my changes

@jacob975 jacob975 changed the title Minor fix on the comments of chess_v6 Minor fix on the chess_v6 Jun 20, 2023

Unlike AlphaZero, the observation space does not stack the observations previous moves by default. This can be accomplished using the `frame_stacking` argument of our wrapper.
Unlike AlphaZero, where the board orientation may vary, in our system, the `env.board_history` always maintains the orientation towards the white agent, with the white agent's king consistently positioned on the 1st row. In simpler terms, both players are observing the same board layout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on this, super thorough and well written, and good thinking to include the option for if people do want to observe the flipped board for black.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing. It is my pleasure.

observation = chess_utils.get_observation(
self.board, self.possible_agents.index(agent)
)
agent_index = self.possible_agents.index(agent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should make the variable naming here consistent with the step function, which does the following:

current_agent = self.agent_selection
current_index = self.agents.index(current_agent)

Note that self.possible_agents always has both agents, whereas self.agents can have one agent removed if they do an illegal move and terminate, which would make the next observation incorrect.
I can see an argument that agent_index is a better name than current_index, but probably best to just keep the way things were dont previously and rename your var to current_index.

TLDR probably want to just do current_index = self.agents.index(current_agent) and replace agent_index on the next line with current_index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replace the agent_index with current_index and also check the naming consistency.

@@ -272,7 +286,7 @@ def step(self, action):
self._accumulate_rewards()

# Update board after applying action
next_board = chess_utils.get_observation(self.board, current_agent)
next_board = chess_utils.get_observation(self.board, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment right above here noting that it's always from white's perspective (you note it in the docstring but that's long and people reading this code might be confused why it's 0). Could also explicitly do the arguments so people know what 0 means, i.e., get_observation(orig_board=self.board, player=0

Copy link
Contributor Author

@jacob975 jacob975 Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestion. I change to use player=0, and add a comment above saying it is always from the white's perspective.

@@ -8,8 +8,8 @@ def boards_to_ndarray(boards):
bits = np.unpackbits(arr8)
floats = bits.astype(bool)
boardstack = floats.reshape([len(boards), 8, 8])
boardimage = np.transpose(boardstack, [1, 2, 0])
return boardimage
boardstack = np.flip(np.transpose(boardstack, [1, 2, 0]), axis=[0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave a comment above this line or in the docstring or something explaining why you're calling np.flip? I'm assuming it's the issue you noted in discord with the left side of the board being displayed as the right side for some reason, but future devs would likely be confused by this and want some explanation. Ideally we'd get to the bottom of it and see why it's doing that mirroring in the first place but I'm fine just fixing it for now and noting that there's some other underlying problem that leads to things being mirrored

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not 100% sure if boardimage is the right variable name but I say probably just keep that name for consistency's sake, to show which part is stacking and which is turning it into the actual image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a comment of why I use np.flip. I also add more explanation to this update in the pull request.
To summarize, in the last version, the function boards_to_ndarray outputs an observation that is rotated by 180 degrees compared to env.render().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is my typo. Now it is back to boardimage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What underlying condition caused the previous version to flip it though? I would think we could figure out the math somehow and resolve it. But maybe it’s due to Pygame’s rendering flipping things in observe? Although the underlying chess board I imagine you can do some sort of function to get it as an array. Maybe we can look into using that or at least check to see how it works.

Copy link
Contributor Author

@jacob975 jacob975 Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Th 180 rotation in the previous version is contributed by two factors. The line 8 bits = np.unpackbits(arr8) gives a bit map, by default, increments from RHS to LHS. Another factor is that the env.render() shows the first line at the bottom but numpy shows the first line on the top.

I am going to make comments and optimization about these in next revision.

@@ -194,7 +194,7 @@ def legal_moves(orig_board):
return legal_moves


def get_observation(orig_board, player):
def get_observation(orig_board, player: int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind it would actually be nice to have type hints for all of this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a few type hints if it is obvious, e.g. chess.Board.

square = board.ep_square # (int) where the en passant happened
# Adjust the row number for the white pawn to the 1st if the en passant flag is set, and vice versa for black pawns.
# For example
# If the white play an en passant move, the opponent can play a special move called en passant capture.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this doesn’t actually explain it, the idea is if a pawn is next to another piece and moves diagonally to still be next to it then it removes it, right? I’d just Google around and find a nice simple explanation

# Adjust the row number for the white pawn to the 1st if the en passant flag is set, and vice versa for black pawns.
# For example
# If the white play an en passant move, the opponent can play a special move called en passant capture.
# To show this, we denote the pawn at (row, col) = (1, `dest_square`) instead of (5, `dest_square`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t get why we denote it differently with 1, dest square. Is that just because it’s impossible for a pawn to go backwards so this is just the only encoding for a pawn being in en passant position? Is this something done by AlphaZero or others? Seems a bit odd.

Copy link
Contributor Author

@jacob975 jacob975 Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a white pawn never walks to the 1st row. It is just a notation of the chance to make a en passant capture. It is written in the document pettingzoo/classic/chess/chess.py#L41. To my knowledge I have no idea whether it is conventional, but I just follow the definition already in pettinzoo.chess_v6.

Copy link
Contributor Author

@jacob975 jacob975 Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I review the paper of AlphaZero and the notation system of chess called Forsyth-Edwards Notation (FEN). AlphaZero mentioned some of special movement in chess like queen castling, but nothing about en passant capture. They did not explain but I guess the reason is a neural network can retrive information of en passant flag from their board history. On the other hand, FEN is a popular notation used on chess. I also see a function for it in the module chess. FEN use a individual string to denote whether a en passant flag is set or not, instead of putting into the 1st line of the board.

Back to our observation space, so far I revise the en passant section to keep its functionality. However, it is not conventional to my knowledge. In addition, we have provided a borad history where a model can retrive the information of en passant flag. Therefore, I would like to comment out this section of code.

Copy link
Contributor

@elliottower elliottower Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check other RL chess implementations such as openspiel or other papers? Just want to make sure it won’t mess things up for people. I think I linked it previously but RLlib has a LeelaChessZero model, maybe we can check if that would work with this change? I can try to get in contact with the person who did the pr for that if need be.

My initial thought is it makes sense, first row pawn seems very non standard and roundabout way of handling it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will check and inform you if I need help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this behavior from LC0, but I took a 2nd look at the AlphaZero paper and it definitely doesn't mention this. Presumably alphazero is relying on the board history to provide the necessary information for en-passant, and the Lc0 developers decided to add this information in explicitly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the current code behavior is also done by leela chess zero that seems reasonable to do imo. Having some way of accessing the information is better than none. Do you have thoughts on the FEN system mentioned above @benblack769?

Copy link
Contributor

@benblack769 benblack769 Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, FEN's representation is nice because it is concise and unambiguous, but neural networks aren't necessarily the best at learning concise information, it can be better to duplicate information.

For a code reference, here is how Lc0 decodes FEN into its internal representation of en-passant https://github.com/LeelaChessZero/lc0/blob/master/src/chess/board.cc#L1114

@elliottower elliottower changed the title Minor fix on the chess_v6 Fix chess to white perspective, fix observation bug, add documentation Jun 21, 2023
"""
The LeelaChessZero-style en passant flag.
Adjust the row number for the white pawn to the 1st if the en passant flag is set, and vice versa for black pawns.
In FEN, the en passant flag is represented by the square that can be a possible target of an en passant, e.g. the `e3` in `4k3/8/8/8/4Pp2/8/8/4K3 b - e3 99 50`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks for the extra clarification

@elliottower elliottower merged commit bb119f4 into Farama-Foundation:master Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants