-
-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix chess to white perspective, fix observation bug, add documentation #1004
Conversation
|
||
Unlike AlphaZero, the observation space does not stack the observations previous moves by default. This can be accomplished using the `frame_stacking` argument of our wrapper. | ||
Unlike AlphaZero, where the board orientation may vary, in our system, the `env.board_history` always maintains the orientation towards the white agent, with the white agent's king consistently positioned on the 1st row. In simpler terms, both players are observing the same board layout. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job on this, super thorough and well written, and good thinking to include the option for if people do want to observe the flipped board for black.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing. It is my pleasure.
pettingzoo/classic/chess/chess.py
Outdated
observation = chess_utils.get_observation( | ||
self.board, self.possible_agents.index(agent) | ||
) | ||
agent_index = self.possible_agents.index(agent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should make the variable naming here consistent with the step function, which does the following:
current_agent = self.agent_selection
current_index = self.agents.index(current_agent)
Note that self.possible_agents always has both agents, whereas self.agents can have one agent removed if they do an illegal move and terminate, which would make the next observation incorrect.
I can see an argument that agent_index is a better name than current_index, but probably best to just keep the way things were dont previously and rename your var to current_index.
TLDR probably want to just do current_index = self.agents.index(current_agent)
and replace agent_index
on the next line with current_index
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replace the agent_index
with current_index
and also check the naming consistency.
pettingzoo/classic/chess/chess.py
Outdated
@@ -272,7 +286,7 @@ def step(self, action): | |||
self._accumulate_rewards() | |||
|
|||
# Update board after applying action | |||
next_board = chess_utils.get_observation(self.board, current_agent) | |||
next_board = chess_utils.get_observation(self.board, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment right above here noting that it's always from white's perspective (you note it in the docstring but that's long and people reading this code might be confused why it's 0). Could also explicitly do the arguments so people know what 0 means, i.e., get_observation(orig_board=self.board, player=0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion. I change to use player=0
, and add a comment above saying it is always from the white's perspective.
@@ -8,8 +8,8 @@ def boards_to_ndarray(boards): | |||
bits = np.unpackbits(arr8) | |||
floats = bits.astype(bool) | |||
boardstack = floats.reshape([len(boards), 8, 8]) | |||
boardimage = np.transpose(boardstack, [1, 2, 0]) | |||
return boardimage | |||
boardstack = np.flip(np.transpose(boardstack, [1, 2, 0]), axis=[0, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave a comment above this line or in the docstring or something explaining why you're calling np.flip? I'm assuming it's the issue you noted in discord with the left side of the board being displayed as the right side for some reason, but future devs would likely be confused by this and want some explanation. Ideally we'd get to the bottom of it and see why it's doing that mirroring in the first place but I'm fine just fixing it for now and noting that there's some other underlying problem that leads to things being mirrored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not 100% sure if boardimage is the right variable name but I say probably just keep that name for consistency's sake, to show which part is stacking and which is turning it into the actual image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add a comment of why I use np.flip
. I also add more explanation to this update in the pull request.
To summarize, in the last version, the function boards_to_ndarray
outputs an observation that is rotated by 180 degrees compared to env.render()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming is my typo. Now it is back to boardimage
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What underlying condition caused the previous version to flip it though? I would think we could figure out the math somehow and resolve it. But maybe it’s due to Pygame’s rendering flipping things in observe? Although the underlying chess board I imagine you can do some sort of function to get it as an array. Maybe we can look into using that or at least check to see how it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Th 180 rotation in the previous version is contributed by two factors. The line 8 bits = np.unpackbits(arr8)
gives a bit map, by default, increments from RHS to LHS. Another factor is that the env.render() shows the first line at the bottom but numpy
shows the first line on the top.
I am going to make comments and optimization about these in next revision.
@@ -194,7 +194,7 @@ def legal_moves(orig_board): | |||
return legal_moves | |||
|
|||
|
|||
def get_observation(orig_board, player): | |||
def get_observation(orig_board, player: int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't mind it would actually be nice to have type hints for all of this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add a few type hints if it is obvious, e.g. chess.Board
.
square = board.ep_square # (int) where the en passant happened | ||
# Adjust the row number for the white pawn to the 1st if the en passant flag is set, and vice versa for black pawns. | ||
# For example | ||
# If the white play an en passant move, the opponent can play a special move called en passant capture. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this doesn’t actually explain it, the idea is if a pawn is next to another piece and moves diagonally to still be next to it then it removes it, right? I’d just Google around and find a nice simple explanation
# Adjust the row number for the white pawn to the 1st if the en passant flag is set, and vice versa for black pawns. | ||
# For example | ||
# If the white play an en passant move, the opponent can play a special move called en passant capture. | ||
# To show this, we denote the pawn at (row, col) = (1, `dest_square`) instead of (5, `dest_square`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t get why we denote it differently with 1, dest square. Is that just because it’s impossible for a pawn to go backwards so this is just the only encoding for a pawn being in en passant position? Is this something done by AlphaZero or others? Seems a bit odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a white pawn never walks to the 1st row. It is just a notation of the chance to make a en passant capture. It is written in the document pettingzoo/classic/chess/chess.py#L41
. To my knowledge I have no idea whether it is conventional, but I just follow the definition already in pettinzoo.chess_v6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I review the paper of AlphaZero and the notation system of chess called Forsyth-Edwards Notation (FEN). AlphaZero mentioned some of special movement in chess like queen castling, but nothing about en passant capture. They did not explain but I guess the reason is a neural network can retrive information of en passant flag from their board history. On the other hand, FEN is a popular notation used on chess. I also see a function for it in the module chess
. FEN use a individual string to denote whether a en passant flag is set or not, instead of putting into the 1st line of the board.
Back to our observation space, so far I revise the en passant section to keep its functionality. However, it is not conventional to my knowledge. In addition, we have provided a borad history where a model can retrive the information of en passant flag. Therefore, I would like to comment out this section of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you check other RL chess implementations such as openspiel or other papers? Just want to make sure it won’t mess things up for people. I think I linked it previously but RLlib has a LeelaChessZero model, maybe we can check if that would work with this change? I can try to get in contact with the person who did the pr for that if need be.
My initial thought is it makes sense, first row pawn seems very non standard and roundabout way of handling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will check and inform you if I need help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this behavior from LC0, but I took a 2nd look at the AlphaZero paper and it definitely doesn't mention this. Presumably alphazero is relying on the board history to provide the necessary information for en-passant, and the Lc0 developers decided to add this information in explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the current code behavior is also done by leela chess zero that seems reasonable to do imo. Having some way of accessing the information is better than none. Do you have thoughts on the FEN system mentioned above @benblack769?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, FEN's representation is nice because it is concise and unambiguous, but neural networks aren't necessarily the best at learning concise information, it can be better to duplicate information.
For a code reference, here is how Lc0 decodes FEN into its internal representation of en-passant https://github.com/LeelaChessZero/lc0/blob/master/src/chess/board.cc#L1114
""" | ||
The LeelaChessZero-style en passant flag. | ||
Adjust the row number for the white pawn to the 1st if the en passant flag is set, and vice versa for black pawns. | ||
In FEN, the en passant flag is represented by the square that can be a possible target of an en passant, e.g. the `e3` in `4k3/8/8/8/4Pp2/8/8/4K3 b - e3 99 50`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks for the extra clarification
Description
Fix the issue # 992. In this pull request, the observations in
env.board_history
is now always toward the white player. However, users can still get observations orientated toward the black player byenv.observe('player_1")
.Fix the issue where the env.observe("player_x")["observation"] is left-side right mirroring compare to
env.render()
.This issue is contributed by two bugs. Firstly, in the
pettingzoo/classic/chess/chess_utils.py#L286
, It should beOURS=1
butOURS=0
in the last version. Vice versa forTHEIRS
. Secondly, the functionboards_to_ndarray
inpettingzoo/classic/chess/chess_utils.py#L5
output an arrayboardimage
but incorrectly with a 180 degree rotation. We use np.flip to fix it.Corresponding updates on
pettingzoo/classic/chess/chess_utils.py#L330-#L344
to keep the en passant flag working.Add a few type hints for functions in
pettingzoo/classic/chess/chess_utils.py
andpettingzoo/classic/chess/chess.py
Revise comments in
pettingzoo/classic/chess/chess.py
to match the current situation, especially on the board orientation and env.observe().Fixes # (issue), Depends on # (pull request)
Type of change
Screenshots
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)pytest -v
and no errors are present.pytest -v
has generated that are related to my code to the best of my knowledge.