# Cb2 Agents
In this section, we recreate the implementation of SimpleFollower. A SimpleFollower is a follower agent that waits for instructions, which are comma-separated lists of directions. For example, the following instruction:

```
left, right, forward, backward, left, right, forward, forward, left, left, forward, ...
```
In other words, it's a command that spoon-feeds the follower exactly which keypresses to make. This is a very simple agent, but it's a good starting point for understanding how to create an agent. It's also surprisingly useful for unit testing and debugging.
Creating an agent is as simple as subclassing the `Agent` base class. We provide a script to do this for you, along with filling in some boilerplate code. To generate a new agent, run the following command:

```
python3 -m cb2game.agents.create_agent --agent_name=MyAgent
```

If a name is not provided on the command line, the script will prompt you to enter an agent name. You can always edit this later. Type in a CamelCaseAgentName, for example `MyAgent`. The script will create a new file in the current directory called `my_agent.py`. It will also create two new classes, `MyAgent` and `MyAgentConfig`, inside that file. The `MyAgent` class will be a subclass of the `Agent` base class.
```python
@dataclass
class MyAgentConfig(object):
    """Configuration for MyAgent."""

    # Add configuration fields here.
    # Then generate the agent config yaml with:
    # `python3 -m cb2game.agents.generate_config my_agent`
    pass


class MyAgent(Agent):
    def __init__(self, config: MyAgentConfig):
        # Initialize your agent here.
        self.config = config

    # OVERRIDES role
    def role(self) -> Role:
        # This function should be a one-liner.
        # Return the role of your agent (Role.LEADER or Role.FOLLOWER).
        raise NotImplementedError("Implement this...")

    # OVERRIDES choose_action
    def choose_action(self, game_state: GameState, action_mask=None) -> Action:
        # Choose an action based on the current game state.
        # Game state is defined here:
        # https://github.com/lil-lab/cb2/blob/main/src/cb2game/pyclient/game_endpoint.py#L88
        # Agent creation tutorial here:
        # https://github.com/lil-lab/cb2/wiki/Cb2-Agents
        raise NotImplementedError("Implement this...")
```
Let's go ahead and implement the first method, `role`. This method should be a one-liner. We're creating a follower agent, so we return `Role.FOLLOWER`:

```python
    def role(self) -> Role:
        return Role.FOLLOWER
```
The `choose_action` method is the main method of the agent. It takes a `GameState` object and returns an `Action` object. The `GameState` dataclass contains all the information about the current state of the game. `GameState` has a number of convenience functions attached to it, but the core members are:
```python
@dataclass
class GameState(DataClassJSONMixin):
    """Represents the state of the game at a given time. Unpacks to a tuple for compatibility with the old API."""

    map_update: MapUpdate  # The game map.
    props: List[Prop]  # The props in the game.
    turn_state: TurnState  # Moves left, current role, score, turns left, etc...
    instructions: List[ObjectiveMessage]  # List of follower instructions.
    actors: List[Actor]  # List of actors in the game.
    live_feedback: List[LiveFeedback] = None  # List of live feedback messages since the last call to step().
    ...
```
By the way, the props are actually just cards. We initially planned to support more kinds of props, but for now it's just cards, and the data structure is a bit too generic for its own good. You can convert a prop to a card with `Card.FromProp()`, which gives you a more convenient interface.
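For example, an agent would typically convert every prop to a card in one pass. The snippet below sketches that pattern with a stand-in `Card` class so it runs on its own; in real agent code, `Card.FromProp()` comes from the cb2game package:

```python
class Card:
    """Stand-in for cb2game's Card; only the conversion pattern matters here."""

    def __init__(self, prop):
        self.prop = prop

    @classmethod
    def FromProp(cls, prop):
        # The real implementation unpacks card-specific fields from the
        # generic prop; this stand-in just wraps it.
        return cls(prop)


# In a real agent, this would be game_state.props (a List[Prop]).
props = ["prop_0", "prop_1"]
cards = [Card.FromProp(p) for p in props]
```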
Anyway, for the SimpleFollower implementation, everything the agent needs to do is in the instructions. Let's implement some helper functions. Since these are local to the file and exist only for organization, we start the function names with an underscore to discourage other modules from importing them.
First, a function to convert an instruction's text into a list of Actions that we can pass into the game:
```python
def _actions_from_instruction(instruction):
    actions = []
    instruction_action_codes = instruction.split(",")
    for action_code in instruction_action_codes:
        action_code = action_code.strip().lower()
        if len(action_code) == 0:
            continue
        if "forward".startswith(action_code):
            actions.append(Action.Forwards())
        elif "backward".startswith(action_code):
            actions.append(Action.Backwards())
        elif "left".startswith(action_code):
            actions.append(Action.Left())
        elif "right".startswith(action_code):
            actions.append(Action.Right())
        elif "random".startswith(action_code):
            actions.append(Action.RandomMovementAction())
    return actions
```
This function matches any prefix of an action word, so "f", "for", and "forw" all resolve to "forward". Since the leader may be a human typing instructions by hand, this shorthand is convenient for them.
Then, a helper function to check which instruction is currently active. Instructions are sorted in the order that they were received, so the active instruction is the first one which hasn't been completed or cancelled:
```python
def _get_active_instruction(instructions):
    for instruction in instructions:
        if not instruction.completed and not instruction.cancelled:
            return instruction
    return None
```
And now we're ready to start implementing choose_action()!
One more catch... the game only gives the follower 10 moves per turn, and it's possible that we receive an instruction with more than 10 moves. What to do? Save the extra moves for the next turn. Thus, our implementation processes each instruction, places the moves in a queue, and then pops off the queue as we go. Once the queue is empty, we mark the instruction as done and move on to the next instruction. The queue is stored in the member variable `self.actions`.
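Given that design, the agent's constructor needs to set up the queue and the bookkeeping set of processed instruction UUIDs. A minimal sketch (check the repo's `simple_follower.py` for the real version, which also subclasses `Agent`):

```python
class SimpleFollower:  # subclasses Agent in the real code
    def __init__(self, config):
        self.config = config
        # Queue of pending Actions parsed from the active instruction.
        self.actions = []
        # UUIDs of instructions we've already parsed, so we don't
        # process the same instruction twice.
        self.instructions_processed = set()
```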
Here's the full implementation of choose_action():
```python
    def choose_action(self, game_state: GameState, action_mask=None) -> Action:
        """Chooses an action to take, given a game state.

        This uses a very simple language to communicate with the leader. The
        leader specifies actions in an instruction like:

            instruction: "forward, left, left, random, right, backwards".

        This corresponds with simple follower actions, which the follower will
        then immediately take. "Random" results in a random action, from
        [left, forward, right, back].
        """
        (map, cards, turn_state, instructions, actors, feedback) = game_state
        # If no pending actions, parse them from the active instruction.
        if len(self.actions) == 0:
            active_instruction = _get_active_instruction(instructions)
            if active_instruction is None:
                logger.info(
                    "No active instruction available. Invalid state. Taking NoopAction."
                )
                return Action.NoopAction()
            self.actions.extend(_actions_from_instruction(active_instruction.text))
            self.actions.append(Action.InstructionDone(active_instruction.uuid))
            self.instructions_processed.add(active_instruction.uuid)
        # Check actions again, in case none were parsed from the instruction.
        if len(self.actions) == 0:
            logger.info(
                f"Ran out of commands to follow. Choosing {self.config.default_action}."
            )
            default_action_code = Action.ActionCode.from_str(self.config.default_action)
            if default_action_code == Action.ActionCode.INSTRUCTION_DONE:
                return Action.InstructionDone(active_instruction.uuid)
            return Action(default_action_code)
        # Return the next action.
        action = self.actions[0]
        self.actions.pop(0)
        return action
```
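Note that `choose_action` reads `self.config.default_action`, which comes from the agent's config dataclass. A minimal sketch of such a config follows; the field's default value here is our assumption for illustration, not necessarily what the repo ships:

```python
from dataclasses import dataclass


@dataclass
class SimpleFollowerConfig(object):
    # Fallback action when the queue is empty and nothing can be parsed.
    # Parsed via Action.ActionCode.from_str() in choose_action.
    # The default value below is an assumption for illustration.
    default_action: str = "INSTRUCTION_DONE"
```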
Agents plug in seamlessly with our evaluation framework. Given your new config file, `my_agent.yaml`, you can run the following command to evaluate your agent:

```
# Evaluate my_agent.yaml against 10 instructions from the database pointed to by default.yaml.
python3 -m cb2game.eval.run_eval --agent_config=my_agent.yaml --server_config=default.yaml --limit=10
```
The command line parameter `--limit=N` limits eval to N instructions. This is useful for quick testing.
When evaluating the simple agent against the released dataset, about 40-50 instructions pass evaluation. This is because there are no instructions of the form that `SimpleFollower` understands, so it always just marks the instruction as done immediately. For about 40-50 instructions in the dataset, this is the right thing to do. For docs on how to use eval with a downloaded dataset, see the CB2 Database wiki page.
Agents can be used directly from the command line. In each of the examples below, we use a `SimpleFollower` bot. You can change this by specifying a different agent config file.
Use the command line:

```
python3 -m cb2game.agents.remote_agent "https://cb2.ai" my_agent.yaml
```
Now, navigate to the URL https://cb2.ai/play?lobby_name=bot-sandbox. If you hit "Play Game", you should be paired up with your bot. Note that if someone else is doing this simultaneously, you might get mixed up and pair with their bot.
You can also launch a local server, in which case the command is:

```
python3 -m cb2game.agents.remote_agent "http://localhost:8080"
```
See the project Readme for more information on how to launch a local server.
By the way, all bots default to the "bot-sandbox" lobby, which is a lobby that only allows bots. If you want to play against a human, you'll need to specify a different lobby. Please do NOT use our server for this purpose, unless it's in the bot-sandbox lobby. Contact the CB2 research team if you're interested in deploying a model on our website.
If you have two agents, you can run a local game between them. This is useful for training a model against itself. You can quickly demo this by creating SimpleLeader and SimpleFollower config files:
```
# Create server config.
python3 -m cb2game.server.generate_config --all_defaults

# Create agent configs.
python3 -m cb2game.agents.generate_config cb2game.agents.simple_leader
python3 -m cb2game.agents.generate_config cb2game.agents.simple_follower

# Run a local game.
python3 -m cb2game.agents.local_agent_pair --num_games=10 --config_filepath="default.yaml" --leader_config="simple_leader.yaml" --follower_config="simple_follower.yaml"
```