Cb2 Agents

Jacob Sharf edited this page Jul 21, 2023 · 3 revisions

Creating and Using CB2 Agents

Agent Creation Tutorial.

In this section, we recreate the implementation of SimpleFollower, a follower agent that waits for instructions written as comma-separated lists of directions. For example, the following is a valid instruction:

left, right, forward, backward, left, right, forward, forward, left, left, forward, ...

In other words, it's a command that spoon-feeds the follower exactly which keypresses to make. This is a very simple agent, but it's a good starting point for understanding how to create an agent. It's also surprisingly useful for unit testing and debugging.

Step 1: Create an agent.

Creating an agent is as simple as subclassing the Agent base class. We provide a script to do this for you, along with filling in some boilerplate code.

To generate a new agent, run the following command:

python3 -m cb2game.agents.create_agent --agent_name=MyAgent

If no name is provided on the command line, the script prompts you for one. Enter a CamelCase name such as MyAgent; you can always change it later. The script creates a new file called my_agent.py in the current directory, containing two new classes: MyAgentConfig and MyAgent. MyAgent is a subclass of the Agent base class.

@dataclass
class MyAgentConfig(object):
    """Configuration for MyAgent."""

    # Add configuration fields here.
    # Then generate the agent config yaml with:
    # `python3 -m cb2game.agents.generate_config my_agent`
    pass


class MyAgent(Agent):
    def __init__(self, config: MyAgentConfig):
        # Initialize your agent here.
        self.config = config

    # OVERRIDES role
    def role(self) -> Role:
        # This function should be a one-liner.
        # Return the role of your agent (Role.LEADER or Role.FOLLOWER).
        raise NotImplementedError("Implement this...")

    # OVERRIDES choose_action
    def choose_action(self, game_state: GameState, action_mask=None) -> Action:
        # Choose an action based on the current game state.
        # Game state is defined here:
        # https://github.com/lil-lab/cb2/blob/main/src/cb2game/pyclient/game_endpoint.py#L88

        # Agent creation tutorial here:
        # https://github.com/lil-lab/cb2/wiki/Cb2-Agents
        raise NotImplementedError("Implement this...")

Let's go ahead and implement the first method, role. This method should be a one-liner. Since we're creating a follower agent, it returns Role.FOLLOWER.

    def role(self) -> Role:
        return Role.FOLLOWER

Step 2: Implement the choose_action method

The choose_action method is the main method of the agent. It takes a GameState object and returns an Action object. The GameState dataclass contains all the information about the current state of the game. GameState has a number of convenience functions attached to it, but the core members are:

@dataclass
class GameState(DataClassJSONMixin):
    """Represents the state of the game at a given time. Unpacks to a tuple for compatibility with the old API."""

    map_update: MapUpdate # The game map.
    props: List[Prop] # The props in the game.
    turn_state: TurnState # Moves left, current role, score, turns left, etc...
    instructions: List[ObjectiveMessage] # List of follower instructions.
    actors: List[Actor] # List of actors in the game.
    live_feedback: List[LiveFeedback] = None # List of live feedback messages since the last call to step().

    ...

By the way, the props are actually just cards. We initially planned to support more kinds of props, but for now only cards exist, and the data structure is more generic than it needs to be. You can convert a prop to a card with Card.FromProp(), which gives you a more convenient interface.

For the SimpleFollower implementation, everything the agent needs is in the instructions. Let's implement some helper functions. Since these are local to the file and exist only for organization, we start their names with an underscore, the Python convention for marking functions as private to a module.

First, a function to convert an instruction from text into a list of Actions that we can pass into the game:

def _actions_from_instruction(instruction):
    """Parses a comma-separated instruction string into a list of Actions."""
    actions = []
    instruction_action_codes = instruction.split(",")
    for action_code in instruction_action_codes:
        action_code = action_code.strip().lower()
        # Skip empty entries, e.g. from a trailing comma.
        if len(action_code) == 0:
            continue
        # Match any prefix of a direction word. Branch order decides ties:
        # "r" matches "right" before "random" is ever checked.
        if "forward".startswith(action_code):
            actions.append(Action.Forwards())
        elif "backward".startswith(action_code):
            actions.append(Action.Backwards())
        elif "left".startswith(action_code):
            actions.append(Action.Left())
        elif "right".startswith(action_code):
            actions.append(Action.Right())
        elif "random".startswith(action_code):
            actions.append(Action.RandomMovementAction())
        # Codes that match no keyword are silently ignored.
    return actions

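This function accepts any prefix of an action word, so "f", "for", and "forw" all match "forward". Since the leader is possibly a human, this is a convenient shorthand. Two details follow from the code: an ambiguous prefix resolves to the first keyword checked (so "r" means "right", not "random"), and a code that is not a prefix of any keyword is silently skipped. That includes "backwards", which is longer than the keyword "backward".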
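
The matching rule can be tried out without installing cb2game. The sketch below is a stand-in, not the library code: it returns plain keyword strings instead of Action objects, and the names KEYWORDS, match_keyword, and parse are hypothetical. The keyword order mirrors the if/elif chain in _actions_from_instruction.

```python
# Stand-in for the tutorial's matching rule. Plain strings replace Action
# objects so the sketch runs without cb2game installed.
KEYWORDS = ["forward", "backward", "left", "right", "random"]  # same order as the elif chain


def match_keyword(action_code):
    """Return the first keyword that action_code is a prefix of, or None."""
    action_code = action_code.strip().lower()
    if not action_code:
        return None
    for keyword in KEYWORDS:
        if keyword.startswith(action_code):
            return keyword
    return None


def parse(instruction):
    """Split on commas and drop tokens that match no keyword."""
    matches = (match_keyword(token) for token in instruction.split(","))
    return [m for m in matches if m is not None]


# "r" resolves to "right"; "backwards" and "jump" match nothing and are dropped.
print(parse("f, L, ba, r, ra, backwards, jump"))
```

Running it prints ['forward', 'left', 'backward', 'right', 'random'].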

Then, a helper function to check which instruction is currently active. Instructions are sorted in the order that they were received, so the active instruction is the first one which hasn't been completed or cancelled:

def _get_active_instruction(instructions):
    for instruction in instructions:
        if not instruction.completed and not instruction.cancelled:
            return instruction
    return None
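
To see the selection rule in isolation, here is a small sketch that runs without cb2game. FakeInstruction is a hypothetical stand-in for ObjectiveMessage that carries only the fields the helper reads (completed, cancelled) plus text; the real class has more fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeInstruction:
    """Hypothetical stand-in for ObjectiveMessage; only the fields used here."""
    text: str
    completed: bool = False
    cancelled: bool = False


def get_active_instruction(instructions: List[FakeInstruction]) -> Optional[FakeInstruction]:
    # Same logic as _get_active_instruction above.
    for instruction in instructions:
        if not instruction.completed and not instruction.cancelled:
            return instruction
    return None


# Instructions in the order they were received.
received = [
    FakeInstruction("left, forward", completed=True),
    FakeInstruction("right", cancelled=True),
    FakeInstruction("forward, forward"),
    FakeInstruction("left"),
]
active = get_active_instruction(received)
print(active.text)
```

The first two instructions are skipped because they are completed or cancelled, so this prints "forward, forward". With an empty list, or a list where every instruction is completed or cancelled, the helper returns None.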

And now we're ready to start implementing choose_action()!

One more catch: the game only gives the follower 10 moves per turn, and an instruction may contain more than 10 moves. The solution is to save the extra moves for the next turn. Our implementation parses each instruction into a queue of moves and pops one move off the queue each time choose_action() is called. An InstructionDone action is placed at the end of the queue, so once the moves run out, the instruction is marked as done and we move on to the next one. The queue is stored in the member variable self.actions.
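
The carry-over behavior can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not how the game is implemented: moves are plain strings, play_turn and MOVES_PER_TURN are hypothetical names, and every queue entry (including the final "INSTRUCTION_DONE" marker) is counted against the budget. In the real agent, choose_action() returns one action per call and the turn budget is tracked outside the agent.

```python
from collections import deque

MOVES_PER_TURN = 10  # follower budget stated in the text above


def play_turn(pending, budget=MOVES_PER_TURN):
    """Pop up to `budget` entries from the queue; the rest stay for the next turn."""
    taken = []
    while pending and len(taken) < budget:
        taken.append(pending.popleft())
    return taken


# An instruction with 12 moves, followed by the marker that ends it.
pending = deque(["forward"] * 12 + ["INSTRUCTION_DONE"])
first_turn = play_turn(pending)
second_turn = play_turn(pending)
print(len(first_turn), second_turn)
```

The first turn consumes 10 moves; the remaining two moves and the done marker are left in the queue and come out on the second turn.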

Here's the full implementation of choose_action():

    def choose_action(self, game_state: GameState, action_mask=None) -> Action:
        """Chooses an action to take, given a game state.

        This uses a very simple language to communicate with the leader. The leader specifies actions in an instruction like:

        instruction: "forward, left, left, random, right, backward".

        This corresponds with simple follower actions, which the follower will then immediately take. "Random" results in a random action, from [left, forward, right, back].
        """
        (map, cards, turn_state, instructions, actors, feedback) = game_state
        # If no pending actions, parse them from the active instruction.
        if len(self.actions) == 0:
            active_instruction = _get_active_instruction(instructions)
            if active_instruction is None:
                logger.info(
                    "No active instruction available. Invalid state. Taking NoopAction."
                )
                return Action.NoopAction()
            self.actions.extend(_actions_from_instruction(active_instruction.text))
            self.actions.append(Action.InstructionDone(active_instruction.uuid))
            self.instructions_processed.add(active_instruction.uuid)

        # Check actions again, in case none were parsed from the instruction.
        if len(self.actions) == 0:
            logger.info(
                f"Ran out of commands to follow. Choosing {self.config.default_action}."
            )
            default_action_code = Action.ActionCode.from_str(self.config.default_action)
            if default_action_code == Action.ActionCode.INSTRUCTION_DONE:
                return Action.InstructionDone(active_instruction.uuid)
            return Action(default_action_code)

        # Return the next action, removing it from the front of the queue.
        return self.actions.pop(0)
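
The implementation above reads three attributes that the generated boilerplate does not define: self.actions, self.instructions_processed, and self.config.default_action. A minimal sketch of how they might be set up follows. The attribute names come from the code above, but the type and default value of default_action and the use of str for instruction UUIDs are assumptions. The class omits the Agent base class so that the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class MyAgentConfig:
    """Configuration for MyAgent."""

    # Assumed field: name of the fallback action, parsed in choose_action()
    # with Action.ActionCode.from_str(). The default value here is a guess.
    default_action: str = "INSTRUCTION_DONE"


class MyAgent:  # In the real file this subclasses Agent.
    def __init__(self, config: MyAgentConfig):
        self.config = config
        # Queue of pending actions parsed from the active instruction.
        self.actions: List[object] = []
        # UUIDs of instructions already parsed (str is an assumption).
        self.instructions_processed: Set[str] = set()


agent = MyAgent(MyAgentConfig())
print(agent.actions, agent.instructions_processed, agent.config.default_action)
```

Each agent instance gets its own queue and set because they are created inside __init__ rather than as class-level attributes.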

Evaluating your agent.

Agents plug in seamlessly with our evaluation framework. Once you have a config file for your agent, my_agent.yaml (see the generate_config command in the MyAgentConfig boilerplate above), you can run the following command to evaluate your agent:

# Evaluate my_agent.yaml against 10 instructions from the database pointed to by default.yaml.
python3 -m cb2game.eval.run_eval --agent_config=my_agent.yaml --server_config=default.yaml --limit=10

The --limit=N command-line parameter limits evaluation to N instructions, which is useful for quick testing.

When the simple agent is evaluated against the released dataset, about 40-50 instructions pass. The dataset contains no instructions in the comma-separated form that SimpleFollower understands, so the agent always marks the instruction as done immediately. For about 40-50 instructions in the dataset, that happens to be the right thing to do. For docs on how to use eval with a downloaded dataset, see the CB2 Database wiki page.

Using your agent

Agents can be used directly from the command line. In each of the examples below, we use a SimpleFollower bot. You can change this by specifying a different agent config file.

Connect to a remote game.

Use the command line:

python3 -m cb2game.agents.remote_agent "https://cb2.ai" my_agent.yaml

Now, navigate to the URL https://cb2.ai/play?lobby_name=bot-sandbox. If you hit "Play Game", you should be paired up with your bot. Note that if someone else is doing this simultaneously, you might get mixed up and pair with their bot.

You can also launch a local server, in which case the command is:

python3 -m cb2game.agents.remote_agent "http://localhost:8080"

See the project Readme for more information on how to launch a local server.

By the way, all bots default to the "bot-sandbox" lobby, which is a lobby that only allows bots. If you want to play against a human, you'll need to specify a different lobby. Please do NOT use our server for this purpose, unless it's in the bot-sandbox lobby. Contact the CB2 research team if you're interested in deploying a model on our website.

Local game between two agents.

If you have two agents, you can run a local game between them. This is useful for training a model against itself. You can quickly demo this by creating SimpleLeader and SimpleFollower config files:

# Create server config.
python3 -m cb2game.server.generate_config --all_defaults

# Create agent configs.
python3 -m cb2game.agents.generate_config cb2game.agents.simple_leader
python3 -m cb2game.agents.generate_config cb2game.agents.simple_follower

# Run a local game.
python3 -m cb2game.agents.local_agent_pair --num_games=10 --config_filepath="default.yaml" --leader_config="simple_leader.yaml" --follower_config="simple_follower.yaml"