Cb2 Agents
- Agent Creation Tutorial.
- Evaluating your agent.
- Using your agent
- Using a built-in agent.
- Creating a built-in agent.
In this section, we step through the implementation of SimpleFollower. A SimpleFollower is a follower agent that waits for instructions which are comma-separated lists of directions. For example, the following instruction:
left, right, forward, backward, left, right, forward, forward, left, left, forward, ...
In other words, it's a command that spoon-feeds the follower exactly which keypresses to make. This is a very simple agent, but it's a good starting point for understanding how to create an agent. It's also surprisingly useful for unit testing and debugging.
Create a new file, my_agent.py. Let's start by filling in the boilerplate:
# Bind the module to the name `agent` so that `agent.Agent` resolves below.
# GameState, Action, and Role must also be imported from the CB2 package.
from cb2.agents import agent


class MyAgent(agent.Agent):
    """A demonstration follower bot."""

    def choose_action(self, game_state: GameState, action_mask=None) -> Action:
        """Chooses the next action to take, given a game state."""
        raise NotImplementedError()

    def role(self) -> Role:
        """Returns the role of the agent."""
        return Role.FOLLOWER
Note that we've actually already implemented the role method for you. The role method returns the role of the agent. In this case, we're creating a follower agent, so we return Role.FOLLOWER. If we were creating a leader agent, we would return Role.LEADER.
The choose_action method is the main method of the agent. It takes a GameState object and returns an Action object. The GameState dataclass contains all the information about the current state of the game. GameState has a number of convenience functions attached to it, but the core members are:
@dataclass
class GameState(DataClassJSONMixin):
    """Represents the state of the game at a given time. Unpacks to a tuple for compatibility with the old API."""

    map_update: MapUpdate  # The game map.
    props: List[Prop]  # The props in the game.
    turn_state: TurnState  # Moves left, current role, score, turns left, etc...
    instructions: List[ObjectiveMessage]  # List of follower instructions.
    actors: List[Actor]  # List of actors in the game.
    live_feedback: List[LiveFeedback] = None  # List of live feedback messages since the last call to step().
    ...
By the way, the props are actually just cards. We initially planned to support more kinds of props, but for now it's just cards, and the data structure is a bit too generic for its own good. You can convert a prop to a card with Card.FromProp(). That will give you a more convenient interface.
Anyways, for the SimpleFollower implementation, everything the agent needs to do is in the instructions. Let's implement some helper functions. Since these are local to the file and exist only for organization, we start the function names with an underscore to mark them as private to the module. Python does not enforce this, but it is the usual convention, and wildcard imports skip such names.
First, to convert an instruction from text into an Action that we can pass into the game:
def _actions_from_instruction(instruction):
    actions = []
    instruction_action_codes = instruction.split(",")
    for action_code in instruction_action_codes:
        action_code = action_code.strip().lower()
        if len(action_code) == 0:
            continue
        # Each check asks whether the typed code is a prefix of the action word.
        if "forward".startswith(action_code):
            actions.append(Action.Forwards())
        elif "backward".startswith(action_code):
            actions.append(Action.Backwards())
        elif "left".startswith(action_code):
            actions.append(Action.Left())
        elif "right".startswith(action_code):
            actions.append(Action.Right())
        elif "random".startswith(action_code):
            actions.append(Action.RandomMovementAction())
    return actions
This function allows you to type in any prefix of the action word, and it'll still match. So "f", "for", and "forw" all match "forward". Since the leader is possibly a human, this is nice and convenient for them.
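To see the prefix matching in isolation, here is a minimal sketch that swaps the Action constructors for plain strings so it runs without the CB2 package. The names ACTION_WORDS and parse_action_words are hypothetical and exist only for this illustration; the matching order mirrors the if/elif chain above.

```python
from typing import List

# Hypothetical stand-in for the Action constructors used above; plain strings
# keep the sketch runnable without the CB2 package.
ACTION_WORDS = ["forward", "backward", "left", "right", "random"]


def parse_action_words(instruction: str) -> List[str]:
    """Maps each comma-separated code to the first action word it is a prefix of."""
    parsed = []
    for code in instruction.split(","):
        code = code.strip().lower()
        if not code:
            continue  # Skip empty entries, such as the one after a trailing comma.
        for word in ACTION_WORDS:
            if word.startswith(code):
                parsed.append(word)
                break
        # Codes that are not a prefix of any action word are silently dropped.
    return parsed


print(parse_action_words("f, for, FORW, l, r, ra, backwards,"))
# ['forward', 'forward', 'forward', 'left', 'right', 'random']
```

Two details are worth noticing. A bare "r" resolves to "right" because that word is checked before "random". And "backwards" is dropped, because it is longer than "backward" and so is not a prefix of it.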
Then, a helper function to check which instruction is currently active. Instructions are sorted in the order that they were received, so the active instruction is the first one which hasn't been completed or cancelled:
def _get_active_instruction(instructions):
    for instruction in instructions:
        if not instruction.completed and not instruction.cancelled:
            return instruction
    return None
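As a quick check of that selection rule, the sketch below runs the same loop over a hand-built instruction history. FakeInstruction is a hypothetical stand-in that carries only the fields the helper reads (plus the text); it is not the real ObjectiveMessage class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeInstruction:
    """Hypothetical stand-in for ObjectiveMessage with only the fields used here."""

    text: str
    completed: bool = False
    cancelled: bool = False


def get_active_instruction(
    instructions: List[FakeInstruction],
) -> Optional[FakeInstruction]:
    """Returns the oldest instruction that is neither completed nor cancelled."""
    for instruction in instructions:
        if not instruction.completed and not instruction.cancelled:
            return instruction
    return None


history = [
    FakeInstruction("left, left", completed=True),
    FakeInstruction("forward", cancelled=True),
    FakeInstruction("right, forward"),
    FakeInstruction("backward"),
]
active = get_active_instruction(history)
print(active.text if active else None)  # right, forward
```

The fourth instruction is also pending, but it only becomes active once the third one is completed or cancelled.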
And now we're ready to start implementing choose_action()!
One more catch... The game only gives the follower 10 moves per turn. It's possible that we receive an instruction with more than 10 moves. What to do? Save the extra moves for the next turn. Thus, our implementation processes each instruction, places the moves in a queue, and then pops off the queue as we go. Once the queue is empty, we mark the instruction as done and move on to the next instruction. The queue is stored in member variable self.actions.
Here's the full implementation of choose_action():
def choose_action(self, game_state: GameState, action_mask=None) -> Action:
    """Chooses an action to take, given a game state.

    This uses a very simple language to communicate with the leader. The leader specifies actions in an instruction like:

        instruction: "forward, left, left, random, right, backward".

    This corresponds with simple follower actions, which the follower will then immediately take. "Random" results in a random action, from [left, forward, right, back].
    """
    (map, cards, turn_state, instructions, actors, feedback) = game_state
    # If no pending actions, parse them from the active instruction.
    if len(self.actions) == 0:
        active_instruction = _get_active_instruction(instructions)
        if active_instruction is None:
            logger.info(
                "No active instruction available. Invalid state. Taking NoopAction."
            )
            return Action.NoopAction()
        self.actions.extend(_actions_from_instruction(active_instruction.text))
        self.actions.append(Action.InstructionDone(active_instruction.uuid))
        self.instructions_processed.add(active_instruction.uuid)
    # Check actions again, in case none were parsed from the instruction.
    if len(self.actions) == 0:
        logger.info(
            f"Ran out of commands to follow. Choosing {self.config.default_action}."
        )
        default_action_code = Action.ActionCode.from_str(self.config.default_action)
        if default_action_code == Action.ActionCode.INSTRUCTION_DONE:
            return Action.InstructionDone(active_instruction.uuid)
        return Action(default_action_code)
    # Return the next action (the oldest one in the queue).
    action = self.actions[0]
    self.actions.pop(0)
    return action
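The carry-over behavior described above can be sketched on its own. This is an illustration under assumptions, not the CB2 implementation: strings stand in for Action objects, the play_turn loop stands in for the game calling choose_action() once per allowed move, and every queued item is counted against the 10-move budget (the text does not say whether marking an instruction done consumes a move).

```python
from collections import deque
from typing import Deque, List

MOVES_PER_TURN = 10  # Per-turn follower budget stated in the text above.


def play_turn(queue: Deque[str]) -> List[str]:
    """Pops at most MOVES_PER_TURN actions; the rest wait for the next turn."""
    taken = []
    while queue and len(taken) < MOVES_PER_TURN:
        taken.append(queue.popleft())
    return taken


# An instruction with 12 moves, followed by the "done" marker.
queue = deque(["forward"] * 12 + ["instruction_done"])
first_turn = play_turn(queue)
second_turn = play_turn(queue)
print(len(first_turn), second_turn)
# 10 ['forward', 'forward', 'instruction_done']
```

The sketch uses collections.deque because popleft() runs in constant time, whereas list.pop(0) shifts every remaining element. For queues as short as a single instruction, either choice works.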
Agents plug in seamlessly with our Evaluation framework. Our evaluation script is implemented for built-in agents in eval/run_eval.py. To run it on a custom agent you've created, you need to create a new eval script, which is straightforward.
You just need to import RunEval from cb2.eval.run_eval and pass in your instantiated agent, along with a few other parameters. Here's an example, which you can copy and paste into a file called eval_my_agent.py:
import fire

from cb2.eval.run_eval import InitPythonLogging, RunEval


# Parameters without defaults must come before those with defaults, so
# server_config is listed first.
def main(
    server_config: str,
    output_prefix: str = "eval_",
    limit: int = -1,
):
    InitPythonLogging()
    agent = ...  # Instantiate your agent here!
    RunEval(agent, output_prefix, server_config, limit)


if __name__ == "__main__":
    fire.Fire(main)
You'll need to fill out the ... with custom code to instantiate your agent. For built-in agents, we use the CreateAgent helper function in agents/config.py, which takes in a config file and returns an instantiated agent. You can create your own configuration file.
In this example, we used Python Fire to easily create a command line interface. This lets you run the script like this:
python -m eval_my_agent --server_config=server_config.json
Command line arguments translate directly to function parameters of main(). The main() shown here takes no agent_config parameter; if you add one to load your agent's configuration, pass it as an --agent_config flag in the same way. To test it out, you might want to run a quick test with --limit=1, so that it only runs one game. This will let you make sure that your agent is working as expected.
When evaluating the simple agent against the released dataset, about 40-50 instructions pass evaluation. This is because there are no instructions of the form that SimpleFollower understands, so it always just marks the instruction as done immediately. For about 40-50 instructions in the dataset, this is the right thing to do.
Once an agent is defined, you can easily integrate it with provided CB2 code, to play either locally against other agents, or remotely against other players (agent or human) on a hosted CB2 server.
As mentioned previously, built-in agents use the CreateAgent factory function in agents/config.py, which takes in a config file and returns an instantiated agent. By doing this, it is easy to write command line tools that can operate on different agent types generically. One way to develop an agent is to write the code inside of the CB2 repository, and to integrate it with the factory function in agents/config.py. Then, you can use the command line tools to test your agent instead of writing custom scripts as shown in this section. See the "Built-in agents" section for more details. If you develop an agent this way, consider submitting a pull request to have your code committed to the CB2 repository for others to use. We just ask that you write a unit test for your agent in the agents/test_local_agent_pair.py file. Even just pairing it with SimpleLeader or SimpleFollower (likely to be mostly useless, but at least proves your bot doesn't crash on startup) is better than nothing.
You can have two agents play each other locally by using the PlayNGames function in cb2.agents.local_agent_pair. This function takes in two instantiated agents (one leader, one follower), and plays multiple games between them. It then returns the resulting scores and game durations. All games are logged to the local database. For accessing the data, see the CB2 Database Doc.
Here's an example:
import logging

import cb2.server.db_utils as db_utils
from cb2.agents.local_agent_pair import PlayNGames


def main(
    config_filepath="server/config/local-covers-config.yaml",
    num_games=10,
    log_to_db: bool = False,
    slow_games: bool = False,  # Whether to slow down the game for debugging.
):
    logging.basicConfig(level=logging.INFO)
    # Instantiate your pair of agents here.
    leader_agent = ...
    follower_agent = ...
    scores, durations = PlayNGames(
        config_filepath, leader_agent, follower_agent, num_games, log_to_db, slow_games
    )
To make your agent play remotely against other agents or humans, you can use the PlayRemoteGame function in cb2.agents.remote_game. This function takes in an instantiated agent, connects to a remote server, and plays a game. Here's an example:
import fire

# Module path as named in the text above.
from cb2.agents.remote_game import PlayRemoteGame


def main(
    host: str,  # Hostname of the server to connect to.
    render: bool = False,  # Whether to show the pygame 2D game visualization.
    lobby: str = "bot-sandbox",
    pause_per_turn: float = 0,  # seconds.
    agent_config_filepath: str = "agents/simple_follower.yaml",
):
    """Connects to a remote server from the command line and plays a game using the specified agent."""
    agent = ...  # Instantiate your agent here!
    PlayRemoteGame(
        host,
        agent,
        render,
        lobby,
        pause_per_turn,
    )


if __name__ == "__main__":
    fire.Fire(main)
Built-in agents can be used directly from the command line. In each of the examples below, we use a SimpleFollower bot. You can change this by specifying a different agent config file.
Built-in agents are created with a configuration file. These are provided in agents/*.yaml. You can run eval on a built-in agent by using the eval/run_eval.py command-line script. Here's an example:
python3 -m eval.run_eval --agent_config=agents/simple_follower.yaml --server_config=/path/to/cb2-data-base/config/human_human.yaml
This might take a while. If you want to test that it works, you can run eval on a subset of instructions: use the command line parameter --limit=N to limit eval to N instructions.
Note that if using the provided dataset, you'll need to modify the included config files to point to the correct data directory on your drive. This means changing a single line in the config file, e.g.:
"data_prefix": "/absolute/path/to/cb2-data-base/human_human",
Note that it's the directory containing the database, not the path to the database itself.
To have a built-in agent play on a remote server, use the command line:
python3 -m agents.remote_agent "https://cb2.ai"
Now, navigate to the URL https://cb2.ai/play?lobby_name=bot-sandbox. If you hit "Play Game", you should be paired up with your bot. Note that if someone else is doing this simultaneously, you might get mixed up and pair with their bot.
You can also launch a local server, in which case the command is:
python3 -m agents.remote_agent "http://localhost:8080"
By the way, all bots default to the "bot-sandbox" lobby, which is a lobby that only allows bots. If you want to play against a human, you'll need to specify a different lobby. Please do NOT use our server for this purpose, unless it's in the bot-sandbox lobby. Contact the CB2 research team if you're interested in deploying a model on our website.
If you develop an agent inside of the CB2 repository, then you can integrate it with the built-in agent factory methods.
First, create an entry for your bot in agents.config.AgentType:
from enum import Enum


class AgentType(Enum):
    NONE = 0
    # Follower used for CB2 pilot study.
    PILOT_FOLLOWER = 1
    # Experimental follower that uses a text-only interface to OpenAI's GPT API.
    GPT_FOLLOWER = 2
    # Simple follower/leader for unit testing and debugging.
    SIMPLE_FOLLOWER = 3
    SIMPLE_LEADER = 4
    # Descriptive text about what your bot does goes here.
    YOUR_BOT_NAME_HERE = 5

    def __str__(self):
        return self.name

    @staticmethod
    def from_str(s: str):
        # Looks the member up by name; an unknown name raises KeyError.
        return AgentType[s]
Make sure to leave a comment explaining what your agent does. It's worth it to spend a few minutes to write a good description and to choose a good name, because others will not have the context you do.
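The name you pick is also the string that from_str() looks up, so it is worth seeing how that lookup behaves. The sketch below uses a trimmed copy of the enum so that it runs on its own; the lookup is a plain Enum name lookup, which is case-sensitive.

```python
from enum import Enum


class AgentType(Enum):
    """Trimmed copy of the enum above, kept short for illustration."""

    NONE = 0
    SIMPLE_FOLLOWER = 3
    SIMPLE_LEADER = 4

    def __str__(self):
        return self.name

    @staticmethod
    def from_str(s: str):
        return AgentType[s]


parsed = AgentType.from_str("SIMPLE_FOLLOWER")
print(parsed is AgentType.SIMPLE_FOLLOWER, str(parsed))  # True SIMPLE_FOLLOWER

try:
    AgentType.from_str("simple_follower")  # Wrong case: not a member name.
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)  # True
```

Because str() and from_str() round-trip through the member name, the string written in a config file has to match the enum member exactly, including case.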
Next, add an entry to the CreateAgent function in agents.config:
def CreateAgent(config: AgentConfig) -> Agent:
    agent_type = AgentType.from_str(config.agent_type)
    if agent_type == AgentType.NONE:
        return None
    ...
    elif agent_type == AgentType.YOUR_BOT_NAME_HERE:
        return CallToYourBotConstructorHere(...)
Your bot constructor can optionally take in a JSON-serializable dataclass config. We define this using the python library Mashumaro, like this:
@dataclass
class GPTFollowerConfig(DataClassJSONMixin):
    """Configuration for initializing a GPTFollower agent.

    For help choosing a value for `model`, see:
    https://platform.openai.com/docs/models/overview

    To get an API key, see:
    https://platform.openai.com/account/api-keys
    """

    gpt_api_key: str
    queueing_enabled: bool = False
    model: str = "gpt-3.5-turbo"
    maximum_tokens: int = (
        3900  # Model maximum of 4097. Completion consumes some tokens too though.
    )
You can then add this config to the Agent configuration file (agents/config.py):
@dataclass
class AgentConfig(DataClassJSONMixin):
    name: str
    comment: str
    # agent_type must be one of the values in enum AgentType.
    agent_type: str

    gpt_follower_config: Optional[GPTFollowerConfig] = None
    """Configuration for initializing a GPTFollower agent."""

    simple_follower_config: Optional[SimpleFollowerConfig] = None
    """Configuration for initializing a SimpleFollower agent."""

    # <add your agent-specific-config here, making it optional like the above examples>
Now that you've done this, you can use your agent with the built-in command line tools by passing in a YAML or JSON file with the same structure as the main agent config definition. We recommend using YAML, as it's a superset of JSON that includes comments, so you can just write JSON and add helpful Python-style comments. If you're submitting a pull request to merge your agent, please submit an example config file, like agents/gpt_follower_example.yaml or agents/simple_follower.yaml.
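To make "the same structure as the main agent config definition" concrete, here is a rough sketch of a config document and how it maps onto the dataclasses. It uses only the standard library: the real classes rely on Mashumaro's DataClassJSONMixin for parsing, and the helper agent_config_from_json below is a hypothetical stand-in for that. The default_action field and its value are assumptions inferred from the self.config.default_action usage in the tutorial, and the name and comment strings are invented for the example.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleFollowerConfig:
    # Assumption: the field name is inferred from self.config.default_action.
    default_action: str = "INSTRUCTION_DONE"


@dataclass
class AgentConfig:
    name: str
    comment: str
    agent_type: str  # Must name a member of AgentType.
    simple_follower_config: Optional[SimpleFollowerConfig] = None


def agent_config_from_json(text: str) -> AgentConfig:
    """Hypothetical stdlib stand-in for the parsing that Mashumaro provides."""
    raw = json.loads(text)
    nested = raw.pop("simple_follower_config", None)
    return AgentConfig(
        simple_follower_config=(
            SimpleFollowerConfig(**nested) if nested is not None else None
        ),
        **raw,
    )


EXAMPLE = """
{
  "name": "Simple follower",
  "comment": "Follows comma-separated direction lists.",
  "agent_type": "SIMPLE_FOLLOWER",
  "simple_follower_config": {"default_action": "INSTRUCTION_DONE"}
}
"""

config = agent_config_from_json(EXAMPLE)
print(config.agent_type, config.simple_follower_config.default_action)
# SIMPLE_FOLLOWER INSTRUCTION_DONE
```

Only the section matching agent_type needs to be present; the other agent-specific sections stay at their default of None, which is why each one is declared Optional.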