CB2 Database
- CB2 Database location
- Downloading CB2 data from the server.
- Manually browsing the data.
- Writing Software to access the database.
CB2 uses Sqlite as its database backend because it's fast, optimized, and commonly used in other open source software. We use Peewee as the ORM, which provides a high-level interface for querying the database in Python.
By default, the CB2 database is located in a directory named cb2-game-dev under the user app directory returned by the Python library appdirs. You can also specify a custom location for the database by modifying the data_prefix field in the config.
When a CB2 server starts, it prints the location of the database. For example, see the Database path line here:
❯ python3 -m cb2game.server.main --config_filepath="server/config/local-covers-config.yaml"
[2023-06-30 12:08:36,951] root INFO [main:main:1391] Config file parsed.
[2023-06-30 12:08:36,952] root INFO [main:main:1392] data prefix:
[2023-06-30 12:08:36,952] root INFO [main:main:1393] Log directory: /Users/username/Library/Application Support/cb2-game-dev/game_records
[2023-06-30 12:08:36,952] root INFO [main:main:1394] Assets directory: /Users/username/Library/Application Support/cb2-game-dev/assets
[2023-06-30 12:08:36,952] root INFO [main:main:1395] Database path: /Users/username/Library/Application Support/cb2-game-dev/game_data.db
You can also find the location of the database by invoking the provided db_location script:
❯ python3 -m cb2game.server.db_location --config_filepath="server/config/local-covers-config.yaml"
Database path: /Users/username/Library/Application Support/cb2-game-dev/game_data.db
The CB2 server provides a password-protected interface to download all server
data. Simply navigate to http://hostname:ip/data/download and enter the
password. This will download a zip file containing the database and all other
server data. You can then pair this with the appropriate config file to use with
CB2's provided database utilities. To download a server's configuration file,
see http://hostname:ip/data/config. However, you'll need to make a change to
the config file before using it locally (see next section) so that it points to the local database.
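Once the zip file is on disk, unpacking it and locating the database can be scripted. The following is a minimal sketch using only the Python standard library. The function name extract_server_data is hypothetical, and the layout of the archive is an assumption: the sketch simply searches the extracted tree for files ending in .db rather than relying on any particular directory structure.

```python
import zipfile
from pathlib import Path


def extract_server_data(zip_path, dest_dir):
    """Extract a downloaded data archive and return the database files found.

    Hypothetical helper, not part of CB2. It assumes database files end in
    ".db" (the default name mentioned in this document is game_data.db).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Search recursively, since the directory layout inside the zip is unknown.
    return sorted(dest.rglob("*.db"))
```

The directory containing the returned file is the value to use for data_prefix in the local config.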
Most CB2 utilities work with config files instead of taking a filepath to the database directly. The config file contains the path to the database, as well as other server configuration. To create a config file for downloaded data, you can start with:
❯ python3 -m cb2game.server.generate_config --all_defaults
This will create a config file named default.yaml in the current directory. You then want to modify the line:
data_prefix: ''
Change it to point to the directory containing the database. For example, if you
downloaded the database to /Users/username/Downloads/cb2-data/game_data.db, you would change
the line to:
data_prefix: '/Users/username/Downloads/cb2-data'
If you didn't save the database as game_data.db, you should also modify the database_path_suffix field to point to the correct file. For example, if you saved the database as mydb.db, you would change the line to:
database_path_suffix: 'mydb.db'
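To make the relationship between the two fields concrete, here is a small sketch of how data_prefix and database_path_suffix appear to combine into the final database path, judging from the log output and examples above. The function resolve_database_path and its default_dir parameter are hypothetical illustrations, not CB2's actual implementation, which may differ in detail.

```python
from pathlib import Path


def resolve_database_path(data_prefix, database_path_suffix="game_data.db",
                          default_dir="cb2-game-dev"):
    """Combine the two config fields into a database path (illustrative only).

    Assumption: an empty data_prefix means "use the default app directory",
    which is represented here by the caller-supplied default_dir.
    """
    base = Path(data_prefix) if data_prefix else Path(default_dir)
    return base / database_path_suffix


# The two examples from the text above.
print(resolve_database_path("/Users/username/Downloads/cb2-data"))
print(resolve_database_path("/Users/username/Downloads/cb2-data", "mydb.db"))
```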
That said, using the original server config can be preferable to creating your
own. For example, when running Eval, the original server config helps make eval
results reproducible. If you do download the config file from the server, you'll
need to modify data_prefix to point to the local database.
There are a few ways to inspect the released data.
The best way to view the database is to use the /view/games URL endpoint on the
original server instance. You can also always create your own config file that
points to the downloaded DB and launch a server instance locally with that. See
the section titled Creating a config file for downloaded data for instructions on
creating a config file.
If you just want to browse the data, we highly recommend using Sqlite DB
Browser. This can be used to view the database directly and makes it simple to
manually peruse the records. The experience is much better than reading a raw
JSON file. First, choose a game in the Game table, then go to the Event table
and filter all records by the game ID.
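The same browse-then-filter workflow can be scripted with Python's built-in sqlite3 module, without going through the CB2 code. This is a sketch under stated assumptions: the table name event and the column name game_id are guesses based on Peewee's usual naming for a model called Event with a foreign key called game, so the sketch lists the real tables and columns first and refuses to run on names it cannot find. Both helper functions are hypothetical.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def events_for_game(conn, game_id, event_table="event", game_column="game_id"):
    """Return all rows of the event table for one game, ordered by tick.

    The default table and column names are assumptions; check list_tables()
    against your own database and pass the real names if they differ.
    """
    if event_table not in list_tables(conn):
        raise ValueError(f"no table named {event_table!r}")
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{event_table}")')]
    if game_column not in columns or "tick" not in columns:
        raise ValueError(f"expected columns {game_column!r} and 'tick', found {columns}")
    # Identifiers cannot be bound as parameters, so they are validated above.
    query = f'SELECT * FROM "{event_table}" WHERE "{game_column}" = ? ORDER BY tick'
    return conn.execute(query, (game_id,)).fetchall()
```

To use it, open the file with sqlite3.connect("path/to/game_data.db") and pass the connection to either function.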
CB2 doesn't use JSON by default for a number of reasons:
- Sqlite DB file takes up far less space than JSON
- Sqlite is much faster than JSON
- Sqlite is more flexible than JSON (e.g. you can query it efficiently)
- Sqlite integrates with CB2 better.
If you still want to use JSON, we released our dataset in both sqlite and JSON
(in the same release). If you're using CB2 to collect your own dataset, you can
easily convert from our DB format to JSON using the db_to_json.py script:
python3 -m cb2game.server.db_tools.db_to_json path/to/cb2-data-base/human_human/game_data.db OUTPUT.json --pretty=True
You can optionally enable pretty printing (--pretty=True).
Since the JSON format takes up so much space, you might want to filter games before exporting to JSON, via:
python3 -m cb2game.server.db_tools.filter_db_games path/to/cb2-data-base/human_human/game path/to/game_ids_to_keep.txt
Here, game_ids_to_keep.txt is a text file that contains a comma-separated list of
game IDs to keep in the database. Note that this makes destructive changes to
the Sqlite database, so you should make a copy of the database before running
this script.
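Because the filter script is destructive, the backup and the ID file can be prepared in one step. The sketch below is a hypothetical helper, not part of CB2: it copies the database next to the original with a .bak extension and writes the IDs as a plain comma-separated list. Whether the script tolerates whitespace or a trailing newline in that file is not stated here, so the sketch writes neither.

```python
import shutil
from pathlib import Path


def backup_and_write_ids(db_path, game_ids, ids_path):
    """Copy the database to <name>.bak and write the game IDs to keep.

    Hypothetical helper. Returns the path of the backup copy.
    """
    db_path = Path(db_path)
    backup = db_path.with_name(db_path.name + ".bak")
    shutil.copy2(db_path, backup)
    # Comma-separated, with no spaces and no trailing newline.
    Path(ids_path).write_text(",".join(str(game_id) for game_id in game_ids))
    return backup
```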
For more on the JSON format, see src/cb2game/server/db_tools/db_to_json.py. The JSON
format is nearly identical to the Sqlite schema, which is described below.
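In order to make use of the CB2 database, you need to connect to it. First, you need a config that points to the database (see the sections above). Then, connect to it with:
from cb2game.server.schemas import base
# `config` is the parsed CB2 config object whose data_prefix points at the
# database. Loading it from the YAML file is not shown here.
base.SetDatabase(config)
base.ConnectDatabase()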
The database schema is documented in depth in src/cb2game/server/schemas/game.py and src/cb2game/server/schemas/event.py.
In particular, each Game is recorded as a series of Events. The Event schema looks like:
class Event(BaseModel):
    """Game event record.

    In CB2, games are recorded as lists of events. Each event is a single
    atomic change to the game state. Events are stored in an sqlite database.
    This class is used to store events in the database. Events are generated by
    the game_recorder.py class. See server/game_recorder.py for that.
    """

    # A UUID uniquely identifying this event. Unique across the database.
    id = UUIDField(primary_key=True, default=uuid.uuid4, unique=True)
    # Pointer to the game this event occurred in.
    game = ForeignKeyField(Game, backref="events")
    # Event type. See EventType enum above for the meaning of each value.
    type = IntegerField(default=EventType.NONE)
    # The current turn number. Each turn consists of a leader portion and a follower portion.
    turn_number = IntegerField(null=True)
    # A monotonically increasing integer. Events which are considered
    # simultaneous in-game are given the same tick. For example, a character
    # stepping on a card and the card set completion event occurring.
    tick = IntegerField()
    # Server time when the event was created. UTC.
    server_time = DateTimeField(default=datetime.datetime.utcnow)
    # Not currently populated. Local clock of the client when event occurred.
    # Determined by packet transmission time, which is reported by the client.
    # Nullable. UTC.
    client_time = DateTimeField(null=True)
    # Who triggered the event. See EventOrigin enum above.
    origin = IntegerField(default=EventOrigin.NONE)
    # Whose turn it is, currently.
    role = TextField(default="")  # 'Leader' or 'Follower'
    # If an event references a previous event, it is linked here. The exact
    # meaning depends on the type of the event. See EventType documentation
    # above for more.
    parent_event = ForeignKeyField("self", backref="children", null=True)
    # A JSON-parseable string containing specific data about the event that
    # occurred. For the format of each Event, see EventType documentation above.
    # For every event type with a data field, there's a python dataclass you can
    # import and use to parse data. We use mashumaro for parsing. Example for a
    # map update event:
    #
    #   from cb2game.server.schemas.event import Event, EventType
    #   from cb2game.server.messages.map_update import MapUpdate
    #   # Some peewee query.
    #   map_event = Event.select().where(Event.type == EventType.MAP_UPDATE, ...).get()
    #   map_update = MapUpdate.from_json(map_event.data)
    #
    # This gives you a CB2 MapUpdate object, defined in
    # server/messages/map_update.py.
    data = TextField(null=True)
    # A brief/compressed or human readable representation of the event, if
    # possible. Only defined for some event types, see EventType above for more
    # documentation.
    short_code = TextField(null=True)
    # If applicable, the "location" of an event. For moves, this is the location
    # *before* the action occurred. For live feedback, this is the follower
    # location during the live feedback.
    location = HecsCoordField(null=True)
    # If applicable, the "orientation" of the agent. For moves, this is the
    # orientation *before* the action occurred. For live feedback, this is the
    # follower orientation during the live feedback.
    orientation = IntegerField(null=True)
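Since events that are considered simultaneous share a tick, a common first step when replaying a game is to group events by tick. The sketch below illustrates this with a hypothetical stand-in record, EventRow, that carries only two of the fields from the schema above, so it runs without a CB2 database. With real data you would pass Event objects instead.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass
class EventRow:
    """Hypothetical stand-in for an Event record (only two fields kept)."""
    tick: int
    short_code: str


def group_by_tick(events):
    """Return a list of (tick, [events]) pairs in increasing tick order.

    sorted() is stable, so events sharing a tick keep their input order.
    """
    ordered = sorted(events, key=attrgetter("tick"))
    return [(tick, list(group)) for tick, group in groupby(ordered, key=attrgetter("tick"))]


rows = [EventRow(2, "step"), EventRow(1, "spawn"), EventRow(2, "card set")]
for tick, simultaneous in group_by_tick(rows):
    print(tick, [row.short_code for row in simultaneous])
```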
Each event's type dictates which game action it refers to.
class EventType(IntEnum):
    """Each event is tagged with a type. The type determines what the event is trying to signal.

    ... (comments documenting this enum redacted)
    """

    NONE = 0
    MAP_UPDATE = 1
    INITIAL_STATE = 2
    TURN_STATE = 3
    START_OF_TURN = 4
    PROP_UPDATE = 5
    CARD_SPAWN = 6
    CARD_SELECT = 7
    CARD_SET = 8
    ...
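When browsing the raw database, the type column holds these integers rather than names. The following sketch maps values back to names and tallies them. ListedEventType is a local, hypothetical copy containing only the members shown in the truncated listing above; the real enum has more members, so in real code you would import EventType from CB2 instead. Values outside the listing are reported as unlisted rather than guessed.

```python
from collections import Counter
from enum import IntEnum


class ListedEventType(IntEnum):
    """Local copy of only the enum members shown above (the real enum has more)."""
    NONE = 0
    MAP_UPDATE = 1
    INITIAL_STATE = 2
    TURN_STATE = 3
    START_OF_TURN = 4
    PROP_UPDATE = 5
    CARD_SPAWN = 6
    CARD_SELECT = 7
    CARD_SET = 8


def type_name(value):
    """Return the member name for an integer type, or a placeholder if unlisted."""
    try:
        return ListedEventType(value).name
    except ValueError:
        return f"UNLISTED({value})"


def count_types(type_values):
    """Tally event types by name, e.g. from the type column of the Event table."""
    return Counter(type_name(value) for value in type_values)
```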
In the source code, each member of this enum is heavily documented, with
instructions on how to decode the Event.data field for each event type. As an
example, let's look at some common message types:
In the game of CB2, an instruction can generate up to 3 events:
- INSTRUCTION_SENT: When the instruction is sent by the leader. It then gets loaded into the queue of instructions.
- INSTRUCTION_ACTIVATED: When an instruction reaches the front of the instruction queue, it is activated.
- INSTRUCTION_DONE/INSTRUCTION_CANCELLED: Either the instruction is marked as completed by the follower or it is cancelled by the leader. Note that this last event is optional. If the game ends first, an instruction may be neither cancelled nor completed.
Here, the "parent_event" field of the Event schema is used to link the INSTRUCTION_ACTIVATED/INSTRUCTION_DONE/INSTRUCTION_CANCELLED events to the INSTRUCTION_SENT event.
Only INSTRUCTION_SENT contains data about the instruction. This data is stored
in the Event.data field, and is a JSON-parseable string in the format of the
ObjectiveMessage dataclass in src/cb2game/server/messages/objective.py.
To decode the instruction data, you can use the ObjectiveMessage.from_json("...") method.
Note that ObjectiveMessages have their own UUIDs, separate from the UUID in the Event
record. The ObjectiveMessage UUID is saved in the Event.short_code of each
instruction-related event.
Example:
from cb2game.server.schemas.event import Event, EventType
from cb2game.server.schemas.game import Game
from cb2game.server.messages.objective import ObjectiveMessage

# Collect every instruction text and the vocabulary used across all games.
# Assumes the database is already connected (see above).
words = set()
instruction_list = []
for game in Game.select():
    instructions = Event.select().where(
        Event.type == EventType.INSTRUCTION_SENT,
        Event.game == game,
    )
    for instruction in instructions:
        decoded_message = ObjectiveMessage.from_json(instruction.data)
        words.update(decoded_message.text.split(" "))
        instruction_list.append(decoded_message.text)
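The parent_event links described above also make it possible to work out how far each instruction got. The sketch below shows the idea on a hypothetical stand-in record, InstructionEvent, which uses type names as strings and plain IDs in place of foreign keys so that it runs without a CB2 database. With real Event records you would compare against EventType members and follow event.parent_event instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# How far along its lifecycle each event type places an instruction.
_STAGE = {
    "INSTRUCTION_SENT": 0,
    "INSTRUCTION_ACTIVATED": 1,
    "INSTRUCTION_DONE": 2,
    "INSTRUCTION_CANCELLED": 2,
}


@dataclass
class InstructionEvent:
    """Hypothetical stand-in for the Event fields used here."""
    id: str
    type: str  # type name; the real field is an EventType integer
    parent_event: Optional[str] = None  # id of the INSTRUCTION_SENT event


def instruction_status(events: List[InstructionEvent]) -> Dict[str, str]:
    """Map each INSTRUCTION_SENT event id to the furthest stage it reached."""
    latest: Dict[str, str] = {}
    for event in events:
        if event.type not in _STAGE:
            continue  # Not an instruction-related event.
        root = event.id if event.type == "INSTRUCTION_SENT" else event.parent_event
        if root is None:
            continue  # Follow-up event without a link; nothing to attach it to.
        current = latest.get(root)
        if current is None or _STAGE[event.type] > _STAGE[current]:
            latest[root] = event.type
    return latest
```

An instruction whose status is still INSTRUCTION_SENT or INSTRUCTION_ACTIVATED at the end of the event list is one that was neither completed nor cancelled before the game ended.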
Actions record movements made by the leader and the follower. Actions are marked with an event type of ACTION.
The Action dataclass is relatively straightforward. It's located in
src/cb2game/server/messages/action.py.
Like with instructions, the Action dataclass provides an
Action.from_json("...") method for decoding the Event.data field. Events
containing actions also make use of the location and orientation fields of the
Event record, which hold the position and orientation before the action
occurred. The displacement field of the Action dataclass keeps track of the
hexagonal displacement of each move. For more on how hexagonal coordinates are
handled in CB2, see the server-side implementation of HecsCoord in
src/cb2game/server/hex.py. You can also learn more about the Hexagonal Efficient Coordinate System (HECS) online.
Map updates are broadcast whenever the map changes. While a normal CB2 game only ever has one static map, the map can be changed in custom Scenarios by an attached monitoring script.
Map updates are marked with an event type of MAP_UPDATE. The MapUpdate dataclass is located in src/cb2game/server/messages/map_update.py.
To decode a map update, pass the Event.data field to MapUpdate.from_json("..."). The MapUpdate class looks like:
@dataclass
class MapUpdate(DataClassJSONMixin):
    rows: int
    cols: int
    # Tiles are flattened into a linear list.
    # Here's some useful things you can do with a tile:
    #   tile.asset_id: Asset ID of the tile.
    #   tile.cell.coord: HECS coordinate of the tile.
    #   tile.cell.coord.to_offset_coordinates(): (x, y) coords in hex grid.
    #   tile.cell.boundary: Walkable boundary of the tile (edges).
    #   tile.cell.layer: Z-layer of the tile (water/ground/mountains).
    #   tile.rotation_degrees: Rotation of the tile's asset.
    tiles: List[Tile]
    metadata: Optional[MapMetadata] = field(default_factory=MapMetadata)
    fog_start: Optional[int] = None
    fog_end: Optional[int] = None
    # Used in custom scenarios to tint the map.
    color_tint: Color = Color(0, 0, 0, 0)
The map information is stored in a list of Tile objects. Metadata from the map
generation algorithm is stored in metadata, and contains a high-level
description of the overall map structure. This is used, for example, by
gpt_follower.py to create a text-only description of the map to feed to GPT-3.5
and GPT-4. This is specifically implemented in src/cb2game/pyclient/client_utils.py, and
is very experimental at the moment.
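Because the tiles arrive as a flat list, it is often convenient to index them by grid position before doing lookups. The sketch below keys each tile by the (x, y) pair returned by tile.cell.coord.to_offset_coordinates(), which is the accessor named in the MapUpdate comments above. The Stub classes are hypothetical stand-ins that mimic only those attributes so the example runs on its own. With real data you would pass map_update.tiles directly. The sketch deliberately avoids relying on the order of the flattened list, since that ordering is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StubCoord:
    """Hypothetical stand-in for a HECS coordinate."""
    x: int
    y: int

    def to_offset_coordinates(self) -> Tuple[int, int]:
        return (self.x, self.y)


@dataclass
class StubCell:
    """Hypothetical stand-in for tile.cell."""
    coord: StubCoord


@dataclass
class StubTile:
    """Hypothetical stand-in for a Tile."""
    asset_id: int
    cell: StubCell


def index_tiles(tiles: List[StubTile]) -> Dict[Tuple[int, int], StubTile]:
    """Build a lookup from (x, y) offset coordinates to the tile at that position."""
    return {tile.cell.coord.to_offset_coordinates(): tile for tile in tiles}


tiles = [StubTile(3, StubCell(StubCoord(0, 0))), StubTile(9, StubCell(StubCoord(1, 0)))]
by_position = index_tiles(tiles)
print(by_position[(1, 0)].asset_id)
```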