
add script docs, update main readme
Signed-off-by: Bryce Ferenczi <[email protected]>
5had3z committed May 31, 2024
1 parent 0085d3e commit 0f8321e
Showing 10 changed files with 118 additions and 64 deletions.
1 change: 1 addition & 0 deletions .github/workflows/readthedocs.yml
@@ -13,6 +13,7 @@ jobs:
sudo apt-get install doxygen sphinx-doc cmake pipx python3-sphinx-rtd-theme python3-breathe libboost-iostreams1.74-dev libtbb-dev
pipx install black git+https://github.com/5had3z/pybind11-stubgen.git
pip3 install torch --index-url https://download.pytorch.org/whl/cpu --break-system-packages
pip3 install typer matplotlib opencv-python-headless --break-system-packages
- name: Checkout repo
uses: actions/checkout@v4
- name: Build main library for python bindings
78 changes: 27 additions & 51 deletions README.md
@@ -18,99 +18,75 @@ See [https://5had3z.github.io/sc2-serializer/index.html](https://5had3z.github.io/sc2-serializer/index.html)

### General

To use the StarCraft II replay observer for converting replays, you will need to initialize the 3rdparty submodule(s).

```bash
git submodule update --init --recursive
```

To build, if you use VSCode you should be able to just use the CMake extension. Otherwise, the CLI commands should be the following.

```
cmake -B build
cmake --build build --target ALL_BUILD
```

### Python Dependencies

The observer also relies on a Python script to query the game version information needed to launch replays. This script uses [mpyq](https://github.com/eagleflo/mpyq), which can be installed with

```bash
pip3 install mpyq
```

If you need to manually select a target Python instance when compiling for Linux, you can set `-DPython3_EXECUTABLE=/usr/bin/python3.10` during the CMake configure step. If using VSCode, this can also be achieved by adding the below to `.vscode/settings.json`:
```json
"cmake.configureSettings": {
"Python3_EXECUTABLE": "/usr/bin/python3.10"
}
```
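
For reference, the equivalent CLI configure step is sketched below (a minimal example; the interpreter path is only an illustration, adjust it to your system):

```bash
# hypothetical interpreter path; point this at the Python you want the bindings built against
cmake -B build -DPython3_EXECUTABLE=/usr/bin/python3.10
cmake --build build
```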


### Linux

Compilation requires >=gcc-13 since some C++23 features are used. If using Ubuntu 18.04 or higher, you can get this via the Ubuntu test toolchain PPA as shown below. To update CMake to the latest release, follow the instructions [here](https://apt.kitware.com/).

```bash
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt install gcc-13 g++-13
```

To build, if you use VSCode you should be able to just use the CMake extension and select the gcc-13 toolchain. Otherwise, the CLI commands should be the following.

```bash
CC=/usr/bin/gcc-13 CXX=/usr/bin/g++-13 cmake -B build -GNinja
cmake --build build
```

### Windows

Only tested with Visual Studio 2022 Version 17.7.6, `_MSC_VER=1936`. Downloading and installing TBB from Intel is also required, [link here](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#onetbb) (tested with 2021.11.0). You will need to make sure `PYTHONHOME` is set correctly. This package will compile both Boost and zlib for you; if you already have these somewhere else and want to save the compilation time, you'll have to modify CMakeLists.txt yourself. Otherwise, you should be able to compile this library with the standard CMake commands (sketched below) or the VSCode extension.

I would not recommend Python 3.12, as [dm-tree will not compile](https://github.com/google-deepmind/tree/issues/109).
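
The standard commands, run from a Visual Studio developer prompt, are simply:

```shell
cmake -B build
cmake --build build
```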

## Building Python Bindings

Currently requires >=gcc-11 and should be relatively simple to install since CMake Package Manager deals with the C++ dependencies; however, you will have to install `libboost-iostreams-dev` when building for Linux. The library bindings module is called `sc2_serializer` and includes a few extra dataset sampling utilities and an example PyTorch dataloader for outcome prediction.

```bash
sudo apt install libboost-iostreams-dev
pip3 install git+https://github.com/5had3z/sc2-serializer.git
```
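
A quick, hypothetical smoke test that the bindings installed and import correctly:

```bash
python3 -c "import sc2_serializer; print(sc2_serializer.__file__)"
```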

If you clone this repo and install in editable mode, you won't get the auto-generated stubs; you can add these manually (you need to install my fork of `pybind11-stubgen` from pyproject.toml).

```bash
pip3 install -e .
pybind11-stubgen _sc2_serializer --module-path build/_sc2_serializer.cpython-310-x86_64-linux-gnu.so -o src/sc2_serializer
```

It is faster to iterate while developing by installing in editable mode, removing pip's compiled version `src/sc2_serializer/_sc2_serializer.cpython-310-x86_64-linux-gnu.so`, and symbolically linking to `build/_sc2_serializer.cpython-310-x86_64-linux-gnu.so` instead, so incremental builds are picked up directly. You will have to manually update the stub with the previously mentioned command if API changes are made. A sketch of this workflow is shown below.
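
A minimal sketch of that workflow, assuming a CPython 3.10 build on x86-64 Linux (adjust the extension suffix to your interpreter and platform):

```bash
# drop the copy pip placed in the source tree
rm src/sc2_serializer/_sc2_serializer.cpython-310-x86_64-linux-gnu.so
# point the package at the incrementally rebuilt module instead
ln -s "$(pwd)/build/_sc2_serializer.cpython-310-x86_64-linux-gnu.so" src/sc2_serializer/_sc2_serializer.cpython-310-x86_64-linux-gnu.so
# regenerate the stubs whenever the API changes
pybind11-stubgen _sc2_serializer --module-path build/_sc2_serializer.cpython-310-x86_64-linux-gnu.so -o src/sc2_serializer
```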

## Development Dependencies

The [scripts](./scripts/) folder contains a bunch of utilities used for developing and creating a StarCraft II dataset. Additional dependencies required by these scripts can be installed with the `dev` option when installing the main library (or you can peek at [pyproject.toml](pyproject.toml) and install them manually).

```bash
pip3 install -e .[dev]
```


## Git hooks

The CI will run several checks on the new code pushed to the repository. These checks can also be run locally without waiting for the CI by following the steps below:
7 changes: 5 additions & 2 deletions docs/conf.py
@@ -11,9 +11,12 @@
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import sys
import subprocess
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))


# Doxygen
subprocess.call("doxygen Doxyfile", shell=True)
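
For reference, a hypothetical local docs build, assuming doxygen, Sphinx, breathe, and the RTD theme from the workflow above are installed (run from the repository root; the output path is an arbitrary choice):

```bash
sphinx-build -b html docs docs/_build/html
```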
2 changes: 1 addition & 1 deletion docs/cpp_api/data_structures.rst
@@ -1,4 +1,4 @@
.. _api_data:
.. _api_data_structures:

Starcraft II Data Structures
============================
1 change: 1 addition & 0 deletions docs/index.rst
@@ -24,5 +24,6 @@ Table of Contents
self
replay_data
dataloading
scripts
benchmark
cpp_api/index
12 changes: 5 additions & 7 deletions docs/replay_data.rst
@@ -1,31 +1,29 @@
.. _api_data:

Replay Data
===========

Pre-Converted Tournament Data
-----------------------------

Pre-serialized tournament data is available to `download <https://bridges.monash.edu/articles/dataset/Tournament_Starcraft_II/25865566>`_. Each replay database file is named after the tournament it was gathered from. The associated SQLite database (``gamedata.db``), which contains metadata on each of the replays, is also included in this pack. This data was gathered with the ``Action`` converter, i.e. replay observations are only recorded on player actions, which is the same method as `AlphaStar-Unplugged <https://github.com/google-deepmind/alphastar/blob/main/alphastar/unplugged/data/README.md>`_. Another tournament dataset that was created (and not currently posted online) used ``Strided+Action`` with a stride of ~1sec (IIRC?). This increased the overall size of the dataset to ~83GB, rather than the ``Action``-only ~55.5GB.
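
A quick, hypothetical way to peek at what metadata is available, using the ``sqlite3`` CLI (no table names are assumed here):

.. code-block:: bash

   sqlite3 gamedata.db ".tables"
   sqlite3 gamedata.db ".schema"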


Downloading Replay Data
-----------------------

Blizzard Replay Packs
^^^^^^^^^^^^^^^^^^^^^

Blizzard have replay packs available, grouped by the game version they were played on. These replay packs can be downloaded using their `download_replays.py` script, which can be found `here <https://github.com/Blizzard/s2client-proto/tree/master/samples/replay-api>`_. The game client version needed to actually run each replay pack can also be downloaded from `here <https://github.com/Blizzard/s2client-proto#downloads>`_.

Tournament Replay Packs
^^^^^^^^^^^^^^^^^^^^^^^

Tournament replay packs are gathered and provided by a `community website <https://lotv.spawningtool.com/replaypacks>`_ which is regularly updated. Unfortunately, Blizzard have not released headless Linux versions of StarCraft II since 4.10 (at the time of writing), so newer tournament replays must be played with the Windows client. ``sc2-serializer`` can be compiled and run on Windows, and we include a script that launches many copies of StarCraft II, each using a unique communication port, to enable processing many tournament replays in parallel on a Windows machine.

A particular problem with tournament replays is that many of the game sub-versions and maps are unique and need to be downloaded. Blizzard's CLI replay client is unable to download this data automatically, hence each replay that doesn't work due to missing data needs to be individually opened by a user with the normal game client to initiate the download process. I assume posting the client data is prohibited by some non-distribution EULA, or else I would post it to save someone else many hours of this dull task. One key tip for checking whether data is missing and has to be downloaded: the in-game minimap preview is displayed when the data is present, and absent when it is missing. So skip over games if the minimap preview is there, and manually open games when it is not. After the game begins, exit the game and repeat. The conversion process can then be run mostly uninterrupted. There are some games where the client freezes at the same point in the replay; this usually cannot be fixed by restarting the replay, the replay is just not convertible for unknown reasons.


Converting Replays
------------------

Once you have acquired replays to serialize, ``sc2_converter`` is used to re-run the replays and record the observations to a new ``.SC2Replays`` file. ``sc2_converter`` includes a ``-h/--help`` flag to print out all the current options available for the conversion process. An example of running this program is below.
67 changes: 67 additions & 0 deletions docs/scripts.rst
@@ -0,0 +1,67 @@
.. _scripts:

Scripts
=======

Various helper scripts are included in the full source code but not shipped with the Python library, since they mostly deal with dataset generation rather than dataloading.
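
Several of these scripts are small typer-based CLIs (for example ``replay_parallel.py`` and ``review_resources.py``), so each documents its own options. A minimal sketch of invoking them from the repository root is shown below; the replay file name is hypothetical.

.. code-block:: bash

   python3 scripts/replay_parallel.py --help
   python3 scripts/review_resources.py --file replays.SC2Replays --idx 0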

find_all_versions
-----------------

.. autofunction:: find_all_versions.compare_replays_and_game

.. autofunction:: find_all_versions.write_replay_versions


gen_info_header
---------------

Generates ``generated_info.hpp``.


gen_info_yaml
-------------

Generates ``game_info.yaml``.


inspect_replay
--------------

.. autofunction:: inspect_replay.inspect

.. autofunction:: inspect_replay.count


make_partitions
---------------

.. autofunction:: make_partitions.main


merge_info_yaml
---------------

.. autofunction:: merge_info_yaml.main


replay_parallel
---------------

.. autofunction:: replay_parallel.main


replay_sql
----------

.. autofunction:: replay_sql.create

.. autofunction:: replay_sql.create_individual

.. autofunction:: replay_sql.merge


review_resources
----------------

.. autofunction:: review_resources.main
9 changes: 8 additions & 1 deletion pyproject.toml
@@ -37,7 +37,14 @@ classifiers = [
dependencies = ["numpy"]

[project.optional-dependencies]
database = ["torch>=1.11.0", "typer>=0.9.0"]
dev = [
"torch>=1.11.0",
"typer>=0.9.0",
"pysc2",
"matplotlib",
"opencv-python",
"pyyaml",
]

[project.urls]
"Homepage" = "https://github.com/5had3z/sc2-serializer"
2 changes: 1 addition & 1 deletion scripts/replay_parallel.py
@@ -51,7 +51,7 @@ def get_target_folders(base_folder: Path, replay_list: Path | None):


@app.command()
def run(
def main(
converter: Annotated[Path, typer.Option()],
outfolder: Annotated[Path, typer.Option()],
replays: Annotated[Path, typer.Option()],
3 changes: 2 additions & 1 deletion scripts/review_resources.py
@@ -92,7 +92,8 @@ def main(
file: Annotated[Path, typer.Option(help=".txt or .SC2Replays file to read")],
idx: Annotated[int, typer.Option(help="Index to sample from .SC2Replays")] = 0,
):
""""""
"""Plot how the resources are changing over time. Good to check our modifications
with visibility and default values are working as expected"""
if not file.exists():
raise FileNotFoundError(file)

