This repo contains the code used for the 3rd place solution of the Google - Fast or Slow? Predict AI Model Runtime competition on Kaggle.
The final submission was a combination of different networks trained at different stages during the development of this repo. For completeness, all submissions have separate branches; however, the state of the repo during development was sometimes quite messy. The code of all submissions follows the same pattern but uses slightly different network architectures, conventions, etc. Experiments show that the main branch achieves the same or better results for the layout collection and is therefore recommended. Note that Kendall's tau is a very noisy evaluation metric: training the same network in the same way can lead to a difference of 0.1 in the score, so one has to be careful when comparing different networks and should always train them multiple times.
The main branch can only be used for the layout collection. The tile collection has its own branch and does not need any special preparation.
This package was developed and tested with Python 3.11. This repo depends on `pytorch`, which should be installed with GPU support; you might want to install `pytorch` and `torch_scatter` manually with pre-built wheels for your CUDA version. Otherwise, the installation is in theory as easy as

```shell
pip install -e .
```
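As a hypothetical example, assuming CUDA 12.1 and the official wheel indices (the CUDA tag and torch version below are assumptions; adjust them to match your setup), the manual installation could look like:

```shell
# Install a CUDA-enabled pytorch wheel first (cu121 is an assumption; pick your CUDA version)
pip install torch --index-url https://download.pytorch.org/whl/cu121
# torch_scatter wheels must match the installed torch/CUDA combination
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
# Then install this package itself
pip install -e .
```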
The package uses the data of the competition and has two scripts in the `scripts` folder. The first one, `add_features.py`, extracts additional features from the data and adds some derived features. In detail, it does the following:
- Extract additional features from the protocol buffer files
- Add positional encodings using RWPE with the asymmetric and symmetric adjacency matrices
- Log-transform features with a large dynamic range
- Create new features from the 30-dimensional features in the form of `x % 128` and `(x // 128) / 10`, where `x` is the original feature. This is done because of the register size of the TPU.
- Add a virtual output node to the graph that connects all nodes with outputs
It requires pointers to directories containing the protocol buffer files and the npz files. You can have a look at the full signature with `python scripts/add_features.py --help`. Note that computing the RWPE can take a while and use a lot of memory for the larger graphs.
The second script, `train.py`, trains a network on the data. You can have a look at the signature with `python scripts/train.py --help`. The script requires a path to a directory containing the npz files generated with `add_features.py`. After every epoch, the validation and test sets are evaluated and saved along with the model. If multiple GPUs are available, one can specify the number of GPUs to use. Note that the backend and master port are hard-coded in the script and might need to be changed.
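For reference, a minimal sketch of what such a hard-coded distributed setup typically looks like (the backend, address, and port below are assumptions for illustration, not the script's actual values):

```python
import os
import torch.distributed as dist

# Hypothetical hard-coded values; the real ones live in scripts/train.py
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "12355")

# "gloo" also works on CPU; a multi-GPU run would typically use "nccl"
dist.init_process_group(backend="gloo", rank=0, world_size=1)
world_size = dist.get_world_size()
dist.destroy_process_group()
```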
Pre-commit hooks can be installed with

```shell
pre-commit install -c .hooks/.pre-commit-config.yaml
```
- Clean up, better docs