TPU Graph

Code style: black

Overview

This repo contains the code used for the 3rd place solution of the Google - Fast or Slow? Predict AI Model Runtime competition on Kaggle.

The final submission was a combination of different networks trained at different stages during the development of this repo. For completeness, all submissions have separate branches; however, the state of the repo during development was sometimes quite messy. The code of the submissions all follows the same pattern but uses slightly different network architectures, conventions, etc. Experiments show that the main branch achieves the same or better results for the layout collections and is therefore recommended. Note that the Kendall's Tau score is a very noisy evaluation metric: training the same network in the same way can lead to a difference of 0.1 in the score, so one has to be careful when comparing different networks and should always train them multiple times.

The main branch can only be used for the layout collection. The tile collection has its own branch and does not need any special preparation.

Installation

This package was developed and tested with Python 3.11. You might want to install pytorch and torch_scatter manually with pre-built wheels matching your CUDA version. Otherwise, the installation is, in theory, as easy as

pip install -e .

Dependencies

This repo depends on pytorch which should be installed with GPU support.
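A quick sanity check that the GPU build is actually picked up might look like the following (a minimal sketch; the version attributes are the usual ones exposed by these packages):

import torch
import torch_scatter

print(torch.__version__, torch.version.cuda)  # torch version and the CUDA version it was built against
print(torch.cuda.is_available())              # should be True for GPU training
print(torch_scatter.__version__)              # should import without errors against the installed torch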

Usage

The package uses the data of the competition and has two scripts in the scripts folder. The first one, add_features.py, extracts additional features from the data and adds some derived features. In detail, it:

  • Extracts additional features from the protocol buffer files
  • Adds positional encodings using RWPE (random-walk positional encodings) with the asymmetric and symmetric adjacency matrices
  • Log-transforms features with a large dynamic range
  • Creates new features from the 30-dimensional features in the form x % 128 and (x // 128) / 10, where x is the original feature; this is done because of the register size of the TPU (see the sketch after this list)
  • Adds a virtual output node to the graph that connects all nodes with outputs
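As an illustration of the log and modulo/division transforms above, here is a minimal sketch (the feature matrix is a placeholder, not the repo's actual data loading):

import numpy as np

# Hypothetical node-feature matrix (num_nodes x num_features); the real
# features come from the competition npz files.
x = np.random.default_rng(0).integers(0, 2048, size=(5, 4)).astype(np.float64)

log_x = np.log1p(x)          # tame features with a large dynamic range
rem_x = x % 128              # remainder w.r.t. the 128-wide TPU registers
quot_x = (x // 128) / 10.0   # scaled quotient

features = np.concatenate([x, log_x, rem_x, quot_x], axis=-1)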

It requires pointers to directories containing the protocol buffer files and the npz files. You can have a look at the full signature with python scripts/add_features.py --help. Note that computing the RWPE can take a while and use a lot of memory for the larger graphs.
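For reference, a minimal sketch of a random-walk positional encoding on a tiny graph (not the repo's implementation; the repeated matrix products over the full adjacency are what make this memory-hungry for large graphs):

import numpy as np

# Hypothetical adjacency matrix of a tiny directed graph
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=np.float64)

def rwpe(adj, k=4):
    # Random-walk matrix: row-normalised adjacency
    deg = adj.sum(axis=1, keepdims=True)
    rw = adj / np.maximum(deg, 1.0)
    # RWPE: diagonals of the first k powers of the random-walk matrix
    p = np.eye(adj.shape[0])
    enc = []
    for _ in range(k):
        p = p @ rw
        enc.append(np.diag(p))
    return np.stack(enc, axis=1)  # shape (num_nodes, k)

pe_asym = rwpe(A)                  # asymmetric adjacency
pe_sym = rwpe(np.maximum(A, A.T))  # symmetrised adjacency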

The second script, train.py, trains a network on the data. You can have a look at the signature with python scripts/train.py --help. The script requires a path to a directory containing the npz files generated with add_features.py. After every epoch, the validation and test sets are evaluated and the results are saved along with the model. If multiple GPUs are available, one can specify the number of GPUs to use. Note that the backend and master port for distributed training are hard-coded in the script and might need to be changed.
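To illustrate what the hard-coded backend and master port refer to, a minimal sketch of a typical torch.distributed setup is shown below (the values and the setup function are placeholders, not necessarily what train.py uses):

import os
import torch.distributed as dist

# Placeholder values; train.py hard-codes its own choices.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

def setup(rank, world_size):
    # One process per GPU; "nccl" is the usual backend for CUDA tensors.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)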

Development

Hooks

Pre-commit hooks come in any color you'd like

pre-commit install -c .hooks/.pre-commit-config.yaml

TODO

  • Clean up, better docs
