STAPLER

Getting Started

Installation

Clone the repo

```
git clone https://github.com/NKI-AI/STAPLER.git
```

Navigate to the directory containing setup.py
```
cd STAPLER
```
Install the STAPLER package (should take less than 10 minutes)
```
python -m pip install .
```

Data and model checkpoints

The following data is available here:

data/train:
- TCR and peptide datasets used to pre-train STAPLER
- The training dataset
- 5 train and validation folds created from the training dataset used to fine-tune STAPLER
data/test:
- VDJDB+ and VDJDB+ETN test datasets used to test STAPLER
model/pretrained_model:
- 1 pre-trained model checkpoint
model/finetuned_model:
- 5 fine-tuned model checkpoints (one for each fold).
predictions:
- 5 predictions (one for each fold) on the VDJDB+ETN test set
- 1 ensembled prediction of all 5-fold predictions on the VDJDB+ETN test set
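
The README does not state how the five fold predictions are combined into the ensembled prediction. As a rough illustration only, the sketch below assumes a plain arithmetic mean of the per-fold scores for each test sample; the function name `ensemble_mean` and the example scores are hypothetical and are not taken from the STAPLER code base.

```python
from statistics import fmean


def ensemble_mean(fold_scores):
    """Average per-fold scores position by position (assumed ensembling rule)."""
    n_samples = len(fold_scores[0])
    if any(len(scores) != n_samples for scores in fold_scores):
        raise ValueError("every fold must score the same test samples")
    # zip(*...) groups the scores that all folds gave to the same sample.
    return [fmean(sample_scores) for sample_scores in zip(*fold_scores)]


if __name__ == "__main__":
    # Made-up scores for 3 test samples from 5 folds, for illustration only.
    folds = [
        [0.9, 0.2, 0.5],
        [0.8, 0.1, 0.6],
        [0.7, 0.3, 0.4],
        [0.9, 0.2, 0.5],
        [0.8, 0.1, 0.6],
    ]
    print(ensemble_mean(folds))
```

If the released ensembled file was produced differently (for example by voting or a weighted mean), only the body of `ensemble_mean` would need to change.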

Requirements

STAPLER was pre-trained and fine-tuned using an NVIDIA A100 GPU. At this time, no other GPUs have been tested.

Setup

Inside the tools directory, the following file should be changed:

.env.example: An environment file with paths to data, model checkpoints and output locations. Adapt .env.example to your local file system, then rename the file to .env. For more information, see python-dotenv.
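
Before starting a long run, it can help to check that the paths in the edited .env file actually exist. The following is a minimal sketch using only the standard library; it assumes the file holds simple `KEY=VALUE` lines whose values are file-system paths. The helper names `read_env_file` and `missing_paths` are hypothetical, and the real variable names are whatever .env.example defines.

```python
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    entries = {}
    for raw_line in Path(env_path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        entries[key.strip()] = value.strip().strip("'\"")
    return entries


def missing_paths(entries):
    """Return the keys whose values do not exist on the local file system."""
    return sorted(key for key, value in entries.items() if not Path(value).exists())


if __name__ == "__main__":
    env_file = Path("tools/.env")  # location taken from the Setup text
    if env_file.exists():
        for key in missing_paths(read_env_file(env_file)):
            print(f"{key} points to a path that does not exist yet")
    else:
        print("tools/.env not found; copy tools/.env.example and edit it first")
```

Output directories may legitimately not exist before the first run, so a reported key is a prompt to double-check rather than a definite error.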

Usage

Pre-training, fine-tuning and testing of STAPLER

The tools directory contains the following scripts to pre-train, fine-tune and/or test STAPLER on a SLURM cluster. When submitting, also provide an argument to --partition to specify the partition to use.

sbatch pretrain_STAPLER.sh: Pre-train STAPLER.
sbatch train_STAPLER.sh: Fine-tune STAPLER using 5-fold cross-validation.
sbatch test_STAPLER.sh: Test on a test set using a fine-tuned model checkpoint.
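
As a small illustration of passing `--partition` at submission time, the sketch below builds the three `sbatch` commands and prints them rather than executing them, so it is safe to run off-cluster. The partition name `gpu` is a placeholder, and the helper `sbatch_command` is hypothetical; substitute the partition name your cluster uses.

```python
import shlex

# Script names come from the list above; run the commands from the tools directory.
SLURM_SCRIPTS = ["pretrain_STAPLER.sh", "train_STAPLER.sh", "test_STAPLER.sh"]


def sbatch_command(script, partition):
    """Build the sbatch argument list; sbatch options must precede the script."""
    return ["sbatch", f"--partition={partition}", script]


if __name__ == "__main__":
    for script in SLURM_SCRIPTS:
        # Print instead of executing, so nothing is submitted by accident.
        print(shlex.join(sbatch_command(script, partition="gpu")))
```

Fine-tuning needs the pre-trained checkpoint and testing needs a fine-tuned checkpoint, so submit each stage only once its input checkpoint is available.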

Alternatively, run STAPLER directly on a machine with an appropriate GPU (see Requirements).

python pretrain.py: Pre-train STAPLER.
python train_5_fold.py: Fine-tune STAPLER using 5-fold cross-validation.
python test.py: Test on a test set using a fine-tuned model checkpoint.

Required GPU time

Pre-training should take about a day, fine-tuning a couple of hours per fold, and testing/inference a couple of minutes for all 5 fold predictions.

Custom parameters

To experiment with custom model parameters, change the parameters inside the config directory (implemented using Hydra). The config directory contains the following main configuration files:

pretrain.yaml: Configuration parameters file for pre-training.
train_5_fold.yaml: Configuration parameters file for fine-tuning.
test.yaml: Configuration parameters file for testing.
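
Besides editing the YAML files, Hydra applications normally accept `key=value` overrides on the command line. The sketch below assumes the STAPLER entry points use Hydra's standard command-line interface, which this README does not confirm. The keys `trainer.max_epochs` and `model.dropout` are hypothetical placeholders; check train_5_fold.yaml for the parameter names STAPLER actually defines.

```python
import sys


def hydra_overrides(params):
    """Turn {"a.b": 1} into Hydra-style "a.b=1" command-line overrides."""
    return [f"{key}={value}" for key, value in params.items()]


def build_command(script, params):
    """Command that runs `script` with the given overrides appended."""
    return [sys.executable, script, *hydra_overrides(params)]


if __name__ == "__main__":
    # Hypothetical keys, shown only to illustrate the override syntax.
    example = {"trainer.max_epochs": 10, "model.dropout": 0.1}
    print(" ".join(build_command("train_5_fold.py", example)))
```

Overrides of this kind leave the YAML files untouched, which keeps the defaults in the config directory as a stable baseline.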

Issues and feature requests

To request a feature or to discuss any issues, please let us know by opening an issue on the issues page.

The notebooks used to make the pre-print figures will be available soon.

Contact

Corresponding author: Ton Schumacher

Ton Schumacher group (NKI) - Group website

AI for Oncology group (NKI) - Group website

Acknowledgments

The development of the STAPLER model is the result of a collaboration between the Schumacher lab and the AIforOncology lab at the Netherlands Cancer Institute. The following people contributed to the development of the model:

Bjørn Kwee (implementation, development, evaluation, refactoring)
Marius Messemaker (supervision, development, evaluation)
Eric Marcus (supervision, refactoring)
Wouter Scheper (supervision)
Jonas Teuwen (supervision)
Ton Schumacher (supervision)

Part of the data was provided, and the resulting findings were interpreted, by the following people from the Wu lab (DFCI and Harvard Medical School):

Giacomo Oliveira
Catherine Wu

STAPLER is built on top of the x-transformers package.

License

Distributed under the Apache 2.0 License.
