This repo contains the 10x Genomics datasets filtered using the ITRAP and ICON denoising frameworks.
-
raw.csv contains positive examples of binding peptide-TCR pairs from the original 10x Genomics dataset;
-
ITRAP.csv and icon.csv are the result of filtering the original raw.csv using ITRAP and ICON, respectively;
-
raw_train.csv, ITRAP_train.csv and icon_train.csv are the datasets used to train the respective models. The positive data points are taken from the datasets described above. The negative data is generated by pairing each positive TCR with a peptide other than its cognate target (denoted as swapped negatives). The negative set is augmented with negative control TCRs from https://github.com/viragbioinfo/IMMREP_2022_TCRSpecificity. The data is randomly split into 5 partitions for 5-fold cross-validation. The list of epitopes is restricted to 4 (GILGFVFTL, GLCTLVAML, ELAGIGILTV, IVTDFSVIK);
-
vdj_eval.csv contains an evaluation set derived from VDJdb. Here too, the positive TCRs refer to the same 4 epitopes (GILGFVFTL, GLCTLVAML, ELAGIGILTV, IVTDFSVIK), and the negative data is generated by swapping positive TCR-peptide pairs and augmented with negative controls.
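
The swapped-negative construction and the random 5-way partitioning described above can be illustrated with a minimal sketch. It is not the code used to build these files: the function names, the `(tcr, peptide)` tuple layout, the one-negative-per-positive ratio, and the placeholder TCR identifiers are all assumptions made for illustration, and the augmentation with negative control TCRs is not shown.

```python
import random

# The four epitopes listed in this README.
EPITOPES = ["GILGFVFTL", "GLCTLVAML", "ELAGIGILTV", "IVTDFSVIK"]


def make_swapped_negatives(positives, seed=0):
    """Pair each positive TCR with a peptide other than its cognate one.

    `positives` is a list of (tcr, peptide) tuples (hypothetical layout).
    Returns a list of (tcr, peptide, label) tuples with label 0.
    """
    rng = random.Random(seed)
    positive_set = set(positives)
    peptides = sorted({peptide for _, peptide in positives})
    negatives = []
    for tcr, peptide in positives:
        # Exclude the cognate peptide and any other known positive pairing.
        candidates = [
            p for p in peptides
            if p != peptide and (tcr, p) not in positive_set
        ]
        if candidates:
            negatives.append((tcr, rng.choice(candidates), 0))
    return negatives


def assign_partitions(rows, n_folds=5, seed=0):
    """Shuffle rows and assign each one a partition index in [0, n_folds)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [(row, i % n_folds) for i, row in enumerate(shuffled)]


# Toy input with placeholder TCR identifiers (not real sequences).
toy_positives = [(f"TCR_{i}", EPITOPES[i % 4]) for i in range(20)]
toy_negatives = make_swapped_negatives(toy_positives)
toy_rows = [(t, p, 1) for t, p in toy_positives] + toy_negatives
toy_partitioned = assign_partitions(toy_rows)
```

Each row in `toy_partitioned` carries a partition index, so fold `k` of a cross-validation run would hold out the rows with index `k` and train on the rest.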