Spawrious is a challenging out-of-distribution (OOD) image classification benchmark (arXiv:2303.05470). It consists of 6 separate OOD challenges split into two types: one-to-one and many-to-many spurious correlation challenges.
The dataset contains images of 4 dog breeds, found in 6 locations. The entire dataset consists of ~152,000 images, but each challenge requires only a subset of them. As a result, the repo lets users download only the minimal dataset required for a given Spawrious challenge.
Datasets take the following names:
entire_dataset
o2o_easy
o2o_medium
o2o_hard
m2m_easy
m2m_medium
m2m_hard
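The naming scheme above can be checked before starting a download. The following is a minimal sketch, not part of the spawrious package: `SPAWRIOUS_DATASETS`, `CHALLENGE_TYPES`, and `describe_dataset` are hypothetical names introduced here, and the reading of the `o2o` and `m2m` prefixes as one-to-one and many-to-many is inferred from the description above.

```python
# Hypothetical helper, not part of the spawrious package.
# The names below are copied from the list of dataset names above.
SPAWRIOUS_DATASETS = (
    "entire_dataset",
    "o2o_easy", "o2o_medium", "o2o_hard",
    "m2m_easy", "m2m_medium", "m2m_hard",
)

# Assumed meaning of the prefixes, based on the two challenge types.
CHALLENGE_TYPES = {"o2o": "one-to-one", "m2m": "many-to-many"}


def describe_dataset(name):
    """Validate a dataset name and split it into challenge type and difficulty."""
    if name not in SPAWRIOUS_DATASETS:
        raise ValueError(
            f"Unknown dataset {name!r}; expected one of {SPAWRIOUS_DATASETS}"
        )
    if name == "entire_dataset":
        # The full dataset is not tied to a single challenge.
        return {"name": name, "challenge_type": None, "difficulty": None}
    prefix, difficulty = name.split("_", 1)
    return {
        "name": name,
        "challenge_type": CHALLENGE_TYPES[prefix],
        "difficulty": difficulty,
    }


print(describe_dataset("m2m_medium"))
```

Failing early on a misspelled name avoids starting a large download with the wrong argument.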
Running the command below retrieves the appropriate dataset from a user-specified directory (downloading it first if it is not available there), trains a ResNet18, and evaluates the results on the OOD test set.
python example.py --data_dir <path to data dir> --dataset <one of the list above>
pip install git+https://github.com/aengusl/spawrious.git
import torch
from spawrious.torch import get_spawrious_dataset
# spawrious.tf if using tensorflow or jax

dataset = "m2m_medium"
data_dir = ".data/"
val_split = 0.2  # fraction of the training set held out for validation
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load the chosen challenge from data_dir
spawrious = get_spawrious_dataset(dataset_name=dataset, root_dir=data_dir)
train_set = spawrious.get_train_dataset()
test_set = spawrious.get_test_dataset()

# Split off a validation set from the training set
val_size = int(len(train_set) * val_split)
train_set, val_set = torch.utils.data.random_split(
    train_set, [len(train_set) - val_size, val_size]
)
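The split above can be made reproducible and wrapped in data loaders. This is a minimal sketch under stated assumptions: a synthetic `TensorDataset` stands in for `spawrious.get_train_dataset()` so the snippet runs without downloading anything, and the 32x32 image size, batch size, and seed are arbitrary illustrative choices, not values used by Spawrious.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for spawrious.get_train_dataset(): 100 fake RGB images, 4 classes.
# Shapes here are illustrative assumptions, not the real Spawrious image size.
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 4, (100,))
train_set = TensorDataset(images, labels)

val_split = 0.2
val_size = int(len(train_set) * val_split)

# A seeded generator makes the train/validation split repeatable across runs.
generator = torch.Generator().manual_seed(0)
train_subset, val_subset = random_split(
    train_set, [len(train_set) - val_size, val_size], generator=generator
)

# Shuffle only the training data; keep validation order fixed.
train_loader = DataLoader(train_subset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_subset, batch_size=16, shuffle=False)

batch_images, batch_labels = next(iter(train_loader))
print(batch_images.shape, batch_labels.shape)
```

With the real dataset, replace the stand-in `train_set` with the object returned by `get_train_dataset()`; the rest of the sketch should apply unchanged as long as that object is a standard PyTorch dataset.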
If you want to generate your own data, or understand how we generated ours, take a look at generate_dataset.py. To run this file, you additionally need to install diffusers and transformers.
@misc{lynch2023spawrious,
title={Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases},
author={Aengus Lynch and Gbètondji J-S Dovonon and Jean Kaddour and Ricardo Silva},
year={2023},
eprint={2303.05470},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This work is licensed under a Creative Commons Attribution 4.0 International License.