This GitHub repo accompanies the paper "Evaluating Out-of-Distribution Performance on Document Image Classifiers" and the dataset(s) described therein. The paper was published in the NeurIPS 2022 track on Datasets and Benchmarks. The data described in that paper is an out-of-distribution companion dataset for the RVL-CDIP document image classification dataset.
A link to the dataset(s) can be found here: https://tinyurl.com/4he6my23
The RVL-CDIP-N set has been made available on Hugging Face's datasets platform by Jordy Van Landeghem here (note that these images are JPEG files).
If you wish to cite our paper and/or dataset(s), please use:
@article{larson-2022-rvl-cdip-ood,
author={Larson, Stefan and Lim, Gordon and Ai, Yutong and Kuang, David and Leach, Kevin},
title={Evaluating Out-of-Distribution Performance on Document Image Classifiers},
year={2022},
url={https://arxiv.org/pdf/2210.07448.pdf},
journal={arXiv preprint arXiv:2210.07448}
}