This document describes the contents of the SQuAD-T dataset split.
Following https://stanford-qa.com/, we redistribute this data split under the CC BY-SA 4.0 license (https://creativecommons.org/licenses/by-sa/4.0/legalcode).
The structure of this release is:
---train.json
---dev.json
---test.json
The release contains 86,830 question-passage pairs for training, 5,825 for development, and 5,825 for testing.
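To sanity-check a download against the counts above, a minimal sketch like the following may help. It assumes, without confirmation from this README, that each file uses the SQuAD-style JSON layout ({"data": [{"paragraphs": [{"context": ..., "qas": [...]}]}]}) and that each entry in "qas" is one question-passage pair. The helper name count_pairs is hypothetical.

```python
import json
from pathlib import Path


def count_pairs(path):
    """Count question-passage pairs in one split file.

    Assumption (not stated in this README): the file follows the
    SQuAD-style layout data -> paragraphs -> qas, and every qas entry
    is one question-passage pair.
    """
    with open(path, encoding="utf-8") as f:
        dataset = json.load(f)
    return sum(
        len(paragraph["qas"])
        for article in dataset["data"]
        for paragraph in article["paragraphs"]
    )


if __name__ == "__main__":
    # Counts reported in this README for each split file.
    expected = {"train.json": 86830, "dev.json": 5825, "test.json": 5825}
    for name, n in expected.items():
        path = Path(name)
        if path.exists():
            print(f"{name}: {count_pairs(path)} pairs (expected {n})")
        else:
            print(f"{name}: not found in the current directory")
```

If the files turn out to use a different layout, only the nested loop inside count_pairs needs to change.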
If you find this data useful in your research, please consider citing:
@inproceedings{tan2018know,
title={I Know There Is No Answer: Modeling Answer Validation for Machine Reading Comprehension},
author={Tan, Chuanqi and Wei, Furu and Zhou, Qingyu and Yang, Nan and Lv, Weifeng and Zhou, Ming},
booktitle={CCF International Conference on Natural Language Processing and Chinese Computing},
pages={85--97},
year={2018},
organization={Springer}
}